Python:Data Analytics and Visualization

Interacting with data in MongoDB

Many applications require more robust storage systems than text files, which is why they use databases to store data. There are many kinds of databases, but they fall into two broad categories: relational databases, which support a standard declarative language called SQL, and so-called NoSQL databases, which can often work without a predefined schema and in which a data instance is more properly described as a document than as a row.

MongoDB is a NoSQL database that stores data as documents, which are grouped together in collections. Documents are expressed as JSON objects. MongoDB is fast and scalable for storing data and flexible for querying it. To use MongoDB in Python, we need to import the pymongo package and open a connection to the database by passing a hostname and port. We assume that a MongoDB instance is running on the default host (localhost) and port (27017):

>>> import pymongo
>>> conn = pymongo.MongoClient(host='localhost', port=27017)

If we call pymongo.MongoClient() without any arguments, it automatically uses the default host and port.

In the next step, we will interact with databases inside the MongoDB instance. We can list all databases that are available in the instance:

>>> conn.database_names()
['local']
>>> lc = conn.local
>>> lc
Database(MongoClient('localhost', 27017), 'local')

The preceding snippet shows that our MongoDB instance has only one database, named 'local'. If the databases and collections we refer to do not exist, MongoDB creates them as necessary, once data is first stored in them:

>>> db = conn.db
>>> db
Database(MongoClient('localhost', 27017), 'db')

Each database contains groups of documents, called collections. We can think of them as the counterpart of tables in a relational database. To list all existing collections in a database, we use the collection_names() function:

>>> lc.collection_names()
['startup_log', 'system.indexes']
>>> db.collection_names()
[]
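
Newer PyMongo releases provide list_database_names() and list_collection_names(), and PyMongo 4 removed the older database_names() and collection_names(). The following is a minimal sketch of the same listing with the newer calls. The helper name describe_instance is hypothetical, and the client argument is assumed to be a pymongo.MongoClient connected as shown earlier.

```python
def describe_instance(client):
    """Map each database name to the list of its collection names.

    `client` is assumed to be a pymongo.MongoClient (or any object that
    offers the same list_database_names() / list_collection_names() calls).
    """
    summary = {}
    for db_name in client.list_database_names():
        # client[db_name] is the dictionary-style form of client.db_name
        summary[db_name] = client[db_name].list_collection_names()
    return summary

# Usage against a live server (assumes MongoDB on localhost:27017):
#   import pymongo
#   print(describe_instance(pymongo.MongoClient()))
```

Taking the client as a parameter keeps the helper independent of how the connection was opened.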

Our db database does not have any collections yet. Let's create a collection, named person, and insert data from a DataFrame object into it:

>>> collection = db.person
>>> collection
Collection(Database(MongoClient('localhost', 27017), 'db'), 'person')
>>> # insert df_ex2 DataFrame into created collection
>>> import json
>>> # json.loads() parses a string; json.load() expects a file object
>>> records = json.loads(df_ex2.T.to_json()).values()
>>> records
dict_values([{'2': 3, '3': 'male', '1': 39, '4': 'vl', '0': 'Vinh'}, {'2': 3, '3': 'male', '1': 26, '4': 'dn', '0': 'Nghia'}, {'2': 4, '3': 'female', '1': 28, '4': 'dn', '0': 'Hong'}, {'2': 3, '3': 'female', '1': 25, '4': 'hn', '0': 'Lan'}, {'2': 3, '3': 'male', '1': 42, '4': 'tn', '0': 'Hung'}, {'2': 1, '3':'male', '1': 7, '4': 'hcm', '0': 'Nam'}, {'2': 1, '3': 'female', '1': 11, '4': 'hcm', '0': 'Mai'}])
>>> collection.insert(records)
[ObjectId('557da218f21c761d7c176a40'),
 ObjectId('557da218f21c761d7c176a41'),
 ObjectId('557da218f21c761d7c176a42'),
 ObjectId('557da218f21c761d7c176a43'),
 ObjectId('557da218f21c761d7c176a44'),
 ObjectId('557da218f21c761d7c176a45'),
 ObjectId('557da218f21c761d7c176a46')]

The df_ex2 DataFrame is transposed and converted to a JSON string, which is then parsed into a dictionary whose values are the individual records, one per row of df_ex2. The insert() function receives these records and saves them to the collection.
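
To make the conversion concrete, the sketch below parses a JSON string of the shape that df_ex2.T.to_json() produces, where each top-level key is a row label and each value is one record, and turns it into a list of dictionaries. The sample string and the helper names json_to_records and insert_records are illustrative assumptions. insert_many() is the PyMongo 3 and later method for inserting several documents; insert() is no longer available in PyMongo 4.

```python
import json

def json_to_records(json_text):
    """Parse a {row_label: record} JSON object into a list of records."""
    # json.loads() parses a string; json.load() expects a file object.
    return list(json.loads(json_text).values())

def insert_records(collection, records):
    """Insert the records and return the generated _id values.

    `collection` is assumed to be a pymongo Collection; insert_many()
    requires a non-empty list of documents.
    """
    result = collection.insert_many(records)
    return result.inserted_ids

# Two rows in the same shape as the transposed DataFrame's JSON
# (hypothetical sample, abbreviated from the data above).
sample = '{"0": {"0": "Vinh", "1": 39}, "1": {"0": "Nghia", "1": 26}}'
records = json_to_records(sample)
print(records)
```

With a live connection, insert_records(db.person, records) would store the two sample documents.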

If we want to list all data inside the collection, we can execute the following commands:

>>> for cur in collection.find():
...     print(cur)
...
{'4': 'vl', '2': 3, '3': 'male', '1': 39, '_id': ObjectId('557da218f21c761d7c176a40'), '0': 'Vinh'}
{'4': 'dn', '2': 3, '3': 'male', '1': 26, '_id': ObjectId('557da218f21c761d7c176a41'), '0': 'Nghia'}
{'4': 'dn', '2': 4, '3': 'female', '1': 28, '_id': ObjectId('557da218f21c761d7c176a42'), '0': 'Hong'}
{'4': 'hn', '2': 3, '3': 'female', '1': 25, '_id': ObjectId('557da218f21c761d7c176a43'), '0': 'Lan'}
{'4': 'tn', '2': 3, '3': 'male', '1': 42, '_id': ObjectId('557da218f21c761d7c176a44'), '0': 'Hung'}
{'4': 'hcm', '2': 1, '3': 'male', '1': 7, '_id': ObjectId('557da218f21c761d7c176a45'), '0': 'Nam'}
{'4': 'hcm', '2': 1, '3': 'female', '1': 11, '_id': ObjectId('557da218f21c761d7c176a46'), '0': 'Mai'}

If we want to query data from the created collection with some conditions, we can use the find() function and pass in a dictionary describing the documents we want to retrieve. The result is a Cursor object, which supports the iterator protocol:

>>> cur = collection.find({'3' : 'male'})
>>> type(cur)
pymongo.cursor.Cursor
>>> result = pd.DataFrame(list(cur))
>>> result
       0   1  2     3    4                       _id
0   Vinh  39  3  male   vl  557da218f21c761d7c176a40
1  Nghia  26  3  male   dn  557da218f21c761d7c176a41
2   Hung  42  3  male   tn  557da218f21c761d7c176a44
3    Nam   7  1  male  hcm  557da218f21c761d7c176a45
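
The filter dictionary passed to find() can be read as a set of equality conditions that must all hold. The pure-Python sketch below imitates that behaviour on a list of dictionaries. The helper matches is hypothetical and covers only plain equality on simple top-level fields, not MongoDB's query operators such as $gt or $in.

```python
def matches(document, query):
    """Return True if every field in `query` equals the document's value."""
    return all(
        key in document and document[key] == value
        for key, value in query.items()
    )

# A few records in the same shape as the person collection (abbreviated)
people = [
    {'0': 'Vinh', '1': 39, '2': 3, '3': 'male'},
    {'0': 'Hong', '1': 28, '2': 4, '3': 'female'},
    {'0': 'Nam', '1': 7, '2': 1, '3': 'male'},
]

# Similar in spirit to collection.find({'3': 'male'})
males = [doc for doc in people if matches(doc, {'3': 'male'})]
print([doc['0'] for doc in males])  # ['Vinh', 'Nam']

# Several fields in one filter are combined with a logical AND
young_males = [doc for doc in people if matches(doc, {'2': 1, '3': 'male'})]
print([doc['0'] for doc in young_males])  # ['Nam']
```

An empty filter matches every document, which is why calling find() with no arguments returns the whole collection.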

Sometimes, we want to delete data in MongoDB. All we need to do is pass a query to the remove() method of the collection:

>>> # before removing data
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Nam   7  1    male  hcm  557da218f21c761d7c176a45
6    Mai  11  1  female  hcm  557da218f21c761d7c176a46

>>> # after removing records which have '2' column as 1 and '3' column as 'male'
>>> collection.remove({'2': 1, '3': 'male'})
{'n': 1, 'ok': 1}
>>> cur_all = collection.find()
>>> pd.DataFrame(list(cur_all))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46
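
In PyMongo 3 and later, deletions are usually written with delete_one() or delete_many() rather than remove(), which PyMongo 4 no longer provides. A minimal sketch follows. The helper name delete_matching is hypothetical, and collection is assumed to be a pymongo Collection such as db.person.

```python
def delete_matching(collection, query):
    """Delete every document matching `query`; return how many were removed."""
    result = collection.delete_many(query)
    # DeleteResult.deleted_count plays the role of 'n' in remove()'s reply
    return result.deleted_count

# Usage against a live collection (assumes the person collection above):
#   delete_matching(db.person, {'2': 1, '3': 'male'})
```

delete_one() takes the same filter but removes at most one matching document.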

We have seen, step by step, how to insert, query, and delete data in a collection. Now, we will show how to update existing data in a MongoDB collection:

>>> doc = collection.find_one({'1' : 42})
>>> doc['4'] = 'hcm'
>>> collection.save(doc)
ObjectId('557da218f21c761d7c176a44')
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male  hcm  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46
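
The save() method was deprecated in PyMongo 3 and removed in PyMongo 4. The same change can be made with update_one() and the $set operator, which modifies only the named field instead of rewriting the whole document. The sketch below assumes that collection is a pymongo Collection; the helper name set_field is hypothetical.

```python
def set_field(collection, query, field, value):
    """Set `field` to `value` on the first document matching `query`."""
    result = collection.update_one(query, {'$set': {field: value}})
    # modified_count is 0 if nothing matched or the value was already stored
    return result.modified_count

# Usage against a live collection (assumes the person collection above):
#   set_field(db.person, {'1': 42}, '4', 'hcm')
```

To apply the same change to every matching document, update_many() accepts the same arguments.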

The following table shows methods that provide shortcuts to manipulate documents in MongoDB: