MongoDB is becoming the de facto standard among NoSQL databases that are schemaless, true document repositories. Companies developing state-of-the-art applications, from enterprises to small start-ups, are embracing MongoDB for its simple approach to building schemaless databases that provide
- easy learning curve
- faster development than with traditional relational databases
- high availability and easy horizontal scalability
- extremely fast access times
- powerful indexing and querying capabilities, similar to its relational database counterparts
- an ideal database platform for both cloud and desktop computing
- schemaless, document repositories, via collections and documents, that naturally fit application domains
- an ideal database platform for embracing agile development
A general set of MongoDB-supported use cases, both well suited and less well suited, is found here.
It should be noted that MongoDB shares the NoSQL spotlight with CouchDB. Since MongoDB and CouchDB are both document-oriented databases with schemaless JSON-style object data storage, folks are naturally asking questions. Please click here to compare MongoDB and CouchDB.
MongoDB Features
Here is a list of MongoDB features. Click on the links for details.
- Document-oriented storage via JSON-style documents with dynamic schemas
- Full Index Support
- Replication & High Availability
- Auto-Sharding
- Querying
- Fast In-Place Updates
- Map/Reduce
- GridFS
- Commercial Support
The MongoDB Philosophy
MongoDB was not created to be just another database that tries to do everything for everyone.
Instead, MongoDB was created to work with documents rather than rows, and to be extremely fast, massively scalable, and easy to use.
In order to accomplish this, some features were excluded, most notably support for transactions. MongoDB may therefore not be a great fit for accounting applications. However, it is quite common to build a software architecture on a hybrid of technologies: for example, a relational database works well for accounting and transactional components, while MongoDB is a great technology for storing and retrieving complex data and documents.
At a high-level, MongoDB provides the following (this is not an exhaustive list):
- A Rich Data Model
- Easy Scaling
- Simple Administration
- Features such as indexing, embedded document references, stored JavaScript, fixed-size collections, file storage, aggregation, replication, auto-sharding, and more.
- All of the above without sacrificing speed
MongoDB is a document-oriented database. It is not a relational one. The primary reason for moving away from the relational model to the document-oriented model is to make scaling out easier, but additional advantages are found as well.
The fundamental idea is replacing the concept of a row with the more flexible model, the document. By supporting embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits naturally with how data needs are addressed by developers using modern object-oriented languages.
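To make the idea concrete, here is a minimal sketch of a single document holding a hierarchy that would normally span several relational tables. The blog post, its field names, and the helper function are hypothetical and chosen only for illustration. Plain Python dictionaries and lists stand in for the JSON-style document.

```python
import json

# A hypothetical blog post modeled as one document: the author is an
# embedded document, and the comments are an array of embedded documents.
post = {
    "title": "Getting started with MongoDB",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "tags": ["mongodb", "nosql"],
    "comments": [
        {"user": "bob", "text": "Nice overview."},
        {"user": "eve", "text": "What about sharding?"},
    ],
}


def comment_authors(document):
    """Return the user names found in the embedded comments array."""
    return [comment["user"] for comment in document.get("comments", [])]


# The whole hierarchy serializes as one JSON-style record.
serialized = json.dumps(post, sort_keys=True)
```

In a relational design the same data would typically be split across posts, authors, tags, and comments tables and joined back together at query time.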
More coming soon!
Installing and Running MongoDB on Ubuntu
I highly recommend installing MongoDB through the Ubuntu package repositories. The default repositories contain MongoDB, but possibly an out-of-date version. Software installation from repositories is done through aptitude. To obtain the latest version of MongoDB, add the following line to your repository list in /etc/apt/sources.list:
deb http://downloads.mongodb.org/distros/ubuntu 10.4 10gen
where 10.4 is the version of Ubuntu I run. Make sure to supply the version of Ubuntu you run.
Next, you need to tell aptitude to retrieve the new repositories as follows:
$ sudo aptitude update
And now install MongoDB using the following command:
$ sudo aptitude install mongodb-stable
As the last step in the installation process, you will need to create the data directory yourself. Make sure to run these commands from a non-root account, so that `id -u` resolves to your own user ID.
$ sudo mkdir -p /data/db/
$ sudo chown `id -u` /data/db
You can start the MongoDB service as follows, again from a non-root account:
$ sudo start mongodb
$ sudo status mongodb
And start the MongoDB shell with the following command:
$ mongo
Once in the MongoDB shell, some basic commands include the following:
> show dbs
> show collections
> show users
> use <db name>
Quick Reference to the MongoDB Upstart Scripts
Here is a quick reference to the commands that control the execution of the mongod server process:
$ sudo status mongodb
$ sudo stop mongodb
$ sudo start mongodb
$ sudo restart mongodb
Important in Case of Error Connecting to the Server
Upon starting the MongoDB shell, you may get an error message connecting to the server similar to the following:
Error: couldn’t connect to server 127.0.0.1} (anon):1137
You may also notice that even though you issued the command to start the service, it did not actually start. The next step is to confirm that MongoDB really fails to start.
Run sudo start mongodb. It will report
mongodb start/running, process XXXX
no matter what. However, when you run sudo status mongodb again, you’ll get
mongodb stop/waiting
instead of the expected mongodb start/running.
Note: This condition is usually caused by an unclean shutdown, which leaves behind the lockfile /var/lib/mongodb/mongod.lock.
The fix is a quick two-step process as follows:
- Remove the lockfile.
- Run the repair script.
This is accomplished as follows:
$ sudo rm /var/lib/mongodb/mongod.lock
$ sudo -u mongodb mongod -f /etc/mongodb.conf --repair
Now when you run sudo start mongodb, it will report
mongodb start/running, process XXXX
and when you run sudo status mongodb again, you’ll get the expected
mongodb start/running, process XXXX
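Before removing anything, it can help to check whether a leftover lockfile is really present. The following is a minimal Python sketch, not part of MongoDB itself. It assumes that a clean shutdown leaves the lockfile empty or absent, which matched MongoDB releases of this era but should be verified for your version. The helper name is hypothetical, and the path is the one from the note above.

```python
import os

# Path taken from the note above; adjust it if your dbpath differs.
LOCKFILE = "/var/lib/mongodb/mongod.lock"


def lockfile_looks_stale(path=LOCKFILE):
    """Return True if the lockfile exists and is non-empty.

    Assumption: a clean shutdown leaves the file empty or absent, so a
    non-empty file while the service is stopped hints at an unclean
    shutdown. A running mongod also keeps the file non-empty, so only
    trust this check when the service reports stop/waiting.
    """
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

If the function returns True while sudo status mongodb reports stop/waiting, the two-step fix described above is the likely remedy.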
Understanding Data Modeling under MongoDB
A MongoDB database is non-relational and schemaless. Therefore, a MongoDB database is not bound to columns and data types the way relational databases are. One of the primary benefits of a flexible, schemaless design is that you are not restricted when programming in a dynamically typed language such as Python. Keep in mind that even though MongoDB is schemaless, the data structure is not completely devoid of schema, as you still define collections and indexes, which we will discuss later. Nevertheless, you do not need to predefine any structure for the documents you add.
The fundamental components of a MongoDB database are documents and collections.
A document is an item that contains the actual data, similar to a row in a SQL database table.
A collection is a group of documents, similar to a table in a SQL database. Unlike in a SQL database, two or more completely different documents can coexist in a single collection.
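The sketch below illustrates the point with plain Python: a list stands in for a collection, and dictionaries with different fields stand in for documents. It is not MongoDB code. The matches and find helpers are hypothetical and mimic only the simplest equality-style query, and the sample items are made up.

```python
def matches(document, query):
    """True when every key/value pair in query is present in document."""
    return all(
        key in document and document[key] == value
        for key, value in query.items()
    )


def find(collection, query):
    """Return the documents in collection that match query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents with different shapes living side by side in one "collection".
items = [
    {"type": "book", "title": "Dune", "pages": 412},
    {"type": "song", "title": "Blue in Green", "seconds": 337},
    {"type": "book", "title": "Emma"},
]

books = find(items, {"type": "book"})
```

Note that the third item has no pages field at all. No schema change was needed to store it next to the others.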
More coming soon!
Developing Applications under MongoDB using Python with Example Code
Python is a simple programming language that lends itself naturally to writing highly readable code.
In this section, we will develop simple, clear, and powerful code that works with MongoDB through the Python driver known as the PyMongo driver.
Installing the PyMongo Driver
Before we begin to write Python code to access MongoDB databases, we first need to install the PyMongo driver.
I’m assuming you have Python 2.7 or greater up and running. The steps for Python installation are simple once you obtain the source from python.org/download and extract the contents from the tar file, and are commonly listed as follows:
$ ./configure
$ make
$ make test
$ sudo make install
Of course, you should always consult the readme file to obtain up-to-date instructions.
The steps required to install the PyMongo driver are as follows:
Step 1
Obtain the Python setuptools egg for your version of Python. For example, I obtained setuptools-0.6c11-py2.7.egg from http://pypi.python.org/pypi/setuptools#using-setuptools-and-easyinstall.
Step 2
Now execute the downloaded egg as if it were an actual shell script by typing the following (note that your setuptools egg filename may differ depending on the version of Python you are using):
$ sudo sh setuptools-0.6c11-py2.7.egg
Step 3
Install the PyMongo driver with easy_install:
$ sudo easy_install pymongo
That’s it! But let’s test to ensure that our installation was a success.
From the Python shell, type the following:
>>> import pymongo
You should be greeted with an empty prompt, >>>, and no error message. That’s success.
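The same check can be scripted. This is a minimal sketch that assumes Python 3.4 or later, which is newer than the Python 2.7 setup described above. The helper name is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# Reports True or False depending on your environment.
print("pymongo installed:", is_installed("pymongo"))
```

Unlike a bare import, this reports a missing driver without raising an ImportError.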
To take it one step further, we’ll insert data into a database and then retrieve it. Remember that in MongoDB, if we reference a database that does not exist, it is created for us automatically once data is first written to it, as is the case with mytestdb below.
Type the following in the Python shell:
>>> from pymongo import Connection
>>> c = Connection()
>>> db = c.mytestdb
>>> collection = db.items
>>> item = { "Title" : "Test Data", "Value1" : "1", "Value2" : "2"}
>>> collection.insert(item)
>>> collection.find_one()
{u'_id': ObjectId('4d432adc1d41c85d8a000000'), u'Value1': u'1', u'Value2': u'2', u'Title': u'Test Data'}
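The session above uses the PyMongo API of its time. In PyMongo 3.0 and later, Connection was removed in favor of MongoClient, and insert was superseded by insert_one. The following is a rough equivalent for a current driver. It assumes a mongod listening on localhost at the default port 27017. The round_trip and demo function names are hypothetical.

```python
def round_trip(collection, item):
    """Insert item, then read it back by the _id assigned on insert."""
    result = collection.insert_one(item)
    return collection.find_one({"_id": result.inserted_id})


def demo():
    """Connect to a local mongod (assumed at localhost:27017) and round-trip one item."""
    # Imported here so the module still loads where PyMongo is not installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    collection = client.mytestdb.items
    item = {"Title": "Test Data", "Value1": "1", "Value2": "2"}
    return round_trip(collection, item)
```

Calling demo() against a running server should return the inserted document, including its generated _id. Looking the document up by _id, rather than calling find_one() with no arguments, ensures you get back the item you just inserted even when the collection already holds other documents.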
An Introduction to Working with MongoDB using Python via the PyMongo Driver
Now that you have the PyMongo driver installed and running, it is time to learn the basics in developing Python code that works with MongoDB.
Instead of reinventing an already great tutorial, I provide this link to the tutorial located on the mongodb.org site.
Enjoy.
Advanced: Sharding
Since sharding in MongoDB is a key feature and a large topic to discuss, I created a separate post, found here. This 15-minute, high-speed post provides a detailed overview of auto-sharding in MongoDB and, specifically, how to create shards that support auto-sharding.
Enjoy.