The advantages and disadvantages of MongoDB

MongoDB is a relatively new player in the data storage world compared to giants like Oracle. But it has drawn a lot of attention with its distributed key-value store, MapReduce capability, and document-oriented NoSQL features.

The philosophy behind MongoDB is to retain as much functionality as possible while permitting horizontal scaling and, at the same time, to make developers' lives easier. It sits between powerful but inefficient relational databases and high-performance but simple key-value stores.

MongoDB may be cool, but before you make the decision, it's crucial to know whether it is the right database for your application. That is, know what it can do, what it cannot, and its advantages and disadvantages. Nothing is more painful than building your whole application on a database model and later finding that they don't fit together.

This post discusses the pros and cons of MongoDB from various angles. Let's start with performance.

MongoDB is a public company

In October 2017, MongoDB went public: it sold 8 million shares and raised $192 million. Going public means more funding to improve the product, which is good news for users. Research and development expenditures in 2015, 2016, and 2017 were 33, 43, and 53 million dollars respectively.

Performance advantage

MongoDB is a database designed for storing and querying big data, aimed at social network applications like Facebook.

MongoDB gains its performance mainly from its key-value-based design and from being easy to scale out. As one of the NoSQL databases, MongoDB uses the document as its basic storage unit.

A document is just a simple JSON-like object, stored in a format called BSON in MongoDB, which is just a blob. For example, a blog post consists of a title, content, and comments. In the relational model, the comments are stored in a separate table and retrieved by joining the post table and the comment table. In the document model, they are saved as one document and treated as a single object. When querying, you get everything from that one document, with no reference to other documents, and that one document can be identified by an id. Thus getting a document is a key-value query, not a relational query, and a key-value query is much faster than a relational query.
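The difference between the two shapes can be sketched with plain Python dictionaries. This is only an illustration under stated assumptions: the names `posts_table`, `comments_table`, `posts_by_id`, and both loader functions are hypothetical, and nothing here talks to a real database.

```python
# Relational shape: posts and comments live in separate "tables".
posts_table = [{"id": 1, "title": "Hello", "content": "First post"}]
comments_table = [
    {"post_id": 1, "author": "ann", "text": "Nice"},
    {"post_id": 1, "author": "bob", "text": "Thanks"},
]


def load_post_relational(post_id):
    """Emulate a join: find the post, then scan comments for matches."""
    post = next(p for p in posts_table if p["id"] == post_id)
    comments = [c for c in comments_table if c["post_id"] == post_id]
    return {**post, "comments": comments}


# Document shape: one id maps to one self-contained document.
posts_by_id = {
    1: {
        "_id": 1,
        "title": "Hello",
        "content": "First post",
        "comments": [
            {"author": "ann", "text": "Nice"},
            {"author": "bob", "text": "Thanks"},
        ],
    }
}


def load_post_document(post_id):
    """A single key lookup returns the post together with its comments."""
    return posts_by_id[post_id]
```

Both functions return the post with its comments, but the document version gets there with one lookup by id instead of combining two collections.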

If you have worked with relational databases before, you may have heard the term "denormalization". It is the process of aggregating data from different tables into one table to avoid join operations.

Database developers sometimes denormalize a database structure intentionally to gain performance. That is exactly what a MongoDB document does (see "Normalization and denormalization with MongoDB").

The second performance advantage provided by MongoDB is scalability. In MongoDB, it's very easy to scale horizontally, that is, to add more machines as data and traffic grow while keeping response speed and availability.

MongoDB supports auto sharding and auto failover, so you can focus on your business logic. When the data on one node exceeds a threshold, MongoDB automatically rebalances it so that data is evenly distributed across nodes. From the perspective of your application code, a cluster feels just like a single node. It acts like a transparent, scalable layer of data storage.
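The "cluster feels like a single node" idea can be sketched with a toy router. This is a minimal sketch, not MongoDB's algorithm: `ToyCluster` is a hypothetical class that spreads keys over shards with a stable hash, and it does not attempt the automatic rebalancing described above.

```python
import hashlib


class ToyCluster:
    """Hypothetical stand-in for a sharded cluster (illustration only)."""

    def __init__(self, shard_count):
        # Each shard is just an in-memory dict in this sketch.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash sends the same key to the same shard every time.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, key, document):
        self._shard_for(key)[key] = document

    def find(self, key):
        return self._shard_for(key).get(key)


# The caller never names a shard; routing happens behind insert/find.
cluster = ToyCluster(shard_count=3)
for user_id in range(1000):
    cluster.insert(user_id, {"user_id": user_id})
```

The application code only calls `insert` and `find`, which is the transparency the paragraph describes; where each document physically lives is the router's business.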

To make querying a MongoDB database more efficient, it supports indexes implemented as B-Trees. An index can be defined on a single field or on multiple fields, just like a MySQL index.
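To see why an ordered index speeds up lookups, here is a rough sketch that uses a sorted list and binary search as a stand-in for a B-Tree. `SortedIndex` and the sample `documents` are hypothetical; a real B-Tree also handles inserts and disk pages efficiently, which this sketch ignores.

```python
import bisect

documents = [
    {"_id": 1, "name": "carol", "age": 41},
    {"_id": 2, "name": "alice", "age": 29},
    {"_id": 3, "name": "bob", "age": 35},
]


class SortedIndex:
    """Toy index over one or more fields, kept in sorted key order."""

    def __init__(self, docs, *fields):
        self.docs = docs
        entries = sorted(
            (tuple(doc[f] for f in fields), pos) for pos, doc in enumerate(docs)
        )
        self.keys = [key for key, _ in entries]
        self.positions = [pos for _, pos in entries]

    def find_range(self, low, high):
        """Return documents whose key tuple lies in [low, high]."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [self.docs[pos] for pos in self.positions[start:end]]


age_index = SortedIndex(documents, "age")               # single-field index
name_age_index = SortedIndex(documents, "name", "age")  # multi-field index
```

A range query such as `age_index.find_range((30,), (45,))` finds its start and end by binary search instead of scanning every document.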

The document model

We've mentioned the performance advantage of the document model. The document model also makes development easier. Documents have no concept of tables, rows, SQL, or schemas, even though some queries are quite similar to those of a relational database; see "CRUD in Mongodb with PHP".

If you have ever used Solr or Lucene, the MongoDB document model is much like the Lucene document. If you have a lot of unstructured data, the document model is a strong choice.

Examples include website traffic stats, ad click stats, etc. Data like this is huge in volume and only weakly related, and it doesn't need 100% consistency; MongoDB is good at storing and processing data like this. In fact, MongoDB was created at 10gen, a company founded by former DoubleClick engineers. DoubleClick tracks online advertising activity and has since been acquired by Google.

The schema is created on the fly: it comes into existence when you use it. Basically, you can add any kind of field to a MongoDB collection at any time.

This data model is a double-edged sword. It hurts people when MongoDB is used as a replacement for, rather than a complement to, a relational database. Not only are systems with complex relationships, like an ERP system, unsuitable for MongoDB; even a simple system like a blog, which contains minimal relationships between data and entities, is not a good place to use MongoDB, either for modeling the problem domain or for scaling it. They are incompatible even at the mindset level. The ideal scenario is one with zero connections between entities and data, like the examples mentioned above.

Of course you can use MongoDB as your blog data store if you insist, but it's not only inappropriate, it's not even a good illustration or example. That you can achieve something doesn't mean you are doing it the correct way. You may just be cramming your problem domain into the wrong box, which, from a certain point of view, is kind of stupid.

In its IPO prospectus, MongoDB claims to be a general purpose database. That's not true, but what does it even mean to be "general purpose"? Many programming languages claim to be "general purpose", but gone are the days when one programming language ruled them all. Nowadays the environment is polyglot: problem domains and use cases are so diverse that no one tool is good at everything, and each domain develops dedicated tools to solve its specific problems. The same goes for databases. MongoDB may have its niche, but it's definitely not "general purpose". A better definition would be that MongoDB is a database good at use cases which have

You have to be polyglot in databases too. Just like learning a new foreign language or a new programming language, learning a new database requires a considerable investment, both for a programmer and for an organization. Getting started with MongoDB or many other databases is actually pretty easy, but knowing when and where to use it is hard.

Flexible Schema

Actually, there is no schema in MongoDB. A document can have any number of fields, and fields can be added to an existing document at any time, dynamically. No ALTER TABLE, no index rebuilding.
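A small sketch of what "no schema" means in practice, using a plain Python list as a hypothetical in-memory stand-in for a collection. The `collection` list and the `find` helper are illustrative names, not MongoDB API.

```python
# Hypothetical in-memory stand-in for a collection: just a list of dicts.
collection = []

# Documents in the same collection do not need the same fields.
collection.append({"_id": 1, "title": "First post"})
collection.append({"_id": 2, "title": "Second post", "tags": ["nosql"], "views": 10})

# Adding a field to an existing document needs no ALTER TABLE step.
collection[0]["author"] = "ann"


def find(coll, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in coll
        if all(doc.get(key) == value for key, value in criteria.items())
    ]
```

Queries simply skip documents that lack a field, so old and new document shapes can live side by side while the model evolves.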

The documents are just like JavaScript JSON, PHP arrays, Clojure maps, or Python dictionaries, so it's very natural to communicate with MongoDB from dynamic languages like JavaScript, PHP, or Python.

With the BSON data format, MongoDB is capable of representing rich data models, just as strict-schema databases like MySQL are.

This advantage makes agile development possible because, unlike the relational model, JSON data fits almost seamlessly into the OOP model that most programming languages support today. In the traditional database model, you need an ORM framework like Hibernate to deal with the mismatch between OOP and the relational model.
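As a rough illustration of how closely a document maps onto objects, the sketch below turns a nested dictionary into Python dataclasses and back. The `Post` and `Comment` classes and the `from_document` helper are hypothetical names chosen for this example.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    title: str
    content: str
    comments: List[Comment] = field(default_factory=list)

    @classmethod
    def from_document(cls, doc):
        """Build a Post from a nested, JSON-like dictionary."""
        comments = [Comment(**c) for c in doc.get("comments", [])]
        return cls(title=doc["title"], content=doc["content"], comments=comments)


document = {
    "title": "Hello",
    "content": "First post",
    "comments": [{"author": "ann", "text": "Nice"}],
}

post = Post.from_document(document)  # document -> object
round_trip = asdict(post)            # object -> document
```

The nesting of the document mirrors the nesting of the objects, so no join or mapping table is needed to rebuild the object graph.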

Many people like to use dynamic languages for quick development or prototyping because, instead of spending time fighting the type system, the programmer can focus on the problem itself. A strict type system is just too heavy in some cases. The same goes for MongoDB: when you are prototyping, the final schema of the data model is usually unknown and subject to heavy changes and iterations, so a strict schema means frequent schema changes. Maintaining the schema will cost you a lot of time, which is usually not worth it. If RDBMSs are Java and C++, then MongoDB is Python, Ruby, and PHP.

Optional strict consistency level

With older drivers, insert operations in MongoDB are fire-and-forget by default, which is fine for low-value data like logs or click statistics when you want more performance. MongoDB also supports a "safe mode" (an acknowledged write) which ensures that the insertion completed successfully; newer drivers acknowledge writes by default.

Now we should look at some trade-offs. The disadvantages include: no transactions, no joins ...

No transaction support

To achieve its performance and scalability, MongoDB ditches transaction support (multi-document transactions only arrived later, in version 4.0). This makes MongoDB very easy to scale horizontally, using lots of cheap hardware to balance the load. Scaling horizontally is generally a hard task in a relational database like MySQL, but it's a breeze in a NoSQL database like MongoDB.

MongoDB is the best fit if you have a lot of data but the relations within it are weak, for example a stream of independent events. In fact, an early application of MongoDB was recording online ad click statistics. It is not a fit for strongly related data like bank account information: whenever you transfer money, many related bank records need to be modified at the same time, and either all of the changes are made or none are. RDBMSs do this with transactions.

To overcome this, MongoDB supports atomic operations like atomic increment and decrement and findAndModify. Notice that in MongoDB only a modification to a single document is atomic; operations that involve multiple documents are not. So the best practice is to store as much information as possible in a single document and avoid spreading data across multiple documents. In RDBMS terms, this process is called denormalization.
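Why this matters can be shown with a toy money transfer. This is a sketch under stated assumptions: plain dictionaries stand in for documents, the `crash_midway` flag simulates a failure between two writes, and both function names are hypothetical. It does not use or model the real MongoDB API.

```python
def transfer_two_documents(accounts, src, dst, amount, crash_midway=False):
    """Two separate single-document writes; nothing ties them together."""
    accounts[src]["balance"] -= amount  # first write succeeds
    if crash_midway:
        raise RuntimeError("simulated crash between the two writes")
    accounts[dst]["balance"] += amount  # second write may never happen


def transfer_one_document(ledger, src, dst, amount, crash_midway=False):
    """Prepare the new state on a copy, then apply it as one write."""
    new_balances = dict(ledger["balances"])
    new_balances[src] -= amount
    if crash_midway:
        raise RuntimeError("simulated crash before the single write")
    new_balances[dst] += amount
    ledger["balances"] = new_balances  # the only write to the document
```

If the first function fails midway, money has left one account without arriving in the other. In the second, both balances live in one document, so a failure before the single write leaves the old state intact.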

Again, this limits the areas in which MongoDB can be used. If you need a tremendous amount of denormalization to reshape the data model, that's a warning sign: you may be better off staying with an RDBMS and finding another way to gain scalability.

No join support

Since the document contains everything you need, there is no join. This is also a trade-off made in order to scale horizontally with ease. But you can always perform your own join by making multiple queries.
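A join done by multiple queries might look like the sketch below. The `users` and `orders` data and the function name are hypothetical, and dictionary lookups stand in for the two database queries.

```python
users = {
    1: {"_id": 1, "name": "ann"},
    2: {"_id": 2, "name": "bob"},
}
orders = [
    {"_id": 10, "user_id": 1, "total": 30},
    {"_id": 11, "user_id": 2, "total": 15},
    {"_id": 12, "user_id": 1, "total": 5},
]


def orders_with_user_names(order_docs, user_docs):
    """Application-side join: two lookups, then a merge in code."""
    # Query 1: collect the distinct user ids referenced by the orders.
    user_ids = {order["user_id"] for order in order_docs}
    # Query 2: fetch those users in one batch.
    fetched = {uid: user_docs[uid] for uid in user_ids}
    # Merge: attach the user name to each order.
    return [
        {**order, "user_name": fetched[order["user_id"]]["name"]}
        for order in order_docs
    ]
```

Fetching all referenced users in one batch keeps the number of round trips at two, rather than one query per order.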

RAM limitation

MongoDB uses memory-mapped files and lets the operating system handle the caching. The size of your database is limited by the virtual memory provided by the operating system and hardware.

On a 32-bit machine, you can only store about 2 GB of data: the system can only address 4 GB of memory, and the OS will use at least 2 GB of that space, leaving only 2 GB for MongoDB's mapped files. You can't do much for a serious application with 2 GB of space. For a production environment, a 64-bit system is a must.
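The address-space arithmetic can be checked in a few lines. The 2 GB reserved for the operating system is the figure given in the text, taken here as an assumption rather than a measured value.

```python
ADDRESS_BITS = 32
GIB = 2 ** 30  # bytes in one gibibyte

# A 32-bit pointer can distinguish 2**32 byte addresses.
addressable_gib = (2 ** ADDRESS_BITS) / GIB

# Assumption from the text: the OS keeps at least 2 GB for itself.
os_reserved_gib = 2

# What remains is the ceiling for memory-mapped data files.
mappable_gib = addressable_gib - os_reserved_gib
```

On a 64-bit system the same calculation yields an address space far larger than any realistic dataset, which is why the limit stops being a practical concern there.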

If you are playing with it on your own 32-bit machine, don't store serious data in MongoDB. When the data exceeds the capacity, your insertions may fail without any warning!

Another problem that may arise with memory-mapped files is that you may get an "Unclean shutdown detected" warning, and data may be corrupted (see "MongoDB:Unclean shutdown detected").

AGPL license

Unlike most open source software you have seen, MongoDB is AGPL licensed. Under this license, every change to the MongoDB core must be open sourced to the community. If you are only a user (commercial or not) or a contributor, this license will never bother you.

This can be a concern for some internet companies like Google or Facebook. For example, Google has made significant changes to MySQL and the Linux kernel without open sourcing all of them; if MySQL and Linux were AGPL licensed, Google would have to open source the changes.

Conclusion

MongoDB lacks some features of relational databases, like transactions and joins, but it gains the ability to scale horizontally with ease and a flexible schema that is easy to manipulate with a JSON-like data format.

MongoDB is a powerful tool if used properly, but it's not a one-size-fits-all solution. Adopting MongoDB without carefully scrutinizing its advantages and disadvantages, and hoping it will magically solve all your scaling issues, is overly optimistic. And you should not throw away the traditional SQL database: MongoDB is not supposed to replace it; it's more of a complement. Put simply: use the right tool for the right job.