To SQL or Not
NoSQL has gained quite some popularity in the recent years. Its distant relative, good old SQL has been the industry standard for quite some decades but NoSQL provides some features which are difficult/ impossible to achieve in SQL. However, this doesn’t mean that SQL is inferior to NoSQL in anyway. In this post I’ll be mostly talking about PostgreSQL (SQL) and MongoDB (NoSQL).
|Data stored in tables as rows||Data stored in collections as BSON/ JSON documents|
|Primary and foreign key to identify unique data in rows||No concept of foreign keys. Mongo adds a unique _id field though|
|Using joins to combined data from multiple tables||Using embedded documents, document references or $lookup to combine data from multiple collections|
|ACID transactions||Atomic operations but transactions aren’t ACID|
|Normalization of data||Denormalization of data|
|Vertical scalability||Horizontal scalability|
Tables vs BSONs
SQL is more suited when you have a structured data and a pre-defined schema of your data. This data can be easily represented using tables and usually the structure remains fixed.
NoSQL employs the use of JSONs (or BSONs in MongoDB) which stores data as key-value pairs. NoSQL is more suited for modern webapps where the data changes dynamically and determining a fixed structure beforehand is difficult.
The document model of NoSQL is more productive because it maps directly to objects in modern web technologies. The wide column model of SQL provides more granular access to data than the key-value model but also less flexibility when it comes to dynamically changing data requirements.
In NoSQL, the entire document can be accessed with a single call to the database as compared to SQL, which requires joins across multiple tables from multiple locations. NoSQL also makes it easier to distribute the data across multiple nodes, called sharding. MongoDB 3.2 can also perform joins (left outer equi joins).
Normalization vs Denormalization
SQL involves normalizing the data by removing redundant information and splitting it into smaller tables. This can make the data representation simpler but it also spreads the data across multiple locations (tables) which adds a bit to the overhead in accessing them.
NoSQL involves denormalization. It maintains duplicate copies of data to improve access time (since the cost of disk storage has reduced and bottleneck is more common for access time). Data is embedded, i.e. 1:1 or 1:many documents are embedded within each other to improve the access time. Embedding documents is not always helpful and referencing between documents in multiple collections should be used if:
- A document is frequently read but the embedded document is rarely used.
- The document size exceeds the 16MB limit of MongoDB.
- If the object is referenced from many different sources.
- Representing complex many:many relationships.
- Modelling large, hierarchical data sets.
Referencing documents in NoSQL is similar to normalization in SQL but it requires follow-up queries which adds to the server requests. Referencing is usually achieved by saving the _id field of one document in the related document as a reference. Note that these are not integrity constraints like foreign keys.
NoSQL databases are usually classified as consistent or eventually consistent systems (since they maintain multiple copies of data for availability and scalability). With eventual consistency, writes are not immediately visible (see CAP theorem). This can be acceptable for non-critical systems such as social networking or news feed trading it for increased scalability and availability. However for critical applications such as banking, etc SQL databases are suited better. Otherwise in NoSQL, this has to be handled at the application level. MongoDB provides tunable consistency, defined at query level.
All writes and reads access the primary copy of the data. As an option, read queries can be issued against secondary copies where data maybe eventually consistent if the write operation has not yet been synchronized with the secondary copy; the consistency choice is made at the query level.
SQL databases provides vertical scalability, they can scale quite high by adding to the computational power, but it has its limits. On the other hand, NoSQL databases scale horizontally, by adding nodes to the distributed system. It’s easier to scale using NoSQL.