When Not to Use SQL
When thinking about creating a new application, one of the most important parts is the application's persistence layer. And in 99% of cases, the obvious choice is a relational database.
Why Use an SQL Database?
Atomicity, Consistency, Isolation, and Durability
To start our explanation of the ACID principle, let's define some terms here to minimize future confusion:
- Bank: Imagine any bank where we keep our money.
- Transfer: A banking operation where person A sends money to person B.
- Database: The database used in this bank's systems.
- Transaction: A set of one or more operations that make up a single task or logical unit of work to be executed in the database.
Think of a bank; whenever you make a transfer, it's necessary to ensure that the money was subtracted from account A and added to account B. Atomicity in an SQL database is the principle that this transaction will be completed in full or not carried out at all.
Now, in this same bank, imagine that when making a transfer, one of the accounts becomes negative (assuming the bank doesn't allow negative accounts). Consistency is the principle that won't let this happen by failing the transaction when one of these constraints is broken.
Moreover, the principle of Isolation ensures that concurrent transactions do not interfere with each other. This guarantees that even when multiple transactions occur simultaneously, the final result is equivalent to if they had been executed sequentially. Still in the bank example, if two transfers occur "at the same time", to ensure the integrity of the information, one transaction can only occur after the other is completed.
Last but not least, after a transaction is confirmed, its permanent effects must be maintained, even in case of system failures or power outages. For this, the principle of Durability gives us this guarantee.
I don't know about you, but I certainly wouldn't leave my money in a bank that didn't have a database ensuring these principles!
Structure, Maturity, and Relevance
SQL became a standard in 1986 and has since received numerous updates, ensuring it never loses its relevance over the years.
The SQL language is powerful and flexible, able to adapt to various scenarios from the simplest to the most complex where the data domain is known. (And even not so well-known, after all, in 2023 the JSON type was formally adopted in the language standards)
SQL's maturity, combined with its intuitive tabular structure and widespread adoption as a standard in data management education, in universities and courses, maintains its relevance in the market. This popularity results in a vast availability of educational resources, from basic concepts to advanced topics, facilitating learning and deepening in SQL through various channels.
NoSQL
Now that we understand some of the main reasons why someone would use SQL, what happens in cases where you don't need ACID? Is that enough not to use SQL? Let's discuss some main cases where NoSQL can be interesting to you or your company/project!
Large Volume and Unstructured or Semi-structured Data
NoSQL is ideal for storing and managing large volumes of unstructured or semi-structured data, such as server logs, social media data, or JSON documents. A great case is Twitter/X, which in 2010 introduced FlockDB, a distributed graph database to store social graphs (who follows whom, who blocks whom) and secondary indexes. In April 2010, Twitter's FlockDB cluster stored over 13 billion edges and sustained peak traffic of 20,000 writes/second and 100,000 reads/second. This would be absurd for a MySQL database.
But then, does this mean that if I want to create a new social network, I should go straight to NoSQL? Probably not! Unless you're sure your social network will have the same traffic as Twitter, this will end up being just a premature optimization in your application.
Big Data
Big Data refers to vast and complex data sets that cannot be effectively managed, processed, or analyzed using traditional data processing tools and methods. These data sets typically exhibit three main characteristics: high data volume (usually terabytes or more), high data ingestion flow, and variety, often containing semi-structured data like JSON or even unstructured data like text, images, and videos.
These characteristics can pose a great challenge to a traditional SQL database. Making the best choice a non-relational database like MongoDB and Cassandra.
Recommendation Systems and Auto-completion
Recommendation systems and auto-completion are areas where NoSQL databases shine due to their ability to handle large volumes of data and perform fast and complex queries. Applications such as streaming platforms, e-commerce, and social networks heavily rely on personalized recommendations and auto-completion features to improve user experience.
Databases like Neo4j (graph-oriented) or Elasticsearch (document-oriented) are ideal for storing and quickly querying these complex data, allowing the generation of accurate recommendations in real-time.
CACHE
The only case where I see that small and medium-sized applications can truly benefit from a non-relational database is caching. Caching is a technique used to store temporary data in fast-access memory, allowing applications to access this data more efficiently. NoSQL databases like Redis and Memcached are often used as caching solutions due to their high performance and low latency.
By storing results of frequent queries and temporary data in the cache, the need to access the main database is reduced. This decreases the load on the database and even on any heavy processing that may occur, allowing it to handle fewer requests and maintain more stable and faster performance. This not only transforms your application into something more scalable but also allows the instances involved in your application to be of a lower level initially, bringing low cost at a time that can be crucial to the life of your project.
sequenceDiagram
participant App as Application
participant Cache as Redis Cache
participant DB as Database
App->>Cache: Query Data
alt Data Found in Cache
Cache-->>App: Return Data
else Data Not Found in Cache
Cache->>DB: Query Data
DB-->>Cache: Return Data
Cache-->>App: Store and Return Data
end
Final Considerations
Should My Project Use NoSQL?
In the end, the answer will always be: it depends. For standard projects in structured companies where you need a MVP and already have efforts in infrastructure to provision relational databases in standard ways, I would probably stick with good old SQL, but easily resort to caching solutions with something like Redis as the application's maturity grows. In personal projects where all I want is a database with a free plan, it's much easier to find NoSQL options like supabase, firebase, or faunadb. As for highly scaled projects with millions of reads and writes per minute, it's necessary to think about the data structure, the speed of operations, the criticality of information, and many other rules arising from the business. So, it's impossible to bring a playbook in a small post, after all, many solutions are still being developed to this day and there will never be an end.