Can relational database scale horizontally

It is possible but takes lots of maintenance efforts, Explanation -

Vertical Scaling of data (synonymous to Normalisation in SQL databases) is referred as splitting data column wise into multiple tables in order to reduce space redundancy. Example of user table -

enter image description here

Horizontal Scaling of data (synonymous to sharding) is referred as splitting row wise into multiple tables in order to reduce time taken to fetch data. Example of user table -

enter image description here

Key point to note here is as we can see tables in SQL databases are Normalised into multiple tables of related data. In order to shard data of such table on multiple machines, you would need to shard related normalised data accordingly which in turn would increase maintenance efforts. Like in the example presented above of SQL database,

Customer table which is related as one to many relation with Order table

If you move some rows of customer data onto other machine (referred as sharding) you would also need to move its related order data onto the same machine which would be troublesome task in case of multiple related tables.

Its convenient for NOSQL databases to shard out as they follow flat table structure (data is stored in aggregated form rather than normalised form).


Google Spanner is an example of a relational database that can scale horizontally. Sharding and replication are done automatically so no need to worry about that. For more information please check out this paper.


I think the answer is, unequivocally, yes. You have to keep in mind that SQL is simply a data access language. There is absolutely no reason why it can't be extended across multiple computers and network partitions. Is it a challenging problem? Most certainly, and that's why software that does it is in its infancy.

Now, I think what you are trying to ask is "Can all features that I am familiar with and that arrive in a standard SQL-type relational database management system be developed to work with multiple servers in this manner?" While I admit I haven't studied the problem in depth, there are theorems out there that say "No, it cannot." Consistency-Availability-Partition Theorem posits that we cannot have all three qualities at the same level.

Now, for all practical purposes, "sharding" or "partitioning" or whatever you want to call it is not going away; to the contrary. This means that, given the degree to which CAP theorem holds, we are going to have to shift the way we think about databases, and how we interact with them (at least, to an extent). Many developers have already made the shift necessary to be successful on a No-SQL platform, but many more have not. Ultimately, sufficient maturity of the model and effective enough workarounds will be developed that traditional SQL databases, in the sense you refer, will be more or less practical across multiple machines. This is already starting to pan out, and I would say give it a few more years and we'll be to that point. Or we'll have collectively shifted thinking to the point where it is no longer necessary, and the world will be a better place. :)


Thanks for the question and answer. I was trying to explain this to someone like this:

In terms of the CAP theorem, you can't have all three. So when a partition (network or server failure) occurs:

  • A relational database on a single server is giving you C (consistency). So when a P (partition - server/network failure) occurs, you can't have A (availability - db goes down)

  • A nosql datastore if you want A when a P occurs, you can't have C (one or more of your replicated partitions will be out of sync, until the n/w comes back and they all sync up). So it will only be eventually consistent

EDITED #2: to provide more perspective based on the comment below by Manish. My intention is to explain by example, why you cant have all 3. As noted below in the comments, there are other dbs where you can have C when P occurs at the expense of A.