CAP theorem - Availability and Partition Tolerance

Considering P in equal terms with C and A is a bit of a mistake, rather '2 out of 3' notion among C,A,P is misleading. The succinct way I would explain CAP theorem is, "In a distributed data store, at the time of network partition you have to chose either Consistency or Availability and cannot get both". Newer NoSQL systems are trying to focus on Availability while traditional ACID databases had a higher focus on Consistency.

You really cannot choose CA, network partition is not something anyone would like to have, it is just an undesirable reality of a distributed system, networks can fail. Question is what trade off do you pick for your application when that happens. This article from the man who first formulated that term seems to explain this very clearly.


Consistency means that data is the same across the cluster, so you can read or write from/to any node and get the same data.

Availability means the ability to access the cluster even if a node in the cluster goes down.

Partition tolerance means that the cluster continues to function even if there is a "partition" (communication break) between two nodes (both nodes are up, but can't communicate).

In order to get both availability and partition tolerance, you have to give up consistency. Consider if you have two nodes, X and Y, in a master-master setup. Now, there is a break between network communication between X and Y, so they can't sync updates. At this point you can either:

A) Allow the nodes to get out of sync (giving up consistency), or

B) Consider the cluster to be "down" (giving up availability)

All the combinations available are:

  • CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved).
  • CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down.
  • AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)

You should note that CA systems don't practically exist (even if some systems claim to be so).


CAP Theorem

Consistency:

A read is guaranteed to return the most recent write(like ACID) for a given client. If any request comes during that time it has to wait till data sync completed across/in the node(s).


Availability:

every node (if not failed) always executes queries and should always respond to requests. It does not matter whether it returns the latest copy or not.


Partition-tolerance:

The system will continue to function when network partitions occur.


Regarding AP, Availability(always accessible) can exist with(Cassendra) or without(RDBMS) partition tolerance

pic source


Here is how I'm discussing CAP, regarding P particularly.

CA is only possible if you are OK with a monolithic, single server database (maybe with replication but all data on one "failure block" - servers are not considered to partially fail).

If your problem requires scale out, distributed, and multi-server --- network partitions can happen. You're already requiring P. Few problems I approach are amenable to single-server-always paradigms (or, as Stonebraker said, "distributed is table stakes"). If you can find a CA problem, solutions like a traditional non-scale-out RDBMS provides a lot of benefits.

For me, rare: so we move on to discussing AP vs CP.

You only choose between AP and CP operation when you have a partition. If the network & hardware is operating correctly, you get your cake and eat it too.

Let's discuss the AP / CP distinction.

AP - when there is a network partition, let the independent parts operate freely.

CP - when there is a network partition, shut down nodes or disallow reads and writes so there are deterministic failures.

I like architectures that can do both, because some problems are AP and some are CP - and some databases can do both. Among the CP and AP solutions, there are subtleties as well.

For example, in an AP dataset, you have the possibility of both inconsistent reads, and generating write conflicts - these are two different possible AP modes. Can your system be configured for AP with high read availability but disallows write conflicts? Or can your AP system accept write conflicts, with a strong and flexible resolution system? Will you need both eventually, or can you pick a system that only does one?

In a CP system, how much unavailability do you get with small partitions (single server), if any? Greater replication can increase unavailability in a CP system, how does the system handle those tradeoffs?

These are all questions to ask with CP vs AP.

A great read in this area right now is Brewer's "12 years later" post. I believe this moves forward the CAP debate with clarity, and recommend it highly.

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed