This theorem is presented as a conjecture by Prof. Eric Brewer at PODC (Principles of Distributed Computing) 2000 keynote talk. The theorem states that a distributed computer system to obtain all following 3 conditions is difficult.
- All nodes should see the same data at the same time
- Every request receives response whether it is succeed or failed
- Partition Tolerance
- System continue to work despite partitioning of the system
In 2002, Seth Gilbert and Nancy Lynch were able to prove the theorem.
Proving the CAP theorem
To prove the theorem, let’s assume that theorem is false. Which means there’s a system such that Consistency, Availability and Partition-Tolerance are satisfied.
So let’s say that system has 2 nodes which are N1 and N2.
Also there’s a value v0 in both N1 and N2 nodes, which is interested by the client.
Since system can tolerate partitioning, a user can get the same service from N1 and N2, and according to our assumption the link between N1 and N2 can be broken.
When the link is broken, client updates v0 value in N1 to v1.
Now client expect to get new ‘v’ value from N2, but still returns v0, because the link between N1 & N2 is broken.
Since client still getting v0 from N2, which implies system’s consistency has broken.
Therefore our assumption is wrong and CAP theorem is proven!
So now there can be systems which satisfies either 2 of those 3.
But what about a system which satisfies just Consistency and Availability not Partition-Tolerance?
Let’s get back to our previous system.
So the system supposed to be Consistent and Available.
Again, the link between N1 & N2 breaks.
Client sends v2 to update v values in N1 node.
What should the system do?
- Accept the request and update ‘v’ value in N1 – leads to inconsistent system
- Reject the client’s requests – leads to unavailable system
So that says there’s a rare chance of having a system which satisfies Consistency and Availability.
What is available out there is systems with,
AP – relax consistency, but not inconsistent
CP – sacrificed availability, but not unavailable.
So real world systems with AP and CP can offer a degree of consistency, availability and partition-tolerance.
This was developed to give more complete description of the space of potential tradeoffs for a distributed system
If there’s Partition (P)
- System trade-off Availability (A) and Consistency (C)
- System trade-off Latency (L) and Consistency (C)
Availability and Latency are arguably the same thing:
Unavailable → extreme high latency
PA/EL Systems – Dynamo, Cassendra, Riak
PC/EC Systems – BigTable, HBase, VoltDB/H-Store
PA/EC Systems – MongoDB
PC/EL Systems – Yahoo! PNUTS