Saturday 14 April 2012

CAP Theorem Explained

What is CAP Theorem?

The CAP Theorem(principle) was initially published in 1999 by a computer scientist by the name of Eric Brewer.  It is basically making people aware of the trade offs when creating distributed systems. CAP stands for Consistency, Availability and Partition Tolerant - The principle states that you can have any 2 of the 3, but not all 3.

Available - The system will always return a result

Consistent- All data reads will return the latest write

Partition-Tolerant - The network is partitioned - EG you have 2 Data centers

Lets Explore this a little more to understand why. If the above didn't make enough sense there is a video here.

Firstly with a distributed system you have multiple systems communicating with each other over a network, most of the time these systems all have isolated databases(nothing shared architecture) and use Asynchronous messaging to communicate. There are a few reasons people create these architectures, to scale, integrating existing systems, your CIO has a thing for buying random software or 100 other reasons. 

Partition-Tolerant & Consistent

This is the choice if you need to have your data consistent. Now why can't you have availability as well? If you want to make things available you have to replicate it across multiple nodes. Now really this trade off could be considered latency, meaning if you were willing to wait, depending on volume(think ticketmaster) you can still make things available it just sucks. However, this is a valid scaling technique for a lot of use cases at Google(BigTable), Amazon etc.  Examples would be Redis, Memcache, MongoBD(eventually consistent), and HBase. All of these have different implementations but share something in common, they do not allow reads until they can guarantee that it will be consistent.  

Partition-Tolerant & Available

This is the choice most highly scaleable systems take, their data is almost consistent and most use methods like caching, messaging and eventual consistency, just to name a few, to make the user experience as smooth as possible. Examples would be RIAK, Cassandra and CouchDB, Most gossip protocols and eventually consistent systems are partition-tolerant and available. These systems will allow reads before ensuring all data is consistent. 

Consistent & Available

This is where traditional Databases live and is not really considered an acceptable design in modern architectures. You can put a lot of hardware in front of them to scale up and for 99% of software, you will never need anything more. However, depending on how it is implemented it might not be easy to switch to a distributed architecture. Here is an interesting read on how ebay did it. Examples would be MSMSQL and MYSQL

Current CAP

The CAP principle was written 15 years ago, needless to say a lot of stuff has changed and it has served its purpose. An updated theory would say you need to choose between Consistency and Availability as Partitional Tolerance is a must. If you like to read more there is a great follow up called CAP - Twelve years later

The point of CAP is/was just to make people aware of the trade offs, which I believe has worked quite well.  CAP has served its purpose and continues to be required reading for anyone who is attempting to build a distributed system.