
Consensus Algorithms from the Ground Up

/ 3 min read / V. Zhao

You have five servers. They need to agree on something -- the current value of an account balance, the order of transactions, who's in charge. Sounds simple until you realize that any of those servers could crash at any moment, messages between them could arrive late or not at all, and the network might partition so that two groups of servers can't talk to each other.


This is the consensus problem. It's one of the foundational challenges of distributed computing, and it's harder than it looks.

sequenceDiagram
    participant P as Proposer
    participant A1 as Acceptor 1
    participant A2 as Acceptor 2
    participant A3 as Acceptor 3
    participant L as Learner

    P->>A1: Prepare(n)
    P->>A2: Prepare(n)
    P->>A3: Prepare(n)
    A1-->>P: Promise(n)
    A2-->>P: Promise(n)
    A3-->>P: Promise(n)
    P->>A1: Accept(n, value)
    P->>A2: Accept(n, value)
    P->>A3: Accept(n, value)
    A1-->>L: Accepted(n, value)
    A2-->>L: Accepted(n, value)
    A3-->>L: Accepted(n, value)

Why it matters. Every distributed database, every blockchain, every system that replicates data across multiple machines needs consensus. When you read that your bank balance is $1,000, multiple servers agreed on that number. The mechanism behind that agreement is a consensus algorithm.

The impossibility result. Before diving into solutions, know this: the FLP impossibility theorem (Fischer, Lynch, Paterson, 1985) proved that in an asynchronous system with even one possible crash failure, no deterministic consensus algorithm can guarantee termination. In plain terms, it's mathematically impossible to build a perfect consensus system under these conditions. Every practical algorithm is a set of engineering tradeoffs around this hard limit.

Paxos: the original. Leslie Lamport published Paxos in 1998, describing it through a parable about a fictional Greek parliament. The algorithm works in rounds. A proposer suggests a value. Acceptors vote. If a majority agrees, the value is chosen. It handles crashes and message loss correctly, but the paper is notoriously hard to understand, and implementing it correctly is worse. Google's Chubby lock service used Paxos internally, and the engineers who built it wrote a paper titled "Paxos Made Live" detailing how painful it was.
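The round structure described above can be sketched in a few dozen lines. This is a minimal in-memory sketch of single-decree Paxos, not a faithful implementation: the class and function names (`Acceptor`, `prepare`, `accept`, `propose`) are illustrative, and real deployments must persist acceptor state and handle real message loss.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1         # highest proposal number promised
        self.accepted_n = -1       # proposal number of the accepted value
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            # Report any previously accepted value so the proposer
            # must re-propose it instead of introducing a new one.
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject",)

    def accept(self, n, value):
        """Phase 2: accept the value unless a higher prepare arrived."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted",)
        return ("reject",)


def propose(acceptors, n, value):
    """Drive one Paxos round; succeeds only with majority promises and accepts."""
    promises = [a.prepare(n) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority promised
    # If any acceptor already accepted a value, that value must win.
    prior = max(granted, key=lambda p: p[1])
    if prior[1] >= 0:
        value = prior[2]
    acks = [a.accept(n, value) for a in acceptors]
    if sum(1 for r in acks if r[0] == "accepted") > len(acceptors) // 2:
        return value
    return None
```

Note how a later, higher-numbered round is forced to re-propose an already-chosen value rather than overwrite it; that rule is what makes a chosen value permanent.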

Raft: the understandable one. Diego Ongaro and John Ousterhout designed Raft in 2014 with an explicit goal: be easier to understand than Paxos while providing equivalent guarantees. Raft breaks consensus into three sub-problems: leader election, log replication, and safety. One server becomes the leader, clients send requests to the leader, and the leader replicates entries to followers. If the leader dies, a new election happens. The decomposition into clear phases is what makes Raft teachable.

The key insight both share. Majorities. If you have five servers, any group of three constitutes a majority. Any two majorities must overlap by at least one member. This overlap guarantees that chosen values can't be forgotten even if some servers crash. It's an elegant piece of combinatorial reasoning.
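The overlap claim is small enough to verify exhaustively. This snippet enumerates every three-server majority in a five-server cluster and checks that each pair of majorities shares at least one member:

```python
from itertools import combinations

servers = range(5)
# Every possible majority: all 3-of-5 subsets.
majorities = list(combinations(servers, 3))

# Any two majorities must intersect (3 + 3 > 5 servers).
all_overlap = all(set(a) & set(b) for a, b in combinations(majorities, 2))
print(len(majorities), all_overlap)  # 10 majorities, every pair overlaps
```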

Where the tradeoffs live. Consensus algorithms force you to pick between availability and consistency when the network partitions (this is the CAP theorem in practice). Raft and Paxos choose consistency: they'd rather stop making progress than give you a wrong answer. Other systems (like Dynamo-style databases) make the opposite choice, accepting temporary inconsistency for higher availability.
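The availability-versus-consistency dial is often expressed with the standard N/R/W quorum notation: a read is guaranteed to see the latest write only when read and write quorums overlap, i.e. when R + W > N. A sketch of that rule:

```python
def quorums_intersect(n, r, w):
    """True if every read quorum must overlap every write quorum."""
    return r + w > n

# Consistency-leaning config: overlap guaranteed, but needs 3 of 5 nodes up.
strict = quorums_intersect(n=5, r=3, w=3)

# Availability-leaning config: any single node answers, but a read
# can miss the latest write during a partition.
loose = quorums_intersect(n=5, r=1, w=1)
```

Raft and Paxos effectively fix both quorums at a majority; Dynamo-style systems let operators turn the dial toward availability.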

Understanding consensus means understanding that distributed systems are always negotiating with failure. The algorithms don't prevent things from going wrong. They define what "correct" looks like when things inevitably do.
