
USENIX Association 2014 USENIX Annual Technical Conference 305
In Search of an Understandable Consensus Algorithm
Diego Ongaro and John Ousterhout
Stanford University
Abstract
Raft is a consensus algorithm for managing a replicated
log. It produces a result equivalent to (multi-)Paxos, and
it is as efficient as Paxos, but its structure is different
from Paxos; this makes Raft more understandable than
Paxos and also provides a better foundation for build-
ing practical systems. In order to enhance understandabil-
ity, Raft separates the key elements of consensus, such as
leader election, log replication, and safety, and it enforces
a stronger degree of coherency to reduce the number of
states that must be considered. Results from a user study
demonstrate that Raft is easier for students to learn than
Paxos. Raft also includes a new mechanism for changing
the cluster membership, which uses overlapping majori-
ties to guarantee safety.
1 Introduction
Consensus algorithms allow a collection of machines
to work as a coherent group that can survive the fail-
ures of some of its members. Because of this, they play a
key role in building reliable la rge-scale software systems.
Paxos [13, 14] has dominated the discussion of consen-
sus algorithms over the last decade: most implementations
of consensus are based on Paxos or influenced by it, and
Paxos has become the primary vehicle used to teach stu-
dents about consensus.
Unfortunately, Paxos is quite difficult to understand, in
spite of numerous attempts to make it more approachable.
Furthermore, its architecture r equires complex changes
to support practical systems. As a result, both system
builders and students struggle with Paxos.
After struggling with Paxos ourselves, we set out to
find a new consensus algorithm that could provide a bet-
ter foundation for system building and education. Our ap-
proach was unusual in that our primary goal was under-
standa bility: could we define a consensus algorithm for
practical systems and de scribe it in a way that is signifi-
cantly easier to learn than Paxos? Furthermore, we wanted
the algorithm to facilitate the development of intuitions
that are essential for system builders. It was important not
just for the algorithm to work, but for it to be obvious why
it works.
The result of this work is a consensus algorithm called
Raft. In designing Raft we applied specific techniques to
improve understandability, including decomposition (Raft
separates leader election, log replication, and safety) and
state space redu ction (relative to Paxos, Raft reduces the
degree of nondeterminism and the ways servers can be in-
consistent with each other). A user study with 43 students
at two universities shows that Raft is significantly easier
to understand than Paxos: after learning both algorithms,
33 of these students were able to answer questions about
Raft better than questions about Paxos.
Raft is similar in ma ny ways to existing consensus al-
gorithms (most notably, Oki and Liskov’s Viewstamped
Replication [27, 20]), but it has several novel features:
• Strong leader: Raft uses a stronger form of leadership
than other consensus algorithms. For example, log en-
tries only flow from the leader to other servers. This
simplifies the management of the replicated log and
makes Raft easier to u nderstand.
• Leader election: Raft uses rando mized timers to elect
leaders. This adds only a small amount of mechanism
to the heartbeats already required for any consensus al-
gorithm, while resolving conflicts simply and rapidly.
• Membership changes: Raft’s mechanism for changing
the set of servers in the cluster uses a new joint consen-
sus approach where the majorities of two different con-
figurations overlap during transitions. This allows the
cluster to continue operating normally during configu-
ration changes.
We believe that Raft is superior to Paxos and other con-
sensus algorithms, both for educational purposes and as a
foundation for implementation. It is simpler and more un-
derstandable than other algorithms; it is described com-
pletely enough to meet the needs of a p ractical system;
it has several open-source implementations and is used
by several companies; its safety properties h ave been for-
mally specified and proven; and its efficiency is compara-
ble to o ther algorithm s.
The remainder of the paper introduces the replicated
state machine problem (Section 2), discusses the strengths
and weaknesses of Paxos (Section 3), describes our gen-
eral approach to understandability (Section 4), presents
the Raft consensus algorithm (Sections 5–7), evaluates
Raft (Section 8 ), and discusses related work (Section 9).
A few elements of the Raft algorithm have been omitted
here because of space limitations, but they are available in
an extended technical report [29]. The additional material
describes how clients interac t with the system, and how
space in the Raft log can be reclaimed.
2 Replicated state machines
Consensus algorithms typically arise in the context of
replicated state machines [33]. In this approach, state ma-
chines on a collection of servers compute identical copies
of the same state and can continue operating even if some
of the servers are down. Replicated state machines are
used to solve a variety of fault tolerance problems in dis-
tributed systems. For example, large-scale systems that
资源由 www.eimhe.com 美河学习在线收集分享
评论