raft 算法.pdf - 墨天轮文档

raft 算法.pdf

__V__

All China Database Union

271

16页

0次

2022-08-14

5墨值下载

This paper is included in the Proceedings of USENIX ATC ’14:

2014 USENIX Annual Technical Conference.

June 19–20, 2014 • Philadelphia, PA

978-1-931971-10-2

Open access to the Proceedings of

USENIX ATC ’14: 2014 USENIX Annual Technical

Conference is sponsored by USENIX.

In Search of an Understandable

Consensus Algorithm

Diego Ongaro and John Ousterhout, Stanford University

https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro

资源由 www.eimhe.com 美河学习在线收集分享

USENIX Association 2014 USENIX Annual Technical Conference 305

In Search of an Understandable Consensus Algorithm

Diego Ongaro and John Ousterhout

Stanford University

Abstract

Raft is a consensus algorithm for managing a replicated

log. It produces a result equivalent to (multi-)Paxos, and

it is as efﬁcient as Paxos, but its structure is different

from Paxos; this makes Raft more understandable than

Paxos and also provides a better foundation for build-

ing practical systems. In order to enhance understandabil-

ity, Raft separates the key elements of consensus, such as

leader election, log replication, and safety, and it enforces

a stronger degree of coherency to reduce the number of

states that must be considered. Results from a user study

demonstrate that Raft is easier for students to learn than

Paxos. Raft also includes a new mechanism for changing

the cluster membership, which uses overlapping majori-

ties to guarantee safety.

1 Introduction

Consensus algorithms allow a collection of machines

to work as a coherent group that can survive the fail-

ures of some of its members. Because of this, they play a

key role in building reliable la rge-scale software systems.

Paxos [13, 14] has dominated the discussion of consen-

sus algorithms over the last decade: most implementations

of consensus are based on Paxos or inﬂuenced by it, and

Paxos has become the primary vehicle used to teach stu-

dents about consensus.

Unfortunately, Paxos is quite difﬁcult to understand, in

spite of numerous attempts to make it more approachable.

Furthermore, its architecture r equires complex changes

to support practical systems. As a result, both system

builders and students struggle with Paxos.

After struggling with Paxos ourselves, we set out to

ﬁnd a new consensus algorithm that could provide a bet-

ter foundation for system building and education. Our ap-

proach was unusual in that our primary goal was under-

standa bility: could we deﬁne a consensus algorithm for

practical systems and de scribe it in a way that is signiﬁ-

cantly easier to learn than Paxos? Furthermore, we wanted

the algorithm to facilitate the development of intuitions

that are essential for system builders. It was important not

just for the algorithm to work, but for it to be obvious why

it works.

The result of this work is a consensus algorithm called

Raft. In designing Raft we applied speciﬁc techniques to

improve understandability, including decomposition (Raft

separates leader election, log replication, and safety) and

state space redu ction (relative to Paxos, Raft reduces the

degree of nondeterminism and the ways servers can be in-

consistent with each other). A user study with 43 students

at two universities shows that Raft is signiﬁcantly easier

to understand than Paxos: after learning both algorithms,

33 of these students were able to answer questions about

Raft better than questions about Paxos.

Raft is similar in ma ny ways to existing consensus al-

gorithms (most notably, Oki and Liskov’s Viewstamped

Replication [27, 20]), but it has several novel features:

• Strong leader: Raft uses a stronger form of leadership

than other consensus algorithms. For example, log en-

tries only ﬂow from the leader to other servers. This

simpliﬁes the management of the replicated log and

makes Raft easier to u nderstand.

• Leader election: Raft uses rando mized timers to elect

leaders. This adds only a small amount of mechanism

to the heartbeats already required for any consensus al-

gorithm, while resolving conﬂicts simply and rapidly.

• Membership changes: Raft’s mechanism for changing

the set of servers in the cluster uses a new joint consen-

sus approach where the majorities of two different con-

ﬁgurations overlap during transitions. This allows the

cluster to continue operating normally during conﬁgu-

ration changes.

We believe that Raft is superior to Paxos and other con-

sensus algorithms, both for educational purposes and as a

foundation for implementation. It is simpler and more un-

derstandable than other algorithms; it is described com-

pletely enough to meet the needs of a p ractical system;

it has several open-source implementations and is used

by several companies; its safety properties h ave been for-

mally speciﬁed and proven; and its efﬁciency is compara-

ble to o ther algorithm s.

The remainder of the paper introduces the replicated

state machine problem (Section 2), discusses the strengths

and weaknesses of Paxos (Section 3), describes our gen-

eral approach to understandability (Section 4), presents

the Raft consensus algorithm (Sections 5–7), evaluates

Raft (Section 8 ), and discusses related work (Section 9).

A few elements of the Raft algorithm have been omitted

here because of space limitations, but they are available in

an extended technical report [29]. The additional material

describes how clients interac t with the system, and how

space in the Raft log can be reclaimed.

2 Replicated state machines

Consensus algorithms typically arise in the context of

replicated state machines [33]. In this approach, state ma-

chines on a collection of servers compute identical copies

of the same state and can continue operating even if some

of the servers are down. Replicated state machines are

used to solve a variety of fault tolerance problems in dis-

tributed systems. For example, large-scale systems that

资源由 www.eimhe.com 美河学习在线收集分享

of 16

5墨值下载

raft 算法

关注

评论