USENIX Association 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’13) 385
Scaling Memcache at Facebook
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li,
Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung,
Venkateshwaran Venkataramani
{rajeshn,hans}@fb.com, {sgrimm, marc}@facebook.com, {herman, hcli, rm, mpal, dpeek, ps, dstaff, ttung, veeve}@fb.com
Facebook Inc.
Abstract: Memcached is a well known, simple, in-
memory caching solution. This paper describes how
Facebook leverages memcached as a building block to
construct and scale a distributed key-value store that
supports the world’s largest social network. Our system
handles billions of requests per second and holds tril-
lions of items to deliver a rich experience for over a bil-
lion users around the world.
1 Introduction
Popular and engaging social networking sites present
significant infrastructure challenges. Hundreds of mil-
lions of people use these networks every day and im-
pose computational, network, and I/O demands that tra-
ditional web architectures struggle to satisfy. A social
network’s infrastructure needs to (1) allow near real-
time communication, (2) aggregate content on-the-fly
from multiple sources, (3) be able to access and update
very popular shared content, and (4) scale to process
millions of user requests per second.
We describe how we improved the open source ver-
sion of memcached [14] and used it as a building block to
construct a distributed key-value store for the largest so-
cial network in the world. We discuss our journey scal-
ing from a single cluster of servers to multiple geograph-
ically distributed clusters. To the best of our knowledge,
this system is the largest memcached installation in the
world, processing over a billion requests per second and
storing trillions of items.
This paper is the latest in a series of works that have
recognized the flexibility and utility of distributed key-
value stores [1, 2, 5, 6, 12, 14, 34, 36]. This paper fo-
cuses on memcached—an open-source implementation
of an in-memory hash table—as it provides low latency
access to a shared storage pool at low cost. These quali-
ties enable us to build data-intensive features that would
otherwise be impractical. For example, a feature that
issues hundreds of database queries per page request
would likely never leave the prototype stage because it
would be too slow and expensive. In our application,
however, web pages routinely fetch thousands of key-
value pairs from memcached servers.
One of our goals is to present the important themes
that emerge at different scales of our deployment. While
qualities like performance, efficiency, fault-tolerance,
and consistency are important at all scales, our experi-
ence indicates that at specific sizes some qualities re-
quire more effort to achieve than others. For exam-
ple, maintaining data consistency can be easier at small
scales if replication is minimal compared to larger ones
where replication is often necessary. Additionally, the
importance of finding an optimal communication sched-
ule increases as the number of servers increases and net-
working becomes the bottleneck.
This paper includes four main contributions: (1)
We describe the evolution of Facebook’s memcached-
based architecture. (2) We identify enhancements to
memcached that improve performance and increase
memory efficiency. (3) We highlight mechanisms that
improve our ability to operate our system at scale. (4)
We characterize the production workloads imposed on
our system.
2 Overview
The following properties greatly influence our design.
First, users consume an order of magnitude more con-
tent than they create. This behavior results in a workload
dominated by fetching data and suggests that caching
can have significant advantages. Second, our read op-
erations fetch data from a variety of sources such as
MySQL databases, HDFS installations, and backend
services. This heterogeneity requires a flexible caching
strategy able to store data from disparate sources.
Memcached provides a simple set of operations (set,
get, and delete) that makes it attractive as an elemen-
tal component in a large-scale distributed system. The
open-source version we started with provides a single-
machine in-memory hash table. In this paper, we discuss
how we took this basic building block, made it more ef-
ficient, and used it to build a distributed key-value store
that can process billions of requests per second. Henceforth, we use memcached to refer to the source code or a running binary and memcache to describe the distributed system.

Figure 1: Memcache as a demand-filled look-aside cache. The left half illustrates the read path for a web server on a cache miss. The right half illustrates the write path.
Query cache: We rely on memcache to lighten the read
load on our databases. In particular, we use memcache
as a demand-filled look-aside cache as shown in Fig-
ure 1. When a web server needs data, it first requests
the value from memcache by providing a string key. If
the item addressed by that key is not cached, the web
server retrieves the data from the database or other back-
end service and populates the cache with the key-value
pair. For write requests, the web server issues SQL state-
ments to the database and then sends a delete request to
memcache that invalidates any stale data. We choose to
delete cached data instead of updating it because deletes
are idempotent. Memcache is not the authoritative source
of the data and is therefore allowed to evict cached data.
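The read and write paths of the look-aside pattern described above can be sketched as follows. This is a minimal illustration, not Facebook's API: `DictCache` and `DictDB` are hypothetical in-memory stand-ins for a memcache client and a database layer.

```python
class DictCache:
    """Hypothetical memcache client: get/set/delete on an in-memory table."""
    def __init__(self):
        self.d = {}
    def get(self, key):
        return self.d.get(key)
    def set(self, key, value):
        self.d[key] = value
    def delete(self, key):
        self.d.pop(key, None)   # deleting a missing key is a no-op: idempotent

class DictDB:
    """Hypothetical backend store standing in for MySQL or another service."""
    def __init__(self, rows):
        self.rows = rows
    def select(self, key):
        return self.rows.get(key)
    def update(self, key, value):
        self.rows[key] = value

class LookAsideStore:
    """Demand-filled look-aside cache, as in Figure 1."""
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def read(self, key):
        value = self.cache.get(key)
        if value is None:                  # cache miss
            value = self.db.select(key)    # fetch from the backend
            self.cache.set(key, value)     # demand-fill the cache
        return value

    def write(self, key, value):
        self.db.update(key, value)         # issue the database update first
        self.cache.delete(key)             # invalidate rather than update
```

Note that `write` deletes rather than sets the new value: a delete can be retried safely, whereas concurrent sets can race and leave stale data in the cache.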
While there are several ways to address excessive
read traffic on MySQL databases, we chose to use
memcache. It was the best choice given limited engi-
neering resources and time. Additionally, separating our
caching layer from our persistence layer allows us to ad-
just each layer independently as our workload changes.
Generic cache: We also leverage memcache as a more
general key-value store. For example, engineers use
memcache to store pre-computed results from sophisti-
cated machine learning algorithms which can then be
used by a variety of other applications. It takes little ef-
fort for new services to leverage the existing memcache
infrastructure without the burden of tuning, optimizing,
provisioning, and maintaining a large server fleet.
As is, memcached provides no server-to-server co-
ordination; it is an in-memory hash table running on
a single server. In the remainder of this paper we de-
scribe how we built a distributed key-value store based
on memcached capable of operating under Facebook’s
workload. Our system provides a suite of configu-
ration, aggregation, and routing services to organize
memcached instances into a distributed system.
Figure 2: Overall architecture
We structure our paper to emphasize the themes that
emerge at three different deployment scales. Our read-
heavy workload and wide fan-out is the primary con-
cern when we have one cluster of servers. As it becomes
necessary to scale to multiple frontend clusters, we ad-
dress data replication between these clusters. Finally, we
describe mechanisms to provide a consistent user ex-
perience as we spread clusters around the world. Op-
erational complexity and fault tolerance is important at
all scales. We present salient data that supports our de-
sign decisions and refer the reader to work by Atikoglu
et al. [8] for a more detailed analysis of our workload. At
a high-level, Figure 2 illustrates this final architecture in
which we organize co-located clusters into a region and
designate a master region that provides a data stream to
keep non-master regions up-to-date.
While evolving our system we prioritize two ma-
jor design goals. (1) Any change must impact a user-
facing or operational issue. Optimizations that have lim-
ited scope are rarely considered. (2) We treat the prob-
ability of reading transient stale data as a parameter to
be tuned, similar to responsiveness. We are willing to
expose slightly stale data in exchange for insulating a
backend storage service from excessive load.
3 In a Cluster: Latency and Load
We now consider the challenges of scaling to thousands
of servers within a cluster. At this scale, most of our
efforts focus on reducing either the latency of fetching
cached data or the load imposed due to a cache miss.
3.1 Reducing Latency
Whether a request for data results in a cache hit or miss,
the latency of memcache's response is a critical factor
in the response time of a user’s request. A single user
web request can often result in hundreds of individual memcache get requests.