Figure 1: Percolator and its dependencies. (The figure shows the software stack: an application linked against the Percolator library issues RPCs to Bigtable tablet servers, which in turn issue RPCs to GFS chunkservers.)
dividually, avoiding the global scans of the repository
that MapReduce requires. To achieve high throughput,
many threads on many machines need to transform the
repository concurrently, so Percolator provides ACID-
compliant transactions to make it easier for programmers
to reason about the state of the repository; we currently
implement snapshot isolation semantics [5].
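As a concrete illustration, a read-modify-write over the repository might be expressed as in the sketch below. The Transaction, Get, Set, and Commit names are assumptions for illustration rather than the actual Percolator interface; the point is that the read and the write both observe a single snapshot, and the commit fails if a conflicting write committed in the meantime.

```cpp
// Illustrative sketch only; Transaction/Get/Set/Commit are assumed names,
// not the real Percolator client API.
#include <string>

bool IncrementLinkCount(Cluster* cluster, const std::string& url) {
  Transaction t(cluster);                        // begins at a snapshot timestamp
  std::string value = t.Get(url, "link_count");  // read as of that snapshot
  int count = value.empty() ? 0 : std::stoi(value);
  t.Set(url, "link_count", std::to_string(count + 1));
  return t.Commit();                             // false on a write-write conflict
}
```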
In addition to reasoning about concurrency, program-
mers of an incremental system need to keep track of the
state of the incremental computation. To assist them in
this task, Percolator provides observers: pieces of code
that are invoked by the system whenever a user-specified
column changes. Percolator applications are structured
as a series of observers; each observer completes a task
and creates more work for “downstream” observers by
writing to the table. An external process triggers the first
observer in the chain by writing initial data into the table.
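A sketch of one link in such a chain is shown below. The Observer base class, ObserveColumn registration call, and Run signature are illustrative assumptions, not the actual Percolator API; the essential structure is that the observer reads the column that triggered it and writes a downstream column, which in turn wakes the next observer.

```cpp
// Illustrative observer sketch; interface names are assumed.
#include <string>

class LinkExtractor : public Observer {
 public:
  // Register interest in the "raw-document" column at startup.
  LinkExtractor() { ObserveColumn("raw-document"); }

  // Invoked by the worker, inside a transaction, whenever the
  // observed column of some row is written.
  void Run(Transaction* t, const std::string& row) override {
    std::string doc = t->Get(row, "raw-document");
    std::string links = ExtractLinks(doc);    // application-specific work
    t->Set(row, "outgoing-links", links);     // wakes downstream observers
  }
};
```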
Percolator was built specifically for incremental pro-
cessing and is not intended to supplant existing solutions
for most data processing tasks. Computations where the
result can’t be broken down into small updates (sorting
a file, for example) are better handled by MapReduce.
Also, the computation should have strong consistency
requirements; otherwise, Bigtable is sufficient. Finally,
the computation should be very large in some dimen-
sion (total data size, CPU required for transformation,
etc.); smaller computations not suited to MapReduce or
Bigtable can be handled by traditional DBMSs.
Within Google, the primary application of Percola-
tor is preparing web pages for inclusion in the live web
search index. By converting the indexing system to an
incremental system, we are able to process individual
documents as they are crawled. This reduced the aver-
age document processing latency by a factor of 100, and
the average age of a document appearing in a search re-
sult dropped by nearly 50 percent (the age of a search re-
sult includes delays other than indexing such as the time
between a document being changed and being crawled).
The system has also been used to render pages into
images; Percolator tracks the relationship between web
pages and the resources they depend on, so pages can be
reprocessed when any depended-upon resources change.
2 Design
Percolator provides two main abstractions for per-
forming incremental processing at large scale: ACID
transactions over a random-access repository and ob-
servers, a way to organize an incremental computation.
A Percolator system consists of three binaries that run
on every machine in the cluster: a Percolator worker, a
Bigtable [9] tablet server, and a GFS [20] chunkserver.
All observers are linked into the Percolator worker,
which scans the Bigtable for changed columns (“noti-
fications”) and invokes the corresponding observers as
a function call in the worker process. The observers
perform transactions by sending read/write RPCs to
Bigtable tablet servers, which in turn send read/write
RPCs to GFS chunkservers. The system also depends
on two small services: the timestamp oracle and the
lightweight lock service. The timestamp oracle pro-
vides strictly increasing timestamps: a property required
for correct operation of the snapshot isolation protocol.
Workers use the lightweight lock service to make the
search for dirty notifications more efficient.
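The worker's dispatch loop can be summarized roughly as follows. This is a reconstruction from the description above, not Percolator's actual code: ScanDirtyCells, LookupObserver, and ClearNotification are illustrative names, and the use of the lightweight lock service to avoid redundant scanning is omitted.

```cpp
// Rough sketch of a worker thread's dispatch loop (structure assumed
// from the prose description; names are illustrative).
void WorkerThread(Table* table) {
  while (true) {
    for (const Notification& n : table->ScanDirtyCells()) {  // changed columns
      Observer* o = LookupObserver(n.column);   // observer registered for it
      Transaction t(table);
      o->Run(&t, n.row);                        // may write further columns
      t.ClearNotification(n.row, n.column);     // mark this notification handled
      t.Commit();                               // conflict handling not shown
    }
  }
}
```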
From the programmer’s perspective, a Percolator
repository consists of a small number of tables. Each
table is a collection of “cells” indexed by row and col-
umn. Each cell contains a value: an uninterpreted array of
bytes. (Internally, to support snapshot isolation, we rep-
resent each cell as a series of values indexed by times-
tamp.)
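This data model can be summarized by the following illustrative type definitions (not Percolator's internal representation): a table maps (row, column) pairs to a timestamp-ordered series of uninterpreted byte-string values, and a read at snapshot timestamp ts returns the newest value at or below ts.

```cpp
// Minimal model of the logical repository structure described above;
// types are for illustration only.
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using Bytes = std::string;        // an uninterpreted array of bytes
using Timestamp = uint64_t;

struct ColumnKey {
  std::string row;
  std::string column;
  bool operator<(const ColumnKey& o) const {
    return std::tie(row, column) < std::tie(o.row, o.column);
  }
};

// One table: (row, column) -> timestamp -> value.
using LogicalTable = std::map<ColumnKey, std::map<Timestamp, Bytes>>;
```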
The design of Percolator was influenced by the re-
quirement to run at massive scales and the lack of a
requirement for extremely low latency. Relaxed latency
requirements let us take, for example, a lazy approach
to cleaning up locks left behind by transactions running
on failed machines. This lazy, simple-to-implement ap-
proach potentially delays transaction commit by tens of
seconds. This delay would not be acceptable in a DBMS
running OLTP tasks, but it is tolerable in an incremental
processing system building an index of the web. Percola-
tor has no central location for transaction management;
in particular, it lacks a global deadlock detector. This in-
creases the latency of conflicting transactions but allows
the system to scale to thousands of machines.
2.1 Bigtable overview
Percolator is built on top of the Bigtable distributed
storage system. Bigtable presents a multi-dimensional
sorted map to users: keys are (row, column, times-
tamp) tuples. Bigtable provides lookup and update oper-
ations on each row, and Bigtable row transactions enable
atomic read-modify-write operations on individual rows.
Bigtable handles petabytes of data and runs reliably on
large numbers of (unreliable) machines.
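For example, a single-row read-modify-write built on a Bigtable-style row transaction might look like the sketch below (the client API names are assumed here, not Bigtable's actual interface). Atomicity holds only within the one row; this is the primitive on which Percolator layers cross-row, cross-table transactions.

```cpp
// Sketch of an atomic single-row read-modify-write; API names assumed.
#include <cstdint>
#include <string>

void BumpCounter(BigtableClient* bt, const std::string& row) {
  RowTransaction rt = bt->StartRowTransaction(row);  // scoped to this row only
  std::string v = rt.Read("stats:count");
  int64_t count = v.empty() ? 0 : std::stoll(v);
  rt.Write("stats:count", std::to_string(count + 1));
  rt.Commit();                                       // atomic for this row
}
```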
A running Bigtable consists of a collection of tablet
servers, each of which is responsible for serving several
tablets (contiguous regions of the key space). A master
coordinates the operation of tablet servers by, for exam-
ple, directing them to load or unload tablets. A tablet is
stored as a collection of read-only files in the Google