
The Revolution in Database Architecture
Jim Gray
Microsoft
455 Market St. #1650
San Francisco, CA, 94105 USA
http://research.microsoft.com/~Gray
Gray@Microsoft.com
ABSTRACT
Database system architectures are undergoing revolutionary
changes. Most importantly, algorithms and data are being unified
by integrating programming languages with the database system.
This gives an extensible object-relational system where
non-procedural relational operators manipulate object sets.
Coupled with this, each DBMS is now a web service. This has huge
implications for how we structure applications. DBMSs are now
object containers. Queues are the first objects to be added. These
queues are the basis for transaction processing and workflow
applications. Future workflow systems are likely to be built on
this core. Data cubes and online analytic processing are now baked
into most DBMSs. Beyond that, DBMSs have a framework for data
mining and machine learning algorithms. Decision trees, Bayes
nets, clustering, and time series analysis are built in; new
algorithms can be added. There is a rebirth of column stores for
sparse tables and to optimize bandwidth. Text, temporal, and spatial
data access methods, along with their probabilistic reasoning, have
been added to database systems. Allowing approximate and
probabilistic answers is essential for many applications. Many
believe that XML and XQuery will be the main data structure and
access pattern. Database systems must accommodate that perspective.
External data increasingly arrives as streams to be compared to
historical data; so stream-processing operators are being added to
the DBMS. Publish-subscribe systems invert the data-query ratios;
incoming data is compared against millions of queries rather
than queries searching millions of records. Meanwhile, disk and
memory capacities are growing much faster than their bandwidth
and latency are improving, so database systems increasingly use
huge main memories and sequential disk access. These changes
mandate a
much more dynamic query optimization strategy – one that adapts
to current conditions and selectivities rather than having a static
plan. Intelligence is moving to the periphery of the network.
Each disk and each sensor will be a competent database machine.
Relational algebra is a convenient way to program these systems.
Database systems are now expected to be self-managing,
self-healing, and always-up. We researchers and developers have our
work cut out for us in delivering all these features.
1. INTRODUCTION
This is an extended abstract for a SIGMOD 2004 keynote address.
It argues that databases are emerging from a period of relative
stasis where the agenda was “implement SQL better.” Now database
architectures are in the punctuated stage of
punctuated-equilibrium. They have become the vehicles to deliver
an integrated application development environment, to be data-rich
nodes of the Internet, to do data discovery, and to be
self-managing. They are also our main hope to deal with the
information avalanche hitting individuals, organizations, and all
aspects of human organization. It is an exciting time! There are
many exciting new research problems and many challenging
implementation problems. This talk highlights some of them.
2. THE REVOLUTIONS
2.1 Object Relational Arrives
We be data. But, you cannot separate data and algorithms.
Unfortunately, Cobol has a data division and a procedure division
and so it had separate committees to define each one. The database
community inherited that artificial division from the Cobol
Data Base Task Group (DBTG). We were separated from our
procedural twin at birth. We have been trying to reunite with it
for 40 years now. In the mid-eighties stored procedures were
added to SQL (thank you Sybase), and there was a proliferation of
object-relational database systems. In the mid-nineties many SQL
vendors added objects to their own systems. Although these were
each good efforts, they were fundamentally flawed because de
novo language designs are very high risk.
The object-oriented language community has been refining its
ideas since Simula67. There are now several good OO languages
with excellent implementations and development environments
(Java and C#, for example). There is a common language runtime
that supports nearly all languages with good performance.
The big news now is the marriage of databases and these
languages. The runtimes are being added to the database engine so
that now one can write database stored-procedures (modules) in
these languages and can define database objects as classes in these
languages. Database data can be encapsulated in classes, and the
language development environment lets you program and debug SQL,
seamlessly mixing Java or C# with SQL, doing version control on the
programs, and generally providing a very productive programming
environment. SQLJ is a very nice integration of SQL and Java, but
there are even better ideas in the pipeline.
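As a rough illustration of host-language code running inside the
engine, the sketch below uses Python's standard sqlite3 module,
which lets a program register an ordinary function so that SQL
queries can call it by name. This is only a small-scale stand-in
for the Java and C# runtime integration described above, not the
mechanism any particular vendor uses. The Point class, the
point_norm function, the points table, and the points_within
helper are all hypothetical names invented for this example.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical application class that encapsulates two columns."""
    x: float
    y: float

    def distance_to_origin(self) -> float:
        return math.hypot(self.x, self.y)


def point_norm(x: float, y: float) -> float:
    """Host-language routine that the engine calls once per row."""
    return Point(x, y).distance_to_origin()


def points_within(radius: float) -> list:
    """Return (name, norm) rows whose norm is at most `radius`."""
    conn = sqlite3.connect(":memory:")
    try:
        # Register the Python function under a SQL-visible name.
        conn.create_function("point_norm", 2, point_norm)
        conn.execute("CREATE TABLE points (name TEXT, x REAL, y REAL)")
        conn.executemany(
            "INSERT INTO points VALUES (?, ?, ?)",
            [("a", 3.0, 4.0), ("b", 6.0, 8.0), ("c", 0.0, 1.0)],
        )
        # A non-procedural query that invokes the procedural method.
        return conn.execute(
            "SELECT name, point_norm(x, y) AS norm FROM points "
            "WHERE point_norm(x, y) <= ? ORDER BY norm",
            (radius,),
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(points_within(5.0))
```

The point of the sketch is that the query stays declarative while
the per-row logic lives in a class written, tested, and versioned
in the host language.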
This integration of languages with databases eliminates the
inside-the-database outside-the-database dichotomy that we have
lived with for the last 40 years. Now fields are objects (values or
references); records are vectors of objects (fields); and tables are se-
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
SIGMOD 2004, June 13–18, 2004, Paris, France.
Copyright 2004 ACM 1-58113-859-8/04/06 …$5.00.