What Goes Around Comes Around

Michael Stonebraker

Joseph M. Hellerstein

Abstract

This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals.

In addition, we present the lessons learned from the exploration of the proposals in each era. Most current researchers were not around for many of the previous eras, and have limited (if any) understanding of what was previously learned. There is an old adage that he who does not understand history is condemned to repeat it. By presenting “ancient history”, we hope to allow future researchers to avoid replaying history.

Unfortunately, the main proposal in the current XML era bears a striking resemblance to the CODASYL proposal from the early 1970’s, which failed because of its complexity. Hence, the current era is replaying history, and “what goes around comes around”. Hopefully the next era will be smarter.

I Introduction

Data model proposals have been around since the late 1960’s, when the first author “came on the scene”. Proposals have continued with surprising regularity for the intervening 35 years. Moreover, many of the current-day proposals have come from researchers too young to have learned from the discussion of earlier ones. Hence, the purpose of this paper is to summarize 35 years’ worth of “progress” and point out what should be learned from this lengthy exercise.

We present data model proposals in nine historical epochs:

Hierarchical (IMS): late 1960’s and 1970’s
Network (CODASYL): 1970’s
Relational: 1970’s and early 1980’s
Entity-Relationship: 1970’s
Extended Relational: 1980’s
Semantic: late 1970’s and 1980’s
Object-oriented: late 1980’s and early 1990’s
Object-relational: late 1980’s and early 1990’s
Semi-structured (XML): late 1990’s to the present

In each case, we discuss the data model and associated query language, using a neutral notation. Hence, we will spare the reader the idiosyncratic details of the various proposals. We will also attempt to use a uniform collection of terms, again in an attempt to limit the confusion that might otherwise occur.

Throughout much of the paper, we will use the standard example of suppliers and parts, from [CODD70], which we write for now in relational form in Figure 1.

Supplier (sno, sname, scity, sstate)
Part (pno, pname, psize, pcolor)
Supply (sno, pno, qty, price)

Figure 1: A Relational Schema

Here we have Supplier information, Part information and the Supply relationship to indicate the terms under which a supplier can supply a part.
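
To make the schema of Figure 1 concrete, the following is a minimal sketch that declares the three relations in an in-memory SQLite database through Python's standard `sqlite3` module. Figure 1 gives only relation and attribute names, so the column types, the primary keys, the foreign-key references, the sample rows, and the helper `terms_for` are all assumptions made for illustration.

```python
import sqlite3

# Assumption: column types and key constraints are illustrative guesses;
# Figure 1 specifies only the relation and attribute names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (sno INTEGER PRIMARY KEY, sname TEXT,
                           scity TEXT, sstate TEXT);
    CREATE TABLE Part     (pno INTEGER PRIMARY KEY, pname TEXT,
                           psize INTEGER, pcolor TEXT);
    CREATE TABLE Supply   (sno INTEGER REFERENCES Supplier(sno),
                           pno INTEGER REFERENCES Part(pno),
                           qty INTEGER, price REAL,
                           PRIMARY KEY (sno, pno));
""")

# Made-up sample rows, not taken from the paper.
conn.execute("INSERT INTO Supplier VALUES (1, 'Acme', 'Boston', 'MA')")
conn.execute("INSERT INTO Part VALUES (10, 'bolt', 7, 'grey')")
conn.execute("INSERT INTO Supply VALUES (1, 10, 100, 0.25)")


def terms_for(conn, sno):
    """Return (pname, qty, price) for every part the given supplier can supply."""
    rows = conn.execute(
        "SELECT p.pname, s.qty, s.price "
        "FROM Supply s JOIN Part p ON p.pno = s.pno "
        "WHERE s.sno = ? ORDER BY p.pno",
        (sno,),
    )
    return rows.fetchall()
```

Calling `terms_for(conn, 1)` joins Supply with Part to list the terms under which supplier 1 supplies each part. Note that a Supplier or a Part row can exist here without any Supply row referring to it.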

II IMS Era

IMS was released around 1968, and initially had a hierarchical data model. It understood the notion of a record type, which is a collection of named fields with their associated data types. Each instance of a record type is forced to obey the data description indicated in the definition of the record type. Furthermore, some subset of the named fields must uniquely specify a record instance, i.e. they are required to be a key. Lastly, the record types must be arranged in a tree, such that each record type (other than the root) has a unique parent record type. An IMS data base is a collection of instances of record types, such that each instance, other than root instances, has a single parent of the correct record type.
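
The rules in the paragraph above can be sketched in a few lines of Python. This is not IMS's actual interface; the names `RecordType`, `Record`, and `HierarchicalDB` are hypothetical, and checking key uniqueness only among records that share a parent is an assumption of this sketch rather than something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class RecordType:
    name: str
    fields: dict                            # field name -> Python type
    key: tuple                              # field names that form the key
    parent: Optional["RecordType"] = None   # None marks the root record type


@dataclass(eq=False)
class Record:
    rtype: RecordType
    values: dict
    parent: Optional["Record"] = None
    children: list = field(default_factory=list)

    def key(self):
        return tuple(self.values[k] for k in self.rtype.key)


class HierarchicalDB:
    """Hypothetical container that enforces the tree rules on insert."""

    def __init__(self):
        self.roots = []

    def insert(self, rtype, values, parent=None):
        # Each instance must obey the data description of its record type.
        if set(values) != set(rtype.fields):
            raise ValueError(f"fields do not match record type {rtype.name}")
        for name, typ in rtype.fields.items():
            if not isinstance(values[name], typ):
                raise TypeError(f"{name} must be of type {typ.__name__}")
        # Every non-root instance needs a single parent of the correct type.
        if rtype.parent is None:
            if parent is not None:
                raise ValueError("root records have no parent")
        elif parent is None or parent.rtype is not rtype.parent:
            raise ValueError(f"{rtype.name} needs a {rtype.parent.name} parent")
        rec = Record(rtype, values, parent)
        siblings = self.roots if parent is None else parent.children
        # Assumption: the key must be unique among records under one parent.
        if any(s.rtype is rtype and s.key() == rec.key() for s in siblings):
            raise ValueError("duplicate key")
        siblings.append(rec)
        return rec


SUPPLIER = RecordType(
    "Supplier", {"sno": int, "sname": str, "scity": str, "sstate": str}, ("sno",))
PART = RecordType(
    "Part", {"pno": int, "pname": str, "psize": int, "pcolor": str}, ("pno",),
    parent=SUPPLIER)

db = HierarchicalDB()
acme = db.insert(SUPPLIER, {"sno": 1, "sname": "Acme", "scity": "Boston", "sstate": "MA"})
bolt = db.insert(PART, {"pno": 10, "pname": "bolt", "psize": 7, "pcolor": "grey"}, parent=acme)
```

Inserting a Part record without a Supplier parent raises `ValueError` in this sketch, which is the structural constraint that the tree requirement imposes.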

This requirement of tree-structured data presents a challenge for our sample data, because we are forced to structure it in one of the two ways indicated in Figure 2. These representations share two common undesirable properties:

1) Information is repeated. In the first schema, Part information is repeated for each Supplier who supplies the part. In the second schema, Supplier information is repeated for each Part that the supplier supplies. Repeated information is undesirable, because it offers the possibility for inconsistent data. For example, a repeated data element could be changed in some, but not all, of the places it appears, leading to an inconsistent data base.

2) Existence depends on parents. In the first schema it is impossible for there to be a part that is not currently supplied by anybody. In the second schema, it is impossible to have a supplier that does not currently supply anything. There is no support for these “corner cases” in a strict hierarchy.
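
Both properties can be seen in a small sketch. It assumes, as the text implies, that the first schema nests Part records under each Supplier record. The nested-dictionary layout, the sample values, and the helper names `part_colors`, `known_parts`, and `delete_supplier` are hypothetical and chosen only for illustration.

```python
# Assumed first schema: each Supplier record carries its own copies of the
# Part records it supplies. All values are made up for illustration.
db = [
    {"sno": 1, "sname": "Acme",
     "parts": [{"pno": 10, "pname": "bolt", "pcolor": "grey"}]},
    {"sno": 2, "sname": "Best",
     "parts": [{"pno": 10, "pname": "bolt", "pcolor": "grey"},
               {"pno": 20, "pname": "nut", "pcolor": "grey"}]},
]


def part_colors(db, pno):
    """Collect every color recorded for a part across all of its copies."""
    return {p["pcolor"] for s in db for p in s["parts"] if p["pno"] == pno}


def known_parts(db):
    """Return the part numbers that exist anywhere in the hierarchy."""
    return {p["pno"] for s in db for p in s["parts"]}


def delete_supplier(db, sno):
    """Remove a supplier record together with everything stored under it."""
    return [s for s in db if s["sno"] != sno]


# 1) Repeated information: changing one copy of part 10 but not the other
#    leaves two conflicting colors for the same part.
db[0]["parts"][0]["pcolor"] = "black"
colors_of_part_10 = part_colors(db, 10)

# 2) Existence depends on parents: part 20 is stored only under supplier 2,
#    so removing that supplier removes the part as well.
after_delete = delete_supplier(db, 2)
parts_after_delete = known_parts(after_delete)
```

After the partial update, `colors_of_part_10` holds two different values for one part, and after the delete, part 20 no longer appears in `parts_after_delete` even though nothing about the part itself changed.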
