What Goes Around Comes Around
Michael Stonebraker
Joseph M. Hellerstein
Abstract
This paper provides a summary of 35 years of data model proposals, grouped into 9
different eras. We discuss the proposals of each era, and show that there are only a few
basic data modeling ideas, and most have been around a long time. Later proposals
inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a
worthwhile exercise to study previous proposals.
In addition, we present the lessons learned from the exploration of the proposals in each
era. Most current researchers were not around for many of the previous eras, and have
limited (if any) understanding of what was previously learned. There is an old adage that
he who does not understand history is condemned to repeat it. By presenting “ancient
history”, we hope to allow future researchers to avoid replaying history.
Unfortunately, the main proposal in the current XML era bears a striking resemblance to
the CODASYL proposal from the early 1970’s, which failed because of its complexity.
Hence, the current era is replaying history, and “what goes around comes around”.
Hopefully the next era will be smarter.
I Introduction
Data model proposals have been around since the late 1960’s, when the first author
“came on the scene”. Proposals have continued with surprising regularity for the
intervening 35 years. Moreover, many of the current day proposals have come from
researchers too young to have learned from the discussion of earlier ones. Hence, the
purpose of this paper is to summarize 35 years' worth of “progress” and point out what
should be learned from this lengthy exercise.
We present data model proposals in nine historical epochs:
Hierarchical (IMS): late 1960’s and 1970’s
Network (CODASYL): 1970’s
Relational: 1970’s and early 1980’s
Entity-Relationship: 1970’s
Extended Relational: 1980’s
Semantic: late 1970’s and 1980’s
Object-oriented: late 1980’s and early 1990’s
Object-relational: late 1980’s and early 1990’s
Semi-structured (XML): late 1990’s to the present
In each case, we discuss the data model and associated query language, using a neutral
notation. Hence, we will spare the reader the idiosyncratic details of the various
proposals. We will also attempt to use a uniform collection of terms, again in an attempt
to limit the confusion that might otherwise occur.
Throughout much of the paper, we will use the standard example of suppliers and parts,
from [CODD70], which we write for now in relational form in Figure 1.
Supplier (sno, sname, scity, sstate)
Part (pno, pname, psize, pcolor)
Supply (sno, pno, qty, price)
Figure 1: A Relational Schema
Here we have Supplier information, Part information and the Supply relationship to
indicate the terms under which a supplier can supply a part.
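As a concrete reference point, the Figure 1 schema can be loaded into an in-memory SQLite database through Python's standard `sqlite3` module. This is a minimal sketch. The column types, the primary keys (including the composite key `(sno, pno)` on Supply), and the sample rows are assumptions made for illustration and are not taken from [CODD70] or from this paper.

```python
import sqlite3

# Column types and keys are assumptions; Figure 1 lists only attribute names.
SCHEMA = """
CREATE TABLE Supplier (sno INTEGER PRIMARY KEY, sname TEXT, scity TEXT, sstate TEXT);
CREATE TABLE Part     (pno INTEGER PRIMARY KEY, pname TEXT, psize INTEGER, pcolor TEXT);
CREATE TABLE Supply   (sno INTEGER REFERENCES Supplier(sno),
                       pno INTEGER REFERENCES Part(pno),
                       qty INTEGER, price REAL,
                       PRIMARY KEY (sno, pno));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.executemany("INSERT INTO Supplier VALUES (?, ?, ?, ?)",
                 [(1, "Acme", "Boston", "MA"), (2, "Bolt Co", "Austin", "TX")])
conn.executemany("INSERT INTO Part VALUES (?, ?, ?, ?)",
                 [(10, "bolt", 5, "red"), (20, "nut", 2, "blue")])
conn.executemany("INSERT INTO Supply VALUES (?, ?, ?, ?)",
                 [(1, 10, 100, 0.50), (2, 10, 50, 0.45)])

# Who supplies part 10, and at what price? Supply links the two entity tables.
suppliers_of_10 = conn.execute(
    """SELECT s.sname, sp.price
         FROM Supplier AS s JOIN Supply AS sp ON s.sno = sp.sno
        WHERE sp.pno = 10
        ORDER BY s.sname"""
).fetchall()
# -> [('Acme', 0.5), ('Bolt Co', 0.45)]

# A part may exist even though no Supply row mentions it.
unsupplied_parts = conn.execute(
    "SELECT pno FROM Part WHERE pno NOT IN (SELECT pno FROM Supply)"
).fetchall()
# -> [(20,)]
```

In this form each supplier and each part is stored exactly once, and the Supply table records only the relationship between them.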
II IMS Era
IMS was released around 1968, and initially had a hierarchical data model. It understood
the notion of a record type, which is a collection of named fields with their associated
data types. Each instance of a record type is forced to obey the data description
indicated in the definition of the record type. Furthermore, some subset of the named
fields must uniquely specify a record instance, i.e., that subset is required to be a key. Lastly,
the record types must be arranged in a tree, such that each record type (other than the
root) has a unique parent record type. An IMS data base is a collection of instances of
record types, such that each instance, other than root instances, has a single parent of the
correct record type.
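The rules in the paragraph above can be sketched as a small in-memory checker. These rules are typed record types, a key, a tree of record types, and a single parent of the correct type for every non-root instance. The classes `RecordType`, `Record`, and `HierarchicalDB` are hypothetical illustrations and not IMS's real programming interface. The sketch also assumes that a key need only be unique among the children of one parent, which is a simplification of our own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordType:
    """A named collection of typed fields, placed in a tree of record types."""
    name: str
    fields: dict[str, type]                 # field name -> data type
    key: tuple[str, ...]                    # subset of fields forming the key
    parent: Optional["RecordType"] = None   # None only for the root type


@dataclass
class Record:
    """One instance of a record type, linked to its single parent instance."""
    rtype: RecordType
    values: dict
    parent: Optional["Record"] = None


class HierarchicalDB:
    """Hypothetical in-memory store that enforces the rules described above."""

    def __init__(self) -> None:
        self.records: list[Record] = []
        self._seen_keys: set = set()

    def insert(self, rtype: RecordType, values: dict,
               parent: Optional[Record] = None) -> Record:
        # Every instance must obey its record type's data description.
        if set(values) != set(rtype.fields):
            raise ValueError(f"{rtype.name}: fields do not match the definition")
        for name, ftype in rtype.fields.items():
            if not isinstance(values[name], ftype):
                raise TypeError(f"{rtype.name}.{name} must be {ftype.__name__}")
        # Root instances have no parent; all others need one of the correct type.
        if rtype.parent is None:
            if parent is not None:
                raise ValueError(f"{rtype.name} is a root type and takes no parent")
        elif parent is None or parent.rtype is not rtype.parent:
            raise ValueError(f"{rtype.name} needs a parent of type {rtype.parent.name}")
        # The key must identify the instance (assumed: unique under one parent).
        key = (id(parent), rtype.name, tuple(values[k] for k in rtype.key))
        if key in self._seen_keys:
            raise ValueError(f"duplicate {rtype.name} key {key[2]}")
        self._seen_keys.add(key)
        record = Record(rtype, values, parent)
        self.records.append(record)
        return record


# Usage: Supplier as the root type, Part as its child (hypothetical data).
supplier = RecordType("Supplier", {"sno": int, "sname": str}, ("sno",))
part = RecordType("Part", {"pno": int, "pname": str}, ("pno",), parent=supplier)
db = HierarchicalDB()
acme = db.insert(supplier, {"sno": 1, "sname": "Acme"})
db.insert(part, {"pno": 10, "pname": "bolt"}, parent=acme)
# db.insert(part, {"pno": 20, "pname": "nut"}) raises ValueError: no parent
```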
This requirement of tree-structured data presents a challenge for our sample data, because
we are forced to structure it in one of the two ways indicated in Figure 2. These
representations share two common undesirable properties:
1) Information is repeated. In the first schema, Part information is repeated for
each Supplier who supplies the part. In the second schema, Supplier information
is repeated for each part he supplies. Repeated information is undesirable,
because it offers the possibility for inconsistent data. For example, a repeated
data element could be changed in some, but not all, of the places it appears,
leading to an inconsistent data base.
2) Existence depends on parents. In the first schema it is impossible for there to be
a part that is not currently supplied by anybody. In the second schema, it is
impossible to have a supplier which does not currently supply anything. There is
no support for these “corner cases” in a strict hierarchy.
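Both problems can be reproduced by building the two hierarchies from the same relational data. In this sketch the sample data are hypothetical. Attaching `qty` and `price` to the child record is also our assumption, and the exact field layout of Figure 2 may differ.

```python
from collections import Counter

# Hypothetical sample data in the relational form of Figure 1.
SUPPLIERS = {1: {"sname": "Acme", "scity": "Boston"},
             2: {"sname": "Bolt Co", "scity": "Austin"},
             3: {"sname": "Idle Inc", "scity": "Denver"}}   # supplies nothing
PARTS = {10: {"pname": "bolt", "pcolor": "red"},
         20: {"pname": "nut", "pcolor": "blue"}}            # supplied by nobody
SUPPLY = [(1, 10, 100, 0.50), (2, 10, 50, 0.45)]            # (sno, pno, qty, price)


def supplier_rooted() -> dict:
    """First schema: each Supplier is a root with Part records beneath it."""
    tree = {sno: {**s, "parts": []} for sno, s in SUPPLIERS.items()}
    for sno, pno, qty, price in SUPPLY:
        # A fresh copy of the part's fields is stored under every supplier.
        tree[sno]["parts"].append({"pno": pno, **PARTS[pno], "qty": qty, "price": price})
    return tree


def part_rooted() -> dict:
    """Second schema: each Part is a root with Supplier records beneath it."""
    tree = {pno: {**p, "suppliers": []} for pno, p in PARTS.items()}
    for sno, pno, qty, price in SUPPLY:
        tree[pno]["suppliers"].append({"sno": sno, **SUPPLIERS[sno], "qty": qty, "price": price})
    return tree


s_tree, p_tree = supplier_rooted(), part_rooted()

# 1) Repeated information: part 10 is stored once per supplier of it.
part_copies = Counter(c["pno"] for s in s_tree.values() for c in s["parts"])
# part_copies[10] == 2

# Updating only one copy leaves the data base inconsistent.
s_tree[1]["parts"][0]["pcolor"] = "green"
colors_of_part_10 = {c["pcolor"] for s in s_tree.values() for c in s["parts"] if c["pno"] == 10}
# colors_of_part_10 == {"green", "red"}

# 2) Existence depends on parents: part 20 has no place in the first schema,
#    and supplier 3 has no place in the second.
parts_stored = set(part_copies)                                                 # {10}
suppliers_stored = {c["sno"] for p in p_tree.values() for c in p["suppliers"]}  # {1, 2}
```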