暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
A Relational Model of Data for Large Shared Data Banks - 关系型数据库奠基论文.pdf
1034
11页
19次
2021-01-21
免费下载
Information Retrieval
P. BAXENDALE, Editor
A Relational Model of Data
Large Shared Data Banks
E. F.
CODD
IBM Research Laboratory, San Jose, California
for
Future users of large data banks must be protected from
having to know how the data is organized in the machine (the
internal representation). A prompting service which supplies
such information is not a satisfactory solution. Activities of users
at terminals and most application programs should remain
unaffected when the internal representation of data is changed
and even when some aspects of the external representation
are changed. Changes in data representation will often be
needed as a result of changes in query, update, and report
traffic and natural growth in the types of stored information.
Existing noninferential, formatted data systems provide users
with tree-structured files or slightly more general network
models of the data. In Section 1, inadequacies of these models
are discussed. A model based on n-ary relations, a normal
form for data base relations, and the concept of a universal
data sublanguage are introduced. In Section 2, certain opera-
tions on relations (other than logical inference) are discussed
and applied to the problems of redundancy and consistency
in the user's model.
KEY WORDS AND PHRASES:
data bank, data base, data structure, data
organization, hierarchies of data, networks of data, relations, derivability,
redundancy, consistency, composition, join, retrieval language, predicate
calculus, security, data integrity
CR
CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of ele-
mentary relation theory to systems which provide shared
access to large banks of formatted data. Except for a paper
by Childs [1], the principal application of relations to data
systems has been to deductive question-answering systems.
Levein and Maron [2] provide numerous references to work
in this area.
In contrast, the problems treated here are those of
data
independence--the
independence of application programs
and terminal activities from growth in data types and
changes in data representation--and certain kinds of
data
inconsistency
which are expected to become troublesome
even in nondeductive systems.
Volume
13 / Number 6 / June, 1970
The relational view (or model) of data described in
Section 1 appears to be superior in several respects to the
graph or network model [3, 4] presently in vogue for non-
inferential systems. It provides a means of describing data
with its natural structure only--that is, without superim-
posing any additional structure for machine representation
purposes. Accordingly, it provides a basis for a high level
data language which will yield maximal independence be-
tween programs on the one hand and machine representa-
tion and organization of data on the other.
A further advantage of the relational view is that it
forms a sound basis for treating derivability, redundancy,
and consistency of relations--these are discussed in Section
2. The network model, on the other hand, has spawned a
number of confusions, not the least of which is mistaking
the derivation of connections for the derivation of rela-
tions (see remarks in Section 2 on the
"connection
trap").
Finally, the relational view permits a clearer evaluation
of the scope and logical limitations of present formatted
data systems, and also the relative merits (from a logical
standpoint) of competing representations of data within a
single system. Examples of this clearer perspective are
cited in various parts of this paper. Implementations of
systems to support the relational model are not discussed.
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
The provision of data description tables in recently de-
veloped information systems represents a major advance
toward the goal of data independence [5, 6, 7]. Such tables
facilitate changing certain characteristics of the data repre-
sentation stored in a data bank. However, the variety of
data representation characteristics which can be changed
without logically impairing some application programs
is
still quite limited. Further, the model of data with which
users interact is still cluttered with representational prop-
erties, particularly in regard to the representation of col-
lections of data (as opposed to individual items). Three of
the principal kinds of data dependencies which still need
to be removed are: ordering dependence, indexing depend-
ence, and access path dependence. In some systems these
dependencies are not clearly separable from one another.
1.2.1.
Ordering Dependence.
Elements of data in a
data bank may be stored in a variety of ways, some involv-
ing no concern for ordering, some permitting each element
to participate in one ordering only, others permitting each
element to participate in several orderings. Let us consider
those existing systems which either require or permit data
elements to be stored in at least one total ordering which is
closely associated with the hardware-determined ordering
of addresses. For example, the records of a file concerning
parts might be stored in ascending order by part serial
number. Such systems normally permit application pro-
grams to assume that the order of presentation of records
from such a file is identical to (or is a subordering of) the
Communications of the
ACM 377
stored ordering. Those application programs which take
advantage of the stored ordering of a file are likely to fail
to operate correctly if for some reason it becomes necessary
to replace that ordering by a different one. Similar remarks
hold for a stored ordering implemented by means of
pointers.
It is unnecessary to single out any system as an example,
because all the well-known information systems that are
marketed today fail to make a clear distinction between
order of presentation on the one hand and stored ordering
on the other. Significant implementation problems must be
solved to provide this kind of independence.
1.2.2. Indexing Dependence.
In the context of for-
matted data, an index is usually thought of as a purely
performance-oriented component of the data representa-
tion. It tends to improve response to queries and updates
and, at the same time, slow down response to insertions
and deletions. From an informational standpoint, an index
is a redundant component of the data representation. If a
system uses indices at all and if it is to perform well in an
environment with changing patterns of activity on the data
bank, an ability to create and destroy indices from time to
time will probably be necessary. The question then arises:
Can application programs and terminal activities remain
invariant as indices come and go?
Present formatted data systems take widely different
approaches to indexing. TDMS [7] unconditionally pro-
vides indexing on all attributes. The presently released
version of IMS [5] provides the user with a choice for each
file: a choice between no indexing at all (the hierarchic se-
quential organization) or indexing on the primary key
only (the hierarchic indexed sequential organization). In
neither case is the user's application logic dependent on the
existence of the unconditionally provided indices. IDS
[8], however, permits the file designers to select attributes
to be indexed and to incorporate indices into the file struc-
ture by means of additional chains. Application programs
talcing advantage of the performance benefit of these in-
dexing chains must refer to those chains by name. Such pro-
grams do not operate correctly if these chains are later
removed.
1.2.3.
Access Path Dependence.
Many of the existing
formatted data systems provide users with tree-structured
files or slightly more general network models of the data.
Application programs developed to work with these sys-
tems tend to be logically impaired if the trees or networks
are changed in structure. A simple example follows.
Suppose the data bank contains information about parts
and projects. For each part, the part number, part name,
part description, quantity-on-hand, and quantity-on-order
are recorded. For each project, the project number, project
name, project description are recorded. Whenever a project
makes use of a certain part, the quantity of that part com-
mitted to the given project is also recorded. Suppose that
the system requires the user or file designer to declare or
define the data in terms of tree structures. Then, any one
of the hierarchical structures may be adopted for the infor-
mation mentioned above (see Structures 1-5).
Structure 1.
File Segment
F PART
PROJECT
Projects Subordinate to Parts
Pldds
part #
part name
part description
quantity-on-hand
quantity-on-order
project #
project name
project description
quantity committed
Structure 2.
File Segment
F PROJECT
PART
Parts Subordinate to Projects
gidds
project #
project name
project description
part #
part name
part description
quantity-on-hand
quantity-on-order
quantity committed
Structure 3. Parts and Projects as Peers
Commitment Relationship Subordinate to Projects
File Segmen~ Fields
F PART part #
part name
part description
quantity-on-hand
quantity-on-order
G PROJECT project #
project name
project description
PART part #
quantity committed
Structure 4. Parts and Projects as Peers
Commitment Relationship Subordinate to Parts
File Segment Fidds
P
PART
part
#
part description
quantity-on-hand
quantity-on-order
PROJECT project #
quantity committed
G PROJECT project #
project name
project description
Structure 5. Parts, Projects, and
Commitment Relationship as Peers
F~¢ Segment Fields
F PART part #
part name
part description
quantity-on-hand
quantity-on-order
G PROJECT project #
project name
project description
H COMMIT part #
project #
quantity committed
378
Communications of the ACM Volume 13 / Number 6 / June, 1970
of 11
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜