A Relational Model of Data for Large Shared Data Banks - 关系型数据库奠基论文.pdf

盖国强

1034

11页

19次

2021-01-21

免费下载

Information Retrieval

P. BAXENDALE, Editor

A Relational Model of Data

Large Shared Data Banks

E. F.

CODD

IBM Research Laboratory, San Jose, California

for

Future users of large data banks must be protected from

having to know how the data is organized in the machine (the

internal representation). A prompting service which supplies

such information is not a satisfactory solution. Activities of users

at terminals and most application programs should remain

unaffected when the internal representation of data is changed

and even when some aspects of the external representation

are changed. Changes in data representation will often be

needed as a result of changes in query, update, and report

traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users

with tree-structured files or slightly more general network

models of the data. In Section 1, inadequacies of these models

are discussed. A model based on n-ary relations, a normal

form for data base relations, and the concept of a universal

data sublanguage are introduced. In Section 2, certain opera-

tions on relations (other than logical inference) are discussed

and applied to the problems of redundancy and consistency

in the user's model.

KEY WORDS AND PHRASES:

data bank, data base, data structure, data

organization, hierarchies of data, networks of data, relations, derivability,

redundancy, consistency, composition, join, retrieval language, predicate

calculus, security, data integrity

CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of ele-

mentary relation theory to systems which provide shared

access to large banks of formatted data. Except for a paper

by Childs [1], the principal application of relations to data

systems has been to deductive question-answering systems.

Levein and Maron [2] provide numerous references to work

in this area.

In contrast, the problems treated here are those of

data

independence--the

independence of application programs

and terminal activities from growth in data types and

changes in data representation--and certain kinds of

data

inconsistency

which are expected to become troublesome

even in nondeductive systems.

Volume

13 / Number 6 / June, 1970

The relational view (or model) of data described in

Section 1 appears to be superior in several respects to the

graph or network model [3, 4] presently in vogue for non-

inferential systems. It provides a means of describing data

with its natural structure only--that is, without superim-

posing any additional structure for machine representation

purposes. Accordingly, it provides a basis for a high level

data language which will yield maximal independence be-

tween programs on the one hand and machine representa-

tion and organization of data on the other.

A further advantage of the relational view is that it

forms a sound basis for treating derivability, redundancy,

and consistency of relations--these are discussed in Section

2. The network model, on the other hand, has spawned a

number of confusions, not the least of which is mistaking

the derivation of connections for the derivation of rela-

tions (see remarks in Section 2 on the

"connection

trap").

Finally, the relational view permits a clearer evaluation

of the scope and logical limitations of present formatted

data systems, and also the relative merits (from a logical

standpoint) of competing representations of data within a

single system. Examples of this clearer perspective are

cited in various parts of this paper. Implementations of

systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

The provision of data description tables in recently de-

veloped information systems represents a major advance

toward the goal of data independence [5, 6, 7]. Such tables

facilitate changing certain characteristics of the data repre-

sentation stored in a data bank. However, the variety of

data representation characteristics which can be changed

without logically impairing some application programs

still quite limited. Further, the model of data with which

users interact is still cluttered with representational prop-

erties, particularly in regard to the representation of col-

lections of data (as opposed to individual items). Three of

the principal kinds of data dependencies which still need

to be removed are: ordering dependence, indexing depend-

ence, and access path dependence. In some systems these

dependencies are not clearly separable from one another.

1.2.1.

Ordering Dependence.

Elements of data in a

data bank may be stored in a variety of ways, some involv-

ing no concern for ordering, some permitting each element

to participate in one ordering only, others permitting each

element to participate in several orderings. Let us consider

those existing systems which either require or permit data

elements to be stored in at least one total ordering which is

closely associated with the hardware-determined ordering

of addresses. For example, the records of a file concerning

parts might be stored in ascending order by part serial

number. Such systems normally permit application pro-

grams to assume that the order of presentation of records

from such a file is identical to (or is a subordering of) the

Communications of the

ACM 377

stored ordering. Those application programs which take

advantage of the stored ordering of a file are likely to fail

to operate correctly if for some reason it becomes necessary

to replace that ordering by a different one. Similar remarks

hold for a stored ordering implemented by means of

pointers.

It is unnecessary to single out any system as an example,

because all the well-known information systems that are

marketed today fail to make a clear distinction between

order of presentation on the one hand and stored ordering

on the other. Significant implementation problems must be

solved to provide this kind of independence.

1.2.2. Indexing Dependence.

In the context of for-

matted data, an index is usually thought of as a purely

performance-oriented component of the data representa-

tion. It tends to improve response to queries and updates

and, at the same time, slow down response to insertions

and deletions. From an informational standpoint, an index

is a redundant component of the data representation. If a

system uses indices at all and if it is to perform well in an

environment with changing patterns of activity on the data

bank, an ability to create and destroy indices from time to

time will probably be necessary. The question then arises:

Can application programs and terminal activities remain

invariant as indices come and go?

Present formatted data systems take widely different

approaches to indexing. TDMS [7] unconditionally pro-

vides indexing on all attributes. The presently released

version of IMS [5] provides the user with a choice for each

file: a choice between no indexing at all (the hierarchic se-

quential organization) or indexing on the primary key

only (the hierarchic indexed sequential organization). In

neither case is the user's application logic dependent on the

existence of the unconditionally provided indices. IDS

[8], however, permits the file designers to select attributes

to be indexed and to incorporate indices into the file struc-

ture by means of additional chains. Application programs

talcing advantage of the performance benefit of these in-

dexing chains must refer to those chains by name. Such pro-

grams do not operate correctly if these chains are later

removed.

1.2.3.

Access Path Dependence.

Many of the existing

formatted data systems provide users with tree-structured

files or slightly more general network models of the data.

Application programs developed to work with these sys-

tems tend to be logically impaired if the trees or networks

are changed in structure. A simple example follows.

Suppose the data bank contains information about parts

and projects. For each part, the part number, part name,

part description, quantity-on-hand, and quantity-on-order

are recorded. For each project, the project number, project

name, project description are recorded. Whenever a project

makes use of a certain part, the quantity of that part com-

mitted to the given project is also recorded. Suppose that

the system requires the user or file designer to declare or

define the data in terms of tree structures. Then, any one

of the hierarchical structures may be adopted for the infor-

mation mentioned above (see Structures 1-5).

Structure 1.

File Segment

F PART

PROJECT

Projects Subordinate to Parts

Pldds

part #

part name

part description

quantity-on-hand

quantity-on-order

project #

project name

project description

quantity committed

Structure 2.

File Segment

F PROJECT

PART

Parts Subordinate to Projects

gidds

project #

project name

project description

part #

part name

part description

quantity-on-hand

quantity-on-order

quantity committed

Structure 3. Parts and Projects as Peers

Commitment Relationship Subordinate to Projects

File Segmen~ Fields

F PART part #

part name

part description

quantity-on-hand

quantity-on-order

G PROJECT project #

project name

project description

PART part #

quantity committed

Structure 4. Parts and Projects as Peers

Commitment Relationship Subordinate to Parts

File Segment Fidds

PART

part

part description

quantity-on-hand

quantity-on-order

PROJECT project #

quantity committed

G PROJECT project #

project name

project description

Structure 5. Parts, Projects, and

Commitment Relationship as Peers

F~¢ Segment Fields

F PART part #

part name

part description

quantity-on-hand

quantity-on-order

G PROJECT project #

project name

project description

H COMMIT part #

project #

quantity committed

378

Communications of the ACM Volume 13 / Number 6 / June, 1970

of 11

免费下载

oracle edgar frank codd

文档被以下合辑收录

数据库简史（共17篇）

关于我的新书《数据库简史》中引用的一些资料，统一辑录在这里。

数据库 | 经典论文（共9篇）

数据库领域里，明珠一样的创世论文。

精品研究报告/论文推荐（共44篇）

精选高品质研究报告和具有里程碑意义的论文供大家参考、学习，持续更新……

关注

文档被以下合辑收录

评论