
A Relational Model of Data for Large Shared Data Banks

E. F. Codd∗

IBM Research Laboratory, San Jose, California

ABSTRACT

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user’s model.

1. RELATIONAL MODEL AND NORMAL FORM

1.1 Introduction

This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area. In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3, 4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

∗ E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387. DOI=10.1145/362384.362685 http://doi.acm.org/10.1145/362384.362685

A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations; these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the “connection trap”).

Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

1.2 Data Dependencies in Present Systems

The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5, 6, 7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

1.2.1 Ordering Dependence

Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.
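The failure mode described above can be illustrated with a minimal Python sketch. It is not drawn from any system named in this paper: the record layout, the field names `serial` and `name`, and both lookup functions are assumptions made purely for illustration. One lookup relies on the stored ordering by serial number (through a binary search), the other does not.

```python
from bisect import bisect_left

# Hypothetical parts file, stored in ascending order by part serial number.
stored_by_serial = [
    {"serial": 101, "name": "bolt"},
    {"serial": 205, "name": "gear"},
    {"serial": 330, "name": "shaft"},
]


def find_part_assuming_order(records, serial):
    """Binary search that silently relies on ascending serial order."""
    keys = [r["serial"] for r in records]
    i = bisect_left(keys, serial)
    if i < len(records) and records[i]["serial"] == serial:
        return records[i]
    return None


def find_part_order_independent(records, serial):
    """Linear scan that makes no assumption about the stored ordering."""
    for r in records:
        if r["serial"] == serial:
            return r
    return None


# The same records after the file is re-stored in a different order
# (here, descending by name), as might happen after a reorganization.
restored = sorted(stored_by_serial, key=lambda r: r["name"], reverse=True)
```

With the original ordering both functions find part 205. Once the file is re-stored, `find_part_assuming_order` returns `None` for a part that is still present, while the order-independent version is unaffected (at the cost of a full scan in this toy form).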

It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.

1.2.2 Indexing Dependence

In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?

Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user’s application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.
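The contrast between index-dependent and index-independent programs can be sketched in Python. The sketch does not model TDMS, IMS, or IDS; the `DataBank` class, the index name `PART_NAME_IDX`, and the record fields are all hypothetical. One program names an index explicitly, the other asks the system for records by attribute value and lets the system decide whether an index is available.

```python
class DataBank:
    """Toy data bank: a list of records plus optional named indices.

    Indices are built once by create_index and are not maintained on
    later insertions; that is enough for this illustration.
    """

    def __init__(self, records):
        self.records = list(records)
        self.indices = {}  # index name -> (attribute, {value: [records]})

    def create_index(self, name, attribute):
        lookup = {}
        for r in self.records:
            lookup.setdefault(r[attribute], []).append(r)
        self.indices[name] = (attribute, lookup)

    def drop_index(self, name):
        del self.indices[name]

    def select(self, attribute, value):
        """Index-independent access: use an index if one exists, else scan."""
        for attr, lookup in self.indices.values():
            if attr == attribute:
                return lookup.get(value, [])
        return [r for r in self.records if r[attribute] == value]


def parts_by_name_via_index(bank, name):
    """Index-dependent program: it refers to the index by name."""
    _, lookup = bank.indices["PART_NAME_IDX"]  # KeyError once the index is dropped
    return lookup.get(name, [])


bank = DataBank([
    {"part_no": 1, "name": "bolt"},
    {"part_no": 2, "name": "gear"},
])
bank.create_index("PART_NAME_IDX", "name")
```

After `bank.drop_index("PART_NAME_IDX")`, a call to `bank.select("name", "gear")` still returns the same record, only more slowly, whereas `parts_by_name_via_index` raises `KeyError`. Under these assumptions, only the first style of program remains invariant as indices come and go.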

1.2.3 Access Path Dependence

Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.

Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, and project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1–5).

Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is “alpha.” The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above, that is, P makes

Structure 1. Projects Subordinate to Parts

File  Segment    Fields
F     PART       part #
                 part name
                 part description
                 quantity-on-hand
                 quantity-on-order
        PROJECT  project #
                 project name
                 project description
                 quantity committed

Structure 2. Parts Subordinate to Projects

File  Segment    Fields
F     PROJECT    project #
                 project name
                 project description
        PART     part #
                 part name
                 part description
                 quantity-on-hand
                 quantity-on-order
                 quantity committed

Structure 3. Parts and Projects as Peers
Commitment Relationship Subordinate to Projects

File  Segment    Fields
F     PART       part #
                 part name
                 part description
                 quantity-on-hand
                 quantity-on-order
G     PROJECT    project #
                 project name
                 project description
        PART     part #
                 quantity committed

Structure 4. Parts and Projects as Peers
Commitment Relationship Subordinate to Parts

File  Segment    Fields
F     PART       part #
                 part description
                 quantity-on-hand
                 quantity-on-order
        PROJECT  project #
                 quantity committed
G     PROJECT    project #
                 project name
                 project description

Structure 5. Parts, Projects, and Commitment Relationship as Peers

File  Segment    Fields
F     PART       part #
                 part name
                 part description
                 quantity-on-hand
                 quantity-on-order
G     PROJECT    project #
                 project name
                 project description
H     COMMIT     part #
                 project #
                 quantity committed
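To make the access path problem concrete, the following Python sketch encodes the same facts under Structure 1 and Structure 2 as nested dictionaries and writes the “alpha” request against each. It is an illustration only: the sample values, the key names, and the function names are assumptions, and the description and quantity-on-hand/on-order fields are omitted for brevity.

```python
# Structure 2: parts subordinate to projects (hypothetical sample data).
structure_2 = [
    {"project_no": 10, "project_name": "alpha",
     "parts": [
         {"part_no": 1, "part_name": "bolt", "qty_committed": 40},
         {"part_no": 2, "part_name": "gear", "qty_committed": 5},
     ]},
    {"project_no": 20, "project_name": "beta",
     "parts": [{"part_no": 1, "part_name": "bolt", "qty_committed": 7}]},
]

# Structure 1: the same facts with projects subordinate to parts.
structure_1 = [
    {"part_no": 1, "part_name": "bolt",
     "projects": [
         {"project_no": 10, "project_name": "alpha", "qty_committed": 40},
         {"project_no": 20, "project_name": "beta", "qty_committed": 7},
     ]},
    {"part_no": 2, "part_name": "gear",
     "projects": [{"project_no": 10, "project_name": "alpha", "qty_committed": 5}]},
]


def p_for_structure_2(file_f, project_name):
    """Program P written against Structure 2: walk project -> parts."""
    return [
        (part["part_no"], part["part_name"], part["qty_committed"])
        for project in file_f
        if project["project_name"] == project_name
        for part in project["parts"]
    ]


def p_for_structure_1(file_f, project_name):
    """The same request rewritten for Structure 1: walk part -> projects."""
    return [
        (part["part_no"], part["part_name"], project["qty_committed"])
        for part in file_f
        for project in part["projects"]
        if project["project_name"] == project_name
    ]
```

Each program returns the same three-column result when run against the structure it was written for. Running `p_for_structure_2` against `structure_1` raises `KeyError`, because the access path it encodes (project first, then parts) no longer exists, even though the information content of the two files is identical.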
