A Relational Model of Data for Large Shared Data Banks
E. F. Codd∗
IBM Research Laboratory, San Jose, California
ABSTRACT
Future users of large data banks must be protected from
having to know how the data is organized in the machine
(the internal representation). A prompting service which
supplies such information is not a satisfactory solution. Ac-
tivities of users at terminals and most application programs
should remain unaffected when the internal representation of
data is changed and even when some aspects of the external
representation are changed. Changes in data representation
will often be needed as a result of changes in query, update,
and report traffic and natural growth in the types of stored
information.
Existing noninferential, formatted data systems provide
users with tree-structured files or slightly more general net-
work models of the data. In Section 1, inadequacies of these
models are discussed. A model based on n-ary relations,
a normal form for data base relations, and the concept of
a universal data sublanguage are introduced. In Section 2,
certain operations on relations (other than logical inference)
are discussed and applied to the problems of redundancy
and consistency in the user’s model.
1. RELATIONAL MODEL AND NORMAL FORM
1.1 Introduction
This paper is concerned with the application of elementary
relation theory to systems which provide shared access to
large banks of formatted data. Except for a paper by Childs
[1], the principal application of relations to data systems has
been to deductive question-answering systems. Levien and
Maron [2] provide numerous references to work in this area.
In contrast, the problems treated here are those of data in-
dependence—the independence of application programs and
terminal activities from growth in data types and changes
in data representation—and certain kinds of data inconsis-
tency which are expected to become troublesome even in
nondeductive systems.
The relational view (or model) of data described in Sec-
tion 1 appears to be superior in several respects to the graph
or network model [3, 4] presently in vogue for noninferential
systems. It provides a means of describing data with its
natural structure only, that is, without superimposing any
additional structure for machine representation purposes.
Accordingly, it provides a basis for a high level data language
which will yield maximal independence between programs
on the one hand and machine representation and
organization of data on the other.
∗ E. F. Codd. 1970. A relational model of data for large
shared data banks. Commun. ACM 13, 6 (June 1970), 377-387.
DOI=10.1145/362384.362685
http://doi.acm.org/10.1145/362384.362685
A further advantage of the relational view is that it forms
a sound basis for treating derivability, redundancy, and con-
sistency of relations—these are discussed in Section 2. The
network model, on the other hand, has spawned a number of
confusions, not the least of which is mistaking the derivation
of connections for the derivation of relations (see remarks in
Section 2 on the “connection trap”).
Finally, the relational view permits a clearer evaluation of
the scope and logical limitations of present formatted data
systems, and also the relative merits (from a logical stand-
point) of competing representations of data within a single
system. Examples of this clearer perspective are cited in
various parts of this paper. Implementations of systems to
support the relational model are not discussed.
1.2 Data Dependencies in Present Systems
The provision of data description tables in recently de-
veloped information systems represents a major advance to-
ward the goal of data independence [5, 6, 7]. Such tables
facilitate changing certain characteristics of the data rep-
resentation stored in a data bank. However, the variety
of data representation characteristics which can be changed
without logically impairing some application programs is still
quite limited. Further, the model of data with which users
interact is still cluttered with representational properties,
particularly in regard to the representation of collections of
data (as opposed to individual items). Three of the principal
kinds of data dependencies which still need to be removed
are: ordering dependence, indexing dependence, and access
path dependence. In some systems these dependencies are
not clearly separable from one another.
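To make the first of these dependencies concrete, the following is a minimal sketch in Python, not drawn from any system discussed here. The parts records, the names PARTS_SORTED and PARTS_REORDERED, and both lookup functions are hypothetical and introduced only for illustration. The first lookup assumes that records are presented in ascending order by part serial number; the second makes no assumption about order.

```python
from bisect import bisect_left

# Hypothetical parts file: (part_serial_number, part_name) records,
# stored in ascending order by serial number.
PARTS_SORTED = [(101, "bolt"), (205, "gear"), (317, "shaft"), (420, "valve")]

# The same records after a hypothetical reorganization of the stored file.
PARTS_REORDERED = [(420, "valve"), (101, "bolt"), (317, "shaft"), (205, "gear")]


def find_part_order_dependent(records, serial):
    """Binary search: correct only if records arrive in ascending serial order."""
    keys = [record[0] for record in records]
    i = bisect_left(keys, serial)
    if i < len(keys) and keys[i] == serial:
        return records[i]
    return None


def find_part_order_independent(records, serial):
    """Linear scan: makes no assumption about the order of presentation."""
    for record in records:
        if record[0] == serial:
            return record
    return None


if __name__ == "__main__":
    for name, records in (("sorted", PARTS_SORTED), ("reordered", PARTS_REORDERED)):
        print(name,
              find_part_order_dependent(records, 205),
              find_part_order_independent(records, 205))
```

Under these assumptions, the order-dependent lookup finds part 205 in the sorted file but misses it once the stored ordering changes, although the data itself is unchanged. The order-independent lookup is unaffected, at the cost of a slower scan.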
1.2.1 Ordering Dependence
Elements of data in a data bank may be stored in a va-
riety of ways, some involving no concern for ordering, some
permitting each element to participate in one ordering only,
others permitting each element to participate in several or-
derings. Let us consider those existing systems which either
require or permit data elements to be stored in at least one
total ordering which is closely associated with the hardware-
determined ordering of addresses. For example, the records
of a file concerning parts might be stored in ascending order
by part serial number. Such systems normally permit appli-
cation programs to assume that the order of presentation of
records from such a file is identical to (or is a subordering
of) the stored ordering. Those application programs which
take advantage of the stored ordering of a file are likely to