– MedMuv

Home

June 21, 2024

Solving problems on the example of

the use of pharmacy pharmacy

database “Pharmacy”.Software to

create presentations and office basics

of programming.

This page intentionally left blank

chapter

Data Modeling Using the

Entity-Relationship (ER) Model

onceptual modeling is a very important phase in

designing a successful database application.

Generally, the term database application refers to a particular database and the

associated programs that implement the database queries and updates. For exam-

ple, a BANK database application that keeps track of customer accounts would

include programs that implement database updates corresponding to customer

deposits and withdrawals. These programs provide user-friendly graphical user

interfaces (GUIs) utilizing forms and menus for the end users of the application—

the bank tellers, in this example. Hence, a major part of the database application will

require the design, implementation, and testing of these application programs.

Traditionally, the design and testing of application programs has been considered

to be part of software engineering rather than database design. In many software

design tools, the database design methodologies and software engineering method-

ologies are intertwined since these activities are strongly related.

In this chapter, we follow the traditional approach of concentrating on the database

structures and constraints during conceptual database design. The design of appli-

cation programs is typically covered in software engineering courses. We present the

modeling concepts of the Entity-Relationship (ER) model, which is a popular

high-level conceptual data model. This model and its variations are frequently used

for the conceptual design of database applications, and many database design tools

employ its concepts. We describe the basic data-structuring concepts and con-

straints of the ER model and discuss their use in the design of conceptual schemas

for database applications. We also present the diagrammatic notation associated

with the ER model, known as ER diagrams.

199

200

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

Object modeling methodologies such as the Unified Modeling Language (UML)

are becoming increasingly popular in both database and software design. These

methodologies go beyond database design to specify detailed design of software

modules and their interactions using various types of diagrams. An important part

of these methodologies—namely, class diagrams¹—are similar in many ways to the

ER diagrams. In class diagrams, operations on objects are specified, in addition to

specifying the database schema structure. Operations can be used to specify the

functional requirements during database design, as we will discuss in Section 7.1. We

present some of the UML notation and concepts for class diagrams that are partic-

ularly relevant to database design in Section 7.8, and briefly compare these to ER

notation and concepts. Additional UML notation and concepts are presented in

Section 8.6 and in Chapter 10.

This chapter is organized as follows: Section 7.1 discusses the role of high-level con-

ceptual data models in database design. We introduce the requirements for a sample

database application in Section 7.2 to illustrate the use of concepts from the ER

model. This sample database is also used throughout the book. In Section 7.3 we

present the concepts of entities and attributes, and we gradually introduce the dia-

grammatic technique for displaying an ER schema. In Section 7.4 we introduce the

concepts of binary relationships and their roles and structural constraints. Section

7.5 introduces weak entity types. Section 7.6 shows how a schema design is refined

to include relationships. Section 7.7 reviews the notation for ER diagrams, summa-

rizes the issues and common pitfalls that occur in schema design, and discusses how

to choose the names for database schema constructs. Section 7.8 introduces some

UML class diagram concepts, compares them to ER model concepts, and applies

them to the same database example. Section 7.9 discusses more complex types of

relationships. Section 7.10 summarizes the chapter.

The material in Sections 7.8 and 7.9 may be excluded from an introductory course. If

a more thorough coverage of data modeling concepts and conceptual database design

is desired, the reader should continue to Chapter 8, where we describe extensions to

the ER model that lead to the Enhanced-ER (EER) model, which includes concepts

such as specialization, generalization, inheritance, and union types (categories). We

also introduce some additional UML concepts and notation in Chapter 8.

7.1 Using High-Level Conceptual Data Models

for Database Design

Figure 7.1 shows a simplified overview of the database design process. The first step

shown is requirements collection and analysis. During this step, the database

designers interview prospective database users to understand and document their

data requirements. The result of this step is a concisely written set of users’ require-

ments. These requirements should be specified in as detailed and complete a form

as possible. In parallel with specifying the data requirements, it is useful to specify

¹A class is similar to an entity type in many ways.

7.1

Using High-Level Conceptual Data Models for Database Design

201

Miniworld

REQUIREMENTS

COLLECTION AND

ANALYSIS

Functional Requirements

Data Requirements

FUNCTIONAL ANALYSIS

CONCEPTUAL DESIGN

High-Level Transaction

Conceptual Schema

Specification

(In a high-level data model)

DBMS-independent

LOGICAL DESIGN

DBMS-specific

(DATA MODEL MAPPING)

Logical (Conceptual) Schema

APPLICATION PROGRAM

(In the data model of a specific DBMS)

DESIGN

PHYSICAL DESIGN

TRANSACTION

Internal Schema

IMPLEMENTATION

Figure 7.1

A simplified diagram to illustrate the

Application Programs

main phases of database design.

the known functional requirements of the application. These consist of the user-

defined operations (or transactions) that will be applied to the database, including

both retrievals and updates. In software design, it is common to use data flow dia-

grams, sequence diagrams, scenarios, and other techniques to specify functional

requirements. We will not discuss any of these techniques here; they are usually

described in detail in software engineering texts. We give an overview of some of

these techniques in Chapter 10.

Once the requirements have been collected and analyzed, the next step is to create a

conceptual schema for the database, using a high-level conceptual data model. This

step is called conceptual design. The conceptual schema is a concise description of

202

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

the data requirements of the users and includes detailed descriptions of the entity

types, relationships, and constraints; these are expressed using the concepts pro-

vided by the high-level data model. Because these concepts do not include imple-

mentation details, they are usually easier to understand and can be used to

communicate with nontechnical users. The high-level conceptual schema can also

be used as a reference to ensure that all users’ data requirements are met and that the

requirements do not conflict. This approach enables database designers to concen-

trate on specifying the properties of the data, without being concerned with storage

and implementation details. This makes it is easier to create a good conceptual data-

base design.

During or after the conceptual schema design, the basic data model operations can

be used to specify the high-level user queries and operations identified during func-

tional analysis. This also serves to confirm that the conceptual schema meets all the

identified functional requirements. Modifications to the conceptual schema can be

introduced if some functional requirements cannot be specified using the initial

schema.

The next step in database design is the actual implementation of the database, using

a commercial DBMS. Most current commercial DBMSs use an implementation

data model—such as the relational or the object-relational database model—so the

conceptual schema is transformed from the high-level data model into the imple-

mentation data model. This step is called logical design or data model mapping; its

result is a database schema in the implementation data model of the DBMS. Data

model mapping is often automated or semiautomated within the database design

tools.

The last step is the physical design phase, during which the internal storage struc-

tures, file organizations, indexes, access paths, and physical design parameters for

the database files are specified. In parallel with these activities, application programs

are designed and implemented as database transactions corresponding to the high-

level transaction specifications. We discuss the database design process in more

detail in Chapter 10.

We present only the basic ER model concepts for conceptual schema design in this

chapter. Additional modeling concepts are discussed in Chapter 8, when we intro-

duce the EER model.

7.2 A Sample Database Application

In this section we describe a sample database application, called COMPANY, which

serves to illustrate the basic ER model concepts and their use in schema design. We

list the data requirements for the database here, and then create its conceptual

schema step-by-step as we introduce the modeling concepts of the ER model. The

COMPANY database keeps track of a company’s employees, departments, and proj-

ects. Suppose that after the requirements collection and analysis phase, the database

designers provide the following description of the miniworld—the part of the com-

pany that will be represented in the database.

7.3

Entity Types, Entity Sets, Attributes, and Keys

203

■ The company is organized into departments. Each department has a unique

name, a unique number, and a particular employee who manages the

department. We keep track of the start date when that employee began man-

aging the department. A department may have several locations.

■ A department controls a number of projects, each of which has a unique

name, a unique number, and a single location.

■ We store each employee’s name, Social Security number,² address, salary, sex

(gender), and birth date. An employee is assigned to one department, but

may work on several projects, which are not necessarily controlled by the

same department. We keep track of the current number of hours per week

that an employee works on each project. We also keep track of the direct

supervisor of each employee (who is another employee).

■ We want to keep track of the dependents of each employee for insurance

purposes. We keep each dependent’s first name, sex, birth date, and relation-

ship to the employee.

Figure 7.2 shows how the schema for this database application can be displayed by

means of the graphical notation known as ER diagrams. This figure will be

explained gradually as the ER model concepts are presented. We describe the step-

by-step process of deriving this schema from the stated requirements—and explain

the ER diagrammatic notation—as we introduce the ER model concepts.

7.3 Entity Types, Entity Sets, Attributes,

and Keys

The ER model describes data as entities, relationships, and attributes. In Section 7.3.1

we introduce the concepts of entities and their attributes. We discuss entity types

and key attributes in Section 7.3.2. Then, in Section 7.3.3, we specify the initial con-

ceptual design of the entity types for the COMPANY database. Relationships are

described in Section 7.4.

7.3.1

Entities and Attributes

Entities and Their Attributes. The basic object that the ER model represents is

an entity, which is a thing in the real world with an independent existence. An entity

may be an object with a physical existence (for example, a particular person, car,

house, or employee) or it may be an object with a conceptual existence (for instance,

a company, a job, or a university course). Each entity has attributes—the particular

properties that describe it. For example, an EMPLOYEE entity may be described by

the employee’s name, age, address, salary, and job. A particular entity will have a

²The Social Security number, or SSN, is a unique nine-digit identifier assigned to each individual in the

United States to keep track of his or her employment, benefits, and taxes. Other countries may have

similar identification schemes, such as personal identification card numbers.

204

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

Fname

Minit

Lname

Bdate

Name

Address

Salary

Ssn

Sex

Locations

WORKS_FOR

Name

Number

EMPLOYEE

Start_date

Number_of_employees

DEPARTMENT

MANAGES

CONTROLS

Hours

WORKS_ON

PROJECT

Supervisor

Supervisee

Name

SUPERVISION

Location

Number

DEPENDENTS_OF

DEPENDENT

Name

Sex

Birth_date

Relationship

Figure 7.2

An ER schema diagram for the COMPANY database. The diagrammatic notation

is introduced gradually throughout this chapter and is summarized in Figure 7.14.

value for each of its attributes. The attribute values that describe each entity become

a major part of the data stored in the database.

Figure 7.3 shows two entities and the values of their attributes. The EMPLOYEE

entity e₁ has four attributes: Name, Address, Age, and Home_phone; their values are

‘John Smith,’ ‘2311 Kirby, Houston, Texas 77001’, ‘55’, and ‘713-749-2630’, respec-

tively. The COMPANY entity c₁ has three attributes: Name, Headquarters, and

President; their values are ‘Sunco Oil’, ‘Houston’, and ‘John Smith’, respectively.

Several types of attributes occur in the ER model: simple versus composite, single-

valued versus multivalued, and stored versus derived. First we define these attribute

7.3

Entity Types, Entity Sets, Attributes, and Keys

205

Name = Sunco Oil

Name = John Smith

Address = 2311 Kirby

Houston, Texas 77001

e₁

c₁

Headquarters = Houston

Figure 7.3

Age = 55

Two entities,

EMPLOYEE e₁, and

COMPANY c₁, and

Home_phone = 713-749-2630

President = John Smith

their attributes.

types and illustrate their use via examples. Then we discuss the concept of a NULL

value for an attribute.

Composite versus Simple (Atomic) Attributes. Composite attributes can be

divided into smaller subparts, which represent more basic attributes with indepen-

dent meanings. For example, the Address attribute of the EMPLOYEE entity shown

in Figure 7.3 can be subdivided into Street_address, City, State, and Zip,³ with the

values ‘2311 Kirby’, ‘Houston’, ‘Texas’, and ‘77001.’ Attributes that are not divisible

are called simple or atomic attributes. Composite attributes can form a hierarchy;

for example, Street_address can be further subdivided into three simple component

attributes: Number, Street, and Apartment_number, as shown in Figure 7.4. The value

of a composite attribute is the concatenation of the values of its component simple

attributes.

Composite attributes are useful to model situations in which a user sometimes

refers to the composite attribute as a unit but at other times refers specifically to its

components. If the composite attribute is referenced only as a whole, there is no

Address

Figure 7.4

A hierarchy of composite

attributes.

Street_address

City

State

Zip

Number

Street

Apartment_number

³Zip Code is the name used in the United States for a five-digit postal code, such as 76019, which can

be extended to nine digits, such as 76019-0015. We use the five-digit Zip in our examples.

206

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

need to subdivide it into component attributes. For example, if there is no need to

refer to the individual components of an address (Zip Code, street, and so on), then

the whole address can be designated as a simple attribute.

Single-Valued versus Multivalued Attributes. Most attributes have a single

value for a particular entity; such attributes are called single-valued. For example,

Age is a single-valued attribute of a person. In some cases an attribute can have a set

of values for the same entity—for instance, a Colors attribute for a car, or a

College_degrees attribute for a person. Cars with one color have a single value,

whereas two-tone cars have two color values. Similarly, one person may not have a

college degree, another person may have one, and a third person may have two or

more degrees; therefore, different people can have different numbers of values for

the College_degrees attribute. Such attributes are called multivalued. A multivalued

attribute may have lower and upper bounds to constrain the number of values

allowed for each individual entity. For example, the Colors attribute of a car may be

restricted to have between one and three values, if we assume that a car can have

three colors at most.

Stored versus Derived Attributes. In some cases, two (or more) attribute val-

ues are related—for example, the Age and Birth_date attributes of a person. For a

particular person entity, the value of Age can be determined from the current

(today’s) date and the value of that person’s Birth_date. The Age attribute is hence

called a derived attribute and is said to be derivable from the Birth_date attribute,

which is called a stored attribute. Some attribute values can be derived from

related entities; for example, an attribute Number_of_employees of a DEPARTMENT

entity can be derived by counting the number of employees related to (working

for) that department.

NULL Values. In some cases, a particular entity may not have an applicable value

for an attribute. For example, the Apartment_number attribute of an address applies

only to addresses that are in apartment buildings and not to other types of resi-

dences, such as single-family homes. Similarly, a College_degrees attribute applies

only to people with college degrees. For such situations, a special value called NULL

is created. An address of a single-family home would have NULL for its

Apartment_number attribute, and a person with no college degree would have NULL

for College_degrees. NULL can also be used if we do not know the value of an attrib-

ute for a particular entity—for example, if we do not know the home phone num-

ber of ‘John Smith’ in Figure 7.3. The meaning of the former type of NULL is not

applicable, whereas the meaning of the latter is unknown. The unknown category of

NULL can be further classified into two cases. The first case arises when it is known

that the attribute value exists but is missing—for instance, if the Height attribute of a

person is listed as NULL. The second case arises when it is not known whether the

attribute value exists—for example, if the Home_phone attribute of a person is NULL.

Complex Attributes. Notice that, in general, composite and multivalued attrib-

utes can be nested arbitrarily. We can represent arbitrary nesting by grouping com-

7.3

Entity Types, Entity Sets, Attributes, and Keys

207

ponents of a composite attribute between parentheses () and separating the compo-

nents with commas, and by displaying multivalued attributes between braces { }.

Such attributes are called complex attributes. For example, if a person can have

more than one residence and each residence can have a single address and multiple

phones, an attribute Address_phone for a person can be specified as shown in Figure

7.5.⁴ Both Phone and Address are themselves composite attributes.

7.3.2

Entity Types, Entity Sets, Keys, and Value Sets

Entity Types and Entity Sets. A database usually contains groups of entities that

are similar. For example, a company employing hundreds of employees may want to

store similar information concerning each of the employees. These employee entities

share the same attributes, but each entity has its own value(s) for each attribute. An

entity type defines a collection (or set) of entities that have the same attributes. Each

entity type in the database is described by its name and attributes. Figure 7.6 shows

two entity types: EMPLOYEE and COMPANY, and a list of some of the attributes for

{Address_phone( {Phone(Area_code,Phone_number)},Address(Street_address

Figure 7.5

(Number,Street,Apartment_number),City,State,Zip) )}

A complex attribute:

Address_phone.

Entity Type Name:

EMPLOYEE

COMPANY

Figure 7.6

Two entity types,

Name, Age, Salary

Name, Headquarters, President

EMPLOYEE and

COMPANY, and some

e₁

c₁

member entities of

each.

(John Smith, 55, 80k)

(Sunco Oil, Houston, John Smith)

e₂

c₂

Entity Set:

(Fred Brown, 40, 30K)

(Fast Computer, Dallas, Bob King)

(Extension)

e₃

(Judy Clark, 25, 20K)

⁴For those familiar with XML, we should note that complex attributes are similar to complex elements in

XML (see Chapter 12).

208

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

each. A few individual entities of each type are also illustrated, along with the values

of their attributes. The collection of all entities of a particular entity type in the data-

base at any point in time is called an entity set; the entity set is usually referred to

using the same name as the entity type. For example, EMPLOYEE refers to both a type

of entity as well as the current set of all employee entities in the database.

An entity type is represented in ER diagrams⁵ (see Figure 7.2) as a rectangular box

enclosing the entity type name. Attribute names are enclosed in ovals and are

attached to their entity type by straight lines. Composite attributes are attached to

their component attributes by straight lines. Multivalued attributes are displayed in

double ovals. Figure 7.7(a) shows a CAR entity type in this notation.

An entity type describes the schema or intension for a set of entities that share the

same structure. The collection of entities of a particular entity type is grouped into

an entity set, which is also called the extension of the entity type.

Key Attributes of an Entity Type. An important constraint on the entities of an

entity type is the key or uniqueness constraint on attributes. An entity type usually

Figure 7.7

(a)

State

Number

The CAR entity type

with two key attributes,

Registration

Vehicle_id

Registration and

Vehicle_id. (a) ER

diagram notation. (b)

Year

CAR

Model

Entity set with three

entities.

Color

Make

(b)

CAR

Registration (Number, State), Vehicle_id, Make, Model, Year, {Color}

CAR₁

((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 2004 {red, black})

CAR₂

((ABC 123, NEW YORK), WP9872, Nissan Maxima, 4-door, 2005, {blue})

CAR₃

((VSY 720, TEXAS), TD729, Chrysler LeBaron, 4-door, 2002, {white, blue})

⁵We use a notation for ER diagrams that is close to the original proposed notation (Chen 1976). Many

other notations are in use; we illustrate some of them later in this chapter when we present UML class

diagrams and in Appendix A.

7.3

Entity Types, Entity Sets, Attributes, and Keys

209

has one or more attributes whose values are distinct for each individual entity in the

entity set. Such an attribute is called a key attribute, and its values can be used to

identify each entity uniquely. For example, the Name attribute is a key of the

COMPANY entity type in Figure 7.6 because no two companies are allowed to have

the same name. For the PERSON entity type, a typical key attribute is Ssn (Social

Security number). Sometimes several attributes together form a key, meaning that

the combination of the attribute values must be distinct for each entity. If a set of

attributes possesses this property, the proper way to represent this in the ER model

that we describe here is to define a composite attribute and designate it as a key

attribute of the entity type. Notice that such a composite key must be minimal; that

is, all component attributes must be included in the composite attribute to have the

uniqueness property. Superfluous attributes must not be included in a key. In ER

diagrammatic notation, each key attribute has its name underlined inside the oval,

as illustrated in Figure 7.7(a).

Specifying that an attribute is a key of an entity type means that the preceding

uniqueness property must hold for every entity set of the entity type. Hence, it is a

constraint that prohibits any two entities from having the same value for the key

attribute at the same time. It is not the property of a particular entity set; rather, it is

a constraint on any entity set of the entity type at any point in time. This key con-

straint (and other constraints we discuss later) is derived from the constraints of the

miniworld that the database represents.

Some entity types have more than one key attribute. For example, each of the

Vehicle_id and Registration attributes of the entity type CAR (Figure 7.7) is a key in its

own right. The Registration attribute is an example of a composite key formed from

two simple component attributes, State and Number, neither of which is a key on its

own. An entity type may also have no key, in which case it is called a weak entity type

(see Section 7.5).

In our diagrammatic notation, if two attributes are underlined separately, then each

is a key on its own. Unlike the relational model (see Section 3.2.2), there is no con-

cept of primary key in the ER model that we present here; the primary key will be

chosen during mapping to a relational schema (see Chapter 9).

Value Sets (Domains) of Attributes. Each simple attribute of an entity type is

associated with a value set (or domain of values), which specifies the set of values

that may be assigned to that attribute for each individual entity. In Figure 7.6, if the

range of ages allowed for employees is between 16 and 70, we can specify the value

set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16

and 70. Similarly, we can specify the value set for the Name attribute to be the set of

strings of alphabetic characters separated by blank characters, and so on. Value sets

are not displayed in ER diagrams, and are typically specified using the basic data

types available in most programming languages, such as integer, string, Boolean,

float, enumerated type, subrange, and so on. Additional data types to represent

common database types such as date, time, and other concepts are also employed.

210

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

Mathematically, an attribute A of entity set E whose value set is V can be defined as

a function from E to the power set⁶ P(V ) of V:

A : E → P(V)

We refer to the value of attribute A for entity e as A(e). The previous definition cov-

ers both single-valued and multivalued attributes, as well as NULLs. A NULL value is

represented by the empty set. For single-valued attributes, A(e) is restricted to being

a singleton set for each entity e in E, whereas there is no restriction on multivalued

attributes.⁷ For a composite attribute A, the value set V is the power set of the

Cartesian product of P(V₁), P(V₂), …, P(V_n), where V₁, V₂, …, V_n are the value sets

of the simple component attributes that form A:

V = P (P(V₁) × P(V₂) × … × P(V_n))

The value set provides all possible values. Usually only a small number of these val-

ues exist in the database at a particular time. Those values represent the data from

the current state of the miniworld. They correspond to the data as it actually exists

in the miniworld.

7.3.3

Initial Conceptual Design of the COMPANY Database

We caow define the entity types for the COMPANY database, based on the

requirements described in Section 7.2. After defining several entity types and their

attributes here, we refine our design in Section 7.4 after we introduce the concept of

a relationship. According to the requirements listed in Section 7.2, we can identify

four entity types—one corresponding to each of the four items in the specification

(see Figure 7.8):

1. An entity type DEPARTMENT with attributes Name, Number, Locations,

Manager, and Manager_start_date. Locations is the only multivalued attribute.

We can specify that both Name and Number are (separate) key attributes

because each was specified to be unique.

2. An entity type PROJECT with attributes Name, Number, Location, and

Controlling_department. Both Name and Number are (separate) key attributes.

3. An entity type EMPLOYEE with attributes Name, Ssn, Sex, Address, Salary,

Birth_date, Department, and Supervisor. Both Name and Address may be com-

posite attributes; however, this was not specified in the requirements. We

must go back to the users to see if any of them will refer to the individual

components of Name—First_name, Middle_initial, Last_name—or of Address.

4. An entity type DEPENDENT with attributes Employee, Dependent_name, Sex,

Birth_date, and Relationship (to the employee).

⁶The power set P (V ) of a set V is the set of all subsets of V.

⁷A singleton set is a set with only one element (value).

7.3

Entity Types, Entity Sets, Attributes, and Keys

211

Name

Number

Locations

DEPARTMENT

Manager

Manager_start_date

Name

Number

Location

PROJECT

Controlling_department

Fname

Minit

Lname

Project

Hours

Sex

Name

Works_on

Ssn

Salary

Department

Supervisor

EMPLOYEE

Birth_date

Address

Figure 7.8

Birth_date

Sex

Employee

Preliminary design of entity types

for the COMPANY database.

Relationship

Dependent_name

Some of the shown attributes will

DEPENDENT

be refined into relationships.

So far, we have not represented the fact that an employee can work on several proj-

ects, nor have we represented the number of hours per week an employee works on

each project. This characteristic is listed as part of the third requirement in Section

7.2, and it can be represented by a multivalued composite attribute of EMPLOYEE

called Works_on with the simple components (Project, Hours). Alternatively, it can be

represented as a multivalued composite attribute of PROJECT called Workers with

the simple components (Employee, Hours). We choose the first alternative in Figure

7.8, which shows each of the entity types just described. The Name attribute of

EMPLOYEE is shown as a composite attribute, presumably after consultation with

the users.

212

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

7.4 Relationship Types, Relationship Sets,

Roles, and Structural Constraints

In Figure 7.8 there are several implicit relationships among the various entity types.

In fact, whenever an attribute of one entity type refers to another entity type, some

relationship exists. For example, the attribute Manager of DEPARTMENT refers to an

employee who manages the department; the attribute Controlling_department of

PROJECT refers to the department that controls the project; the attribute Supervisor

of EMPLOYEE refers to another employee (the one who supervises this employee);

the attribute Department of EMPLOYEE refers to the department for which the

employee works; and so on. In the ER model, these references should not be repre-

sented as attributes but as relationships, which are discussed in this section. The

COMPANY database schema will be refined in Section 7.6 to represent relationships

explicitly. In the initial design of entity types, relationships are typically captured in

the form of attributes. As the design is refined, these attributes get converted into

relationships between entity types.

This section is organized as follows: Section 7.4.1 introduces the concepts of rela-

tionship types, relationship sets, and relationship instances. We define the concepts

of relationship degree, role names, and recursive relationships in Section 7.4.2, and

then we discuss structural constraints on relationships—such as cardinality ratios

and existence dependencies—in Section 7.4.3. Section 7.4.4 shows how relationship

types can also have attributes.

7.4.1

Relationship Types, Sets, and Instances

A relationship type R among n entity types E₁, E₂, …, E_n defines a set of associa-

tions—or a relationship set—among entities from these entity types. As for the

case of entity types and entity sets, a relationship type and its corresponding rela-

tionship set are customarily referred to by the same name, R. Mathematically, the

relationship set R is a set of relationship instances r_i, where each r_i associates n

individual entities (e₁, e₂, …, e_n), and each entity e_j in r_i is a member of entity set E_j,

1 f j f n. Hence, a relationship set is a mathematical relation on E₁, E₂, …, E_n; alter-

natively, it can be defined as a subset of the Cartesian product of the entity sets E₁ ×

E₂ × … × E_n. Each of the entity types E₁, E₂, …, E_n is said to participate in the rela-

tionship type R; similarly, each of the individual entities e₁, e₂, …, e_n is said to

participate in the relationship instance r_i = (e₁, e₂, …, e_n).

Informally, each relationship instance r_i in R is an association of entities, where the

association includes exactly one entity from each participating entity type. Each

such relationship instance r_i represents the fact that the entities participating in r_i

are related in some way in the corresponding miniworld situation. For example,

consider a relationship type WORKS_FOR between the two entity types EMPLOYEE

and DEPARTMENT, which associates each employee with the department for which

the employee works in the corresponding entity set. Each relationship instance in

the relationship set WORKS_FOR associates one EMPLOYEE entity and one

DEPARTMENT entity. Figure 7.9 illustrates this example, where each relationship

7.4

Relationship Types, Relationship Sets, Roles, and Structural Constraints

213

EMPLOYEE

WORKS_FOR

DEPARTMENT

r₁

e₁

d₁

r₂

e₂

e₃

r₃

d₃

e₄

r₄

e₅

r₅

e₆

e₇

Figure 7.9

r₆

Some instances in the

WORKS_FOR relationship

r₇

set, which represents a

relationship type

WORKS_FOR between

EMPLOYEE and

DEPARTMENT.

instance r_i is shown connected to the EMPLOYEE and DEPARTMENT entities that

participate in r_i. In the miniworld represented by Figure 7.9, employees e₁, e₃, and e₆

work for department d₁; employees e₂ and e₄ work for department d₂; and employ-

ees e₅ and e₇ work for department d₃.

In ER diagrams, relationship types are displayed as diamond-shaped boxes, which

are connected by straight lines to the rectangular boxes representing the participat-

ing entity types. The relationship name is displayed in the diamond-shaped box (see

Figure 7.2).

7.4.2

Relationship Degree, Role Names,

and Recursive Relationships

Degree of a Relationship Type. The degree of a relationship type is the number

of participating entity types. Hence, the WORKS_FOR relationship is of degree two.

A relationship type of degree two is called binary, and one of degree three is called

ternary. An example of a ternary relationship is SUPPLY, shown in Figure 7.10,

where each relationship instance r_i associates three entities—a supplier s, a part p,

and a project j—whenever s supplies part p to project j. Relationships can generally

be of any degree, but the ones most common are binary relationships. Higher-

degree relationships are generally more complex than binary relationships; we char-

acterize them further in Section 7.9.

214

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

SUPPLIER

SUPPLY

PROJECT

s₁

r₁

s₂

r₂

r₃

r₄

PART

r₅

p₁

p₂

r₆

p₃

r₇

Figure 7.10

Some relationship instances in

the SUPPLY ternary relationship

set.

Relationships as Attributes. It is sometimes convenient to think of a binary

relationship type in terms of attributes, as we discussed in Section 7.3.3. Consider

the WORKS_FOR relationship type in Figure 7.9. One can think of an attribute

called Department of the EMPLOYEE entity type, where the value of Department for

each EMPLOYEE entity is (a reference to) the DEPARTMENT entity for which that

employee works. Hence, the value set for this Department attribute is the set of all

DEPARTMENT entities, which is the DEPARTMENT entity set. This is what we did in

Figure 7.8 when we specified the initial design of the entity type EMPLOYEE for the

COMPANY database. However, when we think of a binary relationship as an attrib-

ute, we always have two options. In this example, the alternative is to think of a mul-

tivalued attribute Employee of the entity type DEPARTMENT whose values for each

DEPARTMENT entity is the set of EMPLOYEE entities who work for that department.

The value set of this Employee attribute is the power set of the EMPLOYEE entity set.

Either of these two attributes—Department of EMPLOYEE or Employee of

DEPARTMENT—can represent the WORKS_FOR relationship type. If both are repre-

sented, they are constrained to be inverses of each other.⁸

⁸This concept of representing relationship types as attributes is used in a class of data models called

functional data models. In object databases (see Chapter 11), relationships can be represented by ref-

erence attributes, either in one direction or in both directions as inverses. In relational databases (see

Chapter 3), foreign keys are a type of reference attribute used to represent relationships.

7.4

Relationship Types, Relationship Sets, Roles, and Structural Constraints

215

Role Names and Recursive Relationships. Each entity type that participates

in a relationship type plays a particular role in the relationship. The role name sig-

nifies the role that a participating entity from the entity type plays in each relation-

ship instance, and helps to explain what the relationship means. For example, in the

WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and

DEPARTMENT plays the role of department or employer.

Role names are not technically necessary in relationship types where all the partici-

pating entity types are distinct, since each participating entity type name can be

used as the role name. However, in some cases the same entity type participates

more than once in a relationship type in different roles. In such cases the role name

becomes essential for distinguishing the meaning of the role that each participating

entity plays. Such relationship types are called recursive relationships. Figure 7.11

shows an example. The SUPERVISION relationship type relates an employee to a

supervisor, where both employee and supervisor entities are members of the same

EMPLOYEE entity set. Hence, the EMPLOYEE entity type participates twice in

SUPERVISION: once in the role of supervisor (or boss), and once in the role of

supervisee (or subordinate). Each relationship instance r_i in SUPERVISION associates

two employee entities e_j and e_k, one of which plays the role of supervisor and the

other the role of supervisee. In Figure 7.11, the lines marked ‘1’ represent the super-

visor role, and those marked ‘2’ represent the supervisee role; hence, e₁ supervises e₂

and e₃, e₄ supervises e₆ and e₇, and e₅ supervises e₁ and e₄. In this example, each rela-

tionship instance must be connected with two lines, one marked with ‘1’ (supervi-

sor) and the other with ‘2’ (supervisee).

EMPLOYEE

SUPERVISION

Figure 7.11

A recursive relationship

SUPERVISION between

r₁

EMPLOYEE in the

e₁

supervisor role (1) and

r₂

EMPLOYEE in the

e₂

subordinate role (2).

e₃

r₃

e₄

r₄

e₅

e₆

r₅

e₇

r₆

216

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

7.4.3

Constraints on Binary Relationship Types

Relationship types usually have certain constraints that limit the possible combina-

tions of entities that may participate in the corresponding relationship set. These

constraints are determined from the miniworld situation that the relationships rep-

resent. For example, in Figure 7.9, if the company has a rule that each employee

must work for exactly one department, then we would like to describe this con-

straint in the schema. We can distinguish two main types of binary relationship

constraints: cardinality ratio and participation.

Cardinality Ratios for Binary Relationships. The cardinality ratio for a binary

relationship specifies the maximum number of relationship instances that an entity

can participate in. For example, in the WORKS_FOR binary relationship type,

DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N, meaning that each department

can be related to (that is, employs) any number of employees,⁹ but an employee can

be related to (work for) only one department. This means that for this particular

relationship WORKS_FOR, a particular department entity can be related to any

number of employees (N indicates there is no maximum number). On the other

hand, an employee can be related to a maximum of one department. The possible

cardinality ratios for binary relationship types are 1:1, 1:N, N:1, and M:N.

An example of a 1:1 binary relationship is MANAGES (Figure 7.12), which relates a

department entity to the employee who manages that department. This represents

the miniworld constraints that—at any point in time—an employee can manage

one department only and a department can have one manager only. The relation-

ship type WORKS_ON (Figure 7.13) is of cardinality ratio M:N, because the mini-

Figure 7.12

EMPLOYEE

MANAGES

DEPARTMENT

A 1:1 relationship,

MANAGES.

e₁

e₂

r₁

e₃

r₂

e₄

d₂

e₅

r₃

e₆

e₇

⁹N stands for any number of related entities (zero or more).

7.4

Relationship Types, Relationship Sets, Roles, and Structural Constraints

217

EMPLOYEE

WORKS_ON

PROJECT

e₁

r₁

p₁

e₂

p₂

e₃

e₄

r₃

p₃

r₄

p₄

r₅

r₆

r₇

Figure 7.13

An M:N relationship,

WORKS_ON.

world rule is that an employee can work on several projects and a project can have

several employees.

Cardinality ratios for binary relationships are represented on ER diagrams by dis-

playing 1, M, and N on the diamonds as shown in Figure 7.2. Notice that in this

notation, we can either specify no maximum (N) or a maximum of one (1) on par-

ticipation. An alternative notation (see Section 7.7.4) allows the designer to specify

a specific maximum number on participation, such as 4 or 5.

Participation Constraints and Existence Dependencies. The participation

constraint specifies whether the existence of an entity depends on its being related

to another entity via the relationship type. This constraint specifies the minimum

number of relationship instances that each entity can participate in, and is some-

times called the minimum cardinality constraint. There are two types of participa-

tion constraints—total and partial—that we illustrate by example. If a company

policy states that every employee must work for a department, then an employee

entity can exist only if it participates in at least one WORKS_FOR relationship

instance (Figure 7.9). Thus, the participation of EMPLOYEE in WORKS_FOR is

called total participation, meaning that every entity in the total set of employee

entities must be related to a department entity via WORKS_FOR. Total participation

is also called existence dependency. In Figure 7.12 we do not expect every employee

to manage a department, so the participation of EMPLOYEE in the MANAGES rela-

tionship type is partial, meaning that some or part of the set of employee entities are

related to some department entity via MANAGES, but not necessarily all. We will

218

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

refer to the cardinality ratio and participation constraints, taken together, as the

structural constraints of a relationship type.

In ER diagrams, total participation (or existence dependency) is displayed as a

double line connecting the participating entity type to the relationship, whereas par-

tial participation is represented by a single line (see Figure 7.2). Notice that in this

notation, we can either specify no minimum (partial participation) or a minimum

of one (total participation). The alternative notation (see Section 7.7.4) allows the

designer to specify a specific minimum number on participation in the relationship,

such as 4 or 5.

We will discuss constraints on higher-degree relationships in Section 7.9.

7.4.4

Attributes of Relationship Types

Relationship types can also have attributes, similar to those of entity types. For

example, to record the number of hours per week that an employee works on a par-

ticular project, we can include an attribute Hours for the WORKS_ON relationship

type in Figure 7.13. Another example is to include the date on which a manager

started managing a department via an attribute Start_date for the MANAGES rela-

tionship type in Figure 7.12.

Notice that attributes of 1:1 or 1:N relationship types can be migrated to one of the

participating entity types. For example, the Start_date attribute for the MANAGES

relationship can be an attribute of either EMPLOYEE or DEPARTMENT, although

conceptually it belongs to MANAGES. This is because MANAGES is a 1:1 relation-

ship, so every department or employee entity participates in at most one relationship

instance. Hence, the value of the Start_date attribute can be determined separately,

either by the participating department entity or by the participating employee

(manager) entity.

For a 1:N relationship type, a relationship attribute can be migrated only to the

entity type on the N-side of the relationship. For example, in Figure 7.9, if the

WORKS_FOR relationship also has an attribute Start_date that indicates when an

employee started working for a department, this attribute can be included as an

attribute of EMPLOYEE. This is because each employee works for only one depart-

ment, and hence participates in at most one relationship instance in WORKS_FOR.

In both 1:1 and 1:N relationship types, the decision where to place a relationship

attribute—as a relationship type attribute or as an attribute of a participating entity

type—is determined subjectively by the schema designer.

For M:N relationship types, some attributes may be determined by the combination

of participating entities in a relationship instance, not by any single entity. Such

attributes must be specified as relationship attributes. An example is the Hours attrib-

ute of the M:N relationship WORKS_ON (Figure 7.13); the number of hours per

week an employee currently works on a project is determined by an employee-

project combination and not separately by either entity.

7.5

Weak Entity Types

219

7.5 Weak Entity Types

Entity types that do not have key attributes of their own are called weak entity

types. In contrast, regular entity types that do have a key attribute—which include

all the examples discussed so far—are called strong entity types. Entities belonging

to a weak entity type are identified by being related to specific entities from another

entity type in combination with one of their attribute values. We call this other

entity type the identifying or owner entity type,¹⁰ and we call the relationship type

that relates a weak entity type to its owner the identifying relationship of the weak

entity type.¹¹ A weak entity type always has a total participation constraint (existence

dependency) with respect to its identifying relationship because a weak entity can-

not be identified without an owner entity. However, not every existence dependency

results in a weak entity type. For example, a DRIVER_LICENSE entity cannot exist

unless it is related to a PERSON entity, even though it has its own key

(License_number) and hence is not a weak entity.

Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep

track of the dependents of each employee via a 1:N relationship (Figure 7.2). In our

example, the attributes of DEPENDENT are Name (the first name of the dependent),

Birth_date, Sex, and Relationship (to the employee). Two dependents of two distinct

employees may, by chance, have the same values for Name, Birth_date, Sex, and

Relationship, but they are still distinct entities. They are identified as distinct entities

only after determining the particular employee entity to which each dependent is

related. Each employee entity is said to own the dependent entities that are related

to it.

A weak entity type normally has a partial key, which is the attribute that can

uniquely identify weak entities that are related to the same owner entity.¹² In our

example, if we assume that no two dependents of the same employee ever have the

same first name, the attribute Name of DEPENDENT is the partial key. In the worst

case, a composite attribute of all the weak entity’s attributes will be the partial key.

In ER diagrams, both a weak entity type and its identifying relationship are distin-

guished by surrounding their boxes and diamonds with double lines (see Figure

7.2). The partial key attribute is underlined with a dashed or dotted line.

Weak entity types can sometimes be represented as complex (composite, multival-

ued) attributes. In the preceding example, we could specify a multivalued attribute

Dependents for EMPLOYEE, which is a composite attribute with component attrib-

utes Name, Birth_date, Sex, and Relationship. The choice of which representation to

use is made by the database designer. One criterion that may be used is to choose the

¹⁰The identifying entity type is also sometimes called the parent entity type or the dominant entity

type.

¹¹The weak entity type is also sometimes called the child entity type or the subordinate entity type.

¹²The partial key is sometimes called the discriminator.

220

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

weak entity type representation if there are many attributes. If the weak entity par-

ticipates independently in relationship types other than its identifying relationship

type, then it should not be modeled as a complex attribute.

In general, any number of levels of weak entity types can be defined; an owner

entity type may itself be a weak entity type. In addition, a weak entity type may have

more than one identifying entity type and an identifying relationship type of degree

higher than two, as we illustrate in Section 7.9.

7.6 Refining the ER Design for the COMPANY

Database

We caow refine the database design in Figure 7.8 by changing the attributes that

represent relationships into relationship types. The cardinality ratio and participa-

tion constraint of each relationship type are determined from the requirements

listed in Section 7.2. If some cardinality ratio or dependency cannot be determined

from the requirements, the users must be questioned further to determine these

structural constraints.

In our example, we specify the following relationship types:

■

MANAGES, a 1:1 relationship type between EMPLOYEE and DEPARTMENT.

EMPLOYEE participation is partial. DEPARTMENT participation is not clear

from the requirements. We question the users, who say that a department

must have a manager at all times, which implies total participation.¹³ The

attribute Start_date is assigned to this relationship type.

■

WORKS_FOR, a

1:N relationship type between DEPARTMENT and

EMPLOYEE. Both participations are total.

■

CONTROLS, a 1:N relationship type between DEPARTMENT and PROJECT.

The participation of PROJECT is total, whereas that of DEPARTMENT is

determined to be partial, after consultation with the users indicates that

some departments may control no projects.

■

SUPERVISION, a 1:N relationship type between EMPLOYEE (in the supervi-

sor role) and EMPLOYEE (in the supervisee role). Both participations are

determined to be partial, after the users indicate that not every employee is a

supervisor and not every employee has a supervisor.

■

WORKS_ON, determined to be an M:N relationship type with attribute

Hours, after the users indicate that a project can have several employees

working on it. Both participations are determined to be total.

■

DEPENDENTS_OF, a 1:N relationship type between EMPLOYEE and

DEPENDENT, which is also the identifying relationship for the weak entity

¹³The rules in the miniworld that determine the constraints are sometimes called the business rules,

since they are determined by the business or organization that will utilize the database.

7.7

ER Diagrams, Naming Conventions, and Design Issues

221

type DEPENDENT. The participation of EMPLOYEE is partial, whereas that of

DEPENDENT is total.

After specifying the above six relationship types, we remove from the entity types in

Figure 7.8 all attributes that have been refined into relationships. These include

Manager and Manager_start_date from DEPARTMENT; Controlling_department from

PROJECT; Department, Supervisor, and Works_on from EMPLOYEE; and Employee

from DEPENDENT. It is important to have the least possible redundancy when we

design the conceptual schema of a database. If some redundancy is desired at the

storage level or at the user view level, it can be introduced later, as discussed in

Section 1.6.1.

7.7 ER Diagrams, Naming Conventions,

and Design Issues

7.7.1

Summary of Notation for ER Diagrams

Figures 7.9 through 7.13 illustrate examples of the participation of entity types in

relationship types by displaying their sets or extensions—the individual entity

instances in an entity set and the individual relationship instances in a relationship

set. In ER diagrams the emphasis is on representing the schemas rather than the

instances. This is more useful in database design because a database schema changes

rarely, whereas the contents of the entity sets change frequently. In addition, the

schema is obviously easier to display, because it is much smaller.

Figure 7.2 displays the COMPANY ER database schema as an ER diagram. We now

review the full ER diagram notation. Entity types such as EMPLOYEE,

DEPARTMENT, and PROJECT are shown in rectangular boxes. Relationship types

such as WORKS_FOR, MANAGES, CONTROLS, and WORKS_ON are shown in

diamond-shaped boxes attached to the participating entity types with straight lines.

Attributes are shown in ovals, and each attribute is attached by a straight line to its

entity type or relationship type. Component attributes of a composite attribute are

attached to the oval representing the composite attribute, as illustrated by the Name

attribute of EMPLOYEE. Multivalued attributes are shown in double ovals, as illus-

trated by the Locations attribute of DEPARTMENT. Key attributes have their names

underlined. Derived attributes are shown in dotted ovals, as illustrated by the

Number_of_employees attribute of DEPARTMENT.

Weak entity types are distinguished by being placed in double rectangles and by

having their identifying relationship placed in double diamonds, as illustrated by

the DEPENDENT entity type and the DEPENDENTS_OF identifying relationship

type. The partial key of the weak entity type is underlined with a dotted line.

In Figure 7.2 the cardinality ratio of each binary relationship type is specified by

attaching a 1, M, or N on each participating edge. The cardinality ratio of

DEPARTMENT:EMPLOYEE in MANAGES is 1:1, whereas it is 1:N for DEPARTMENT:

EMPLOYEE in WORKS_FOR, and M:N for WORKS_ON. The participation

222

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

constraint is specified by a single line for partial participation and by double lines

for total participation (existence dependency).

In Figure 7.2 we show the role names for the SUPERVISION relationship type

because the same EMPLOYEE entity type plays two distinct roles in that relationship.

Notice that the cardinality ratio is 1:N from supervisor to supervisee because each

employee in the role of supervisee has at most one direct supervisor, whereas an

employee in the role of supervisor can supervise zero or more employees.

Figure 7.14 summarizes the conventions for ER diagrams. It is important to note

that there are many other alternative diagrammatic notations (see Section 7.7.4 and

Appendix A).

7.7.2

Proper Naming of Schema Constructs

When designing a database schema, the choice of names for entity types, attributes,

relationship types, and (particularly) roles is not always straightforward. One

should choose names that convey, as much as possible, the meanings attached to the

different constructs in the schema. We choose to use singular names for entity types,

rather than plural ones, because the entity type name applies to each individual

entity belonging to that entity type. In our ER diagrams, we will use the convention

that entity type and relationship type names are uppercase letters, attribute names

have their initial letter capitalized, and role names are lowercase letters. We have

used this convention in Figure 7.2.

As a general practice, given a narrative description of the database requirements, the

nouns appearing in the narrative tend to give rise to entity type names, and the verbs

tend to indicate names of relationship types. Attribute names generally arise from

additional nouns that describe the nouns corresponding to entity types.

Another naming consideration involves choosing binary relationship names to

make the ER diagram of the schema readable from left to right and from top to bot-

tom. We have generally followed this guideline in Figure 7.2. To explain this naming

convention further, we have one exception to the convention in Figure 7.2—the

DEPENDENTS_OF relationship type, which reads from bottom to top. When we

describe this relationship, we can say that the DEPENDENT entities (bottom entity

type) are DEPENDENTS_OF (relationship name) an EMPLOYEE (top entity type).

To change this to read from top to bottom, we could rename the relationship type to

HAS_DEPENDENTS, which would then read as follows: An EMPLOYEE entity (top

entity type) HAS_DEPENDENTS (relationship name) of type DEPENDENT (bottom

entity type). Notice that this issue arises because each binary relationship can be

described starting from either of the two participating entity types, as discussed in

the beginning of Section 7.4.

7.7.3

Design Choices for ER Conceptual Design

It is occasionally difficult to decide whether a particular concept in the miniworld

should be modeled as an entity type, an attribute, or a relationship type. In this

7.7

ER Diagrams, Naming Conventions, and Design Issues

223

Symbol

Meaning

Figure 7.14

Summary of the notation

for ER diagrams.

Entity

Weak Entity

Relationship

Indentifying Relationship

Attribute

Key Attribute

Multivalued Attribute

Composite Attribute

Derived Attribute

E₁

E₂

Total Participation of E₂ in R

E₁

E₂

Cardinality Ratio 1: N for E₁:E₂ in R

(min, max)

Structural Constraint (min, max)

on Participation of E in R

224

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

section, we give some brief guidelines as to which construct should be chosen in

particular situations.

In general, the schema design process should be considered an iterative refinement

process, where an initial design is created and then iteratively refined until the most

suitable design is reached. Some of the refinements that are often used include the

following:

■

A concept may be first modeled as an attribute and then refined into a rela-

tionship because it is determined that the attribute is a reference to another

entity type. It is often the case that a pair of such attributes that are inverses

of one another are refined into a binary relationship. We discussed this type

of refinement in detail in Section 7.6. It is important to note that in

our notation, once an attribute is replaced by a relationship, the attribute

itself should be removed from the entity type to avoid duplication and

redundancy.

■

Similarly, an attribute that exists in several entity types may be elevated or

promoted to an independent entity type. For example, suppose that several

entity types in a UNIVERSITY database, such as STUDENT, INSTRUCTOR, and

COURSE, each has an attribute Department in the initial design; the designer

may then choose to create an entity type DEPARTMENT with a single attrib-

ute Dept_name and relate it to the three entity types

(STUDENT,

INSTRUCTOR, and COURSE) via appropriate relationships. Other attrib-

utes/relationships of DEPARTMENT may be discovered later.

■

An inverse refinement to the previous case may be applied—for example, if

an entity type DEPARTMENT exists in the initial design with a single attribute

Dept_name and is related to only one other entity type, STUDENT. In this

case, DEPARTMENT may be reduced or demoted to an attribute of STUDENT.

■

Section 7.9 discusses choices concerning the degree of a relationship. In

Chapter 8, we discuss other refinements concerning specialization/general-

ization. Chapter 10 discusses additional top-down and bottom-up refine-

ments that are common in large-scale conceptual schema design.

7.7.4

Alternative Notations for ER Diagrams

There are many alternative diagrammatic notations for displaying ER diagrams.

Appendix A gives some of the more popular notations. In Section 7.8, we introduce

the Unified Modeling Language (UML) notation for class diagrams, which has been

proposed as a standard for conceptual object modeling.

In this section, we describe one alternative ER notation for specifying structural

constraints on relationships, which replaces the cardinality ratio (1:1, 1:N, M:N)

and single/double line notation for participation constraints. This notation involves

associating a pair of integer numbers (min, max) with each participation of an

entity type E in a relationship type R, where 0 ≤ min ≤ max and max ≥ 1. The num-

bers mean that for each entity e in E, e must participate in at least min and at most

7.7

ER Diagrams, Naming Conventions, and Design Issues

225

max relationship instances in R at any point in time. In this method, min = 0 implies partial participation,

whereas min > 0 implies total participation.

Figure 7.15 displays the COMPANY database schema using the (min, max) notation.¹⁴ Usually, one uses

either the cardinality ratio/single-line/double-line notation or the (min, max) notation. The (min, max)

Figure 7.15

Fname

Minit

Lname

ER diagrams for the company schema, with structural con-

straints specified using (min, max) notation and role names.

Bdate

Name

Address

Salary

Ssn

Sex

Locations

WORKS_FOR

(4,N)

(1,1)

Name

Number

Employee

Department

Start_date

Number_of_employees

EMPLOYEE

DEPARTMENT

(0,1)

Department

(0,N)

Controlling

Manager

Managed

(1,1)

Department

MANAGES

CONTROLS

Hours

(1,N)

Worker

Controlled

(0,N)

(0,1)

(1,1)

Project

Supervisor

Supervisee

Project

WORKS_ON

PROJECT

(1,N)

SUPERVISION

(0,N)

Employee

Name

Location

Number

DEPENDENTS_OF

(1,1)

Dependent

DEPENDENT

Name

Sex

Birth_date

Relationship

¹⁴In some notations, particularly those used in object modeling methodologies such as UML, the (min,

max) is placed on the opposite sides to the ones we have shown. For example, for the WORKS_FOR

relationship in Figure 7.15, the (1,1) would be on the DEPARTMENT side, and the (4,N) would be on the

EMPLOYEE side. Here we used the original notation from Abrial (1974).

226

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

notation is more precise, and we can use it to specify some structural constraints for

relationship types of higher degree. However, it is not sufficient for specifying some

key constraints on higher-degree relationships, as discussed in Section 7.9.

Figure 7.15 also displays all the role names for the COMPANY database schema.

7.8 Example of Other Notation:

UML Class Diagrams

The UML methodology is being used extensively in software design and has many

types of diagrams for various software design purposes. We only briefly present the

basics of UML class diagrams here, and compare them with ER diagrams. In some

ways, class diagrams can be considered as an alternative notation to ER diagrams.

Additional UML notation and concepts are presented in Section 8.6, and in Chapter

10. Figure 7.16 shows how the COMPANY ER database schema in Figure 7.15 can be

displayed using UML class diagram notation. The entity types in Figure 7.15 are

modeled as classes in Figure 7.16. An entity in ER corresponds to an object in UML.

Figure 7.16

The COMPANY conceptual schema

in UML class diagram notation.

EMPLOYEE

DEPARTMENT

Multiplicity

4..*

WORKS_FOR

1..1

Notation in OMT:

Name: Name_dom

Name

Fname

Number

1..1

0..1

Minit

0..*

add_employee

Lname

0..1

number_of_employees

Ssn

MANAGES

change_manager

0..*

Bdate: Date

Start_date

Sex: {M,F}

1..1

Address

1..*

Salary

1..*

age

change_department

CONTROLS

LOCATION

change_projects

supervisee

WORKS_ON

Name

Hours

1..1

Dependent_name

0..1

1..*

supervisor

PROJECT

0..*

Name

DEPENDENT

Number

Sex: {M,F}

add_employee

Aggregation

Birth_date: Date

add_project

Notation in UML:

Relationship

change_manager

Whole

Part

7.8

Example of Other Notation: UML Class Diagrams

227

In UML class diagrams, a class (similar to an entity type in ER) is displayed as a box

(see Figure 7.16) that includes three sections: The top section gives the class name

(similar to entity type name); the middle section includes the attributes; and the

last section includes operations that can be applied to individual objects (similar to

individual entities in an entity set) of the class. Operations are not specified in ER

diagrams. Consider the EMPLOYEE class in Figure 7.16. Its attributes are Name, Ssn,

Bdate, Sex, Address, and Salary. The designer can optionally specify the domain of

an attribute if desired, by placing a colon (:) followed by the domaiame or

description, as illustrated by the Name, Sex, and Bdate attributes of EMPLOYEE in

Figure 7.16. A composite attribute is modeled as a structured domain, as illustrated

by the Name attribute of EMPLOYEE. A multivalued attribute will generally be mod-

eled as a separate class, as illustrated by the LOCATION class in Figure 7.16.

Relationship types are called associations in UML terminology, and relationship

instances are called links. A binary association (binary relationship type) is repre-

sented as a line connecting the participating classes (entity types), and may option-

ally have a name. A relationship attribute, called a link attribute, is placed in a box

that is connected to the association’s line by a dashed line. The (min, max) notation

described in Section 7.7.4 is used to specify relationship constraints, which are

called multiplicities in UML terminology. Multiplicities are specified in the form

min..max, and an asterisk (*) indicates no maximum limit on participation.

However, the multiplicities are placed on the opposite ends of the relationship when

compared with the notation discussed in Section 7.7.4 (compare Figures 7.15 and

7.16). In UML, a single asterisk indicates a multiplicity of 0..*, and a single 1 indi-

cates a multiplicity of 1..1. A recursive relationship (see Section 7.4.2) is called a

reflexive association in UML, and the role names—like the multiplicities—are

placed at the opposite ends of an association when compared with the placing of

role names in Figure 7.15.

In UML, there are two types of relationships: association and aggregation.

Aggregation is meant to represent a relationship between a whole object and its

component parts, and it has a distinct diagrammatic notation. In Figure 7.16, we

modeled the locations of a department and the single location of a project as aggre-

gations. However, aggregation and association do not have different structural

properties, and the choice as to which type of relationship to use is somewhat sub-

jective. In the ER model, both are represented as relationships.

UML also distinguishes between unidirectional and bidirectional associations (or

aggregations). In the unidirectional case, the line connecting the classes is displayed

with an arrow to indicate that only one direction for accessing related objects is

needed. If no arrow is displayed, the bidirectional case is assumed, which is the

default. For example, if we always expect to access the manager of a department

starting from a DEPARTMENT object, we would draw the association line represent-

ing the MANAGES association with an arrow from DEPARTMENT to EMPLOYEE. In

addition, relationship instances may be specified to be ordered. For example, we

could specify that the employee objects related to each department through the

WORKS_FOR association (relationship) should be ordered by their Salary attribute

228

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

value. Association (relationship) names are optional in UML, and relationship

attributes are displayed in a box attached with a dashed line to the line representing

the association/aggregation (see Start_date and Hours in Figure 7.16).

The operations given in each class are derived from the functional requirements of

the application, as we discussed in Section 7.1. It is generally sufficient to specify the

operatioames initially for the logical operations that are expected to be applied

to individual objects of a class, as shown in Figure 7.16. As the design is refined,

more details are added, such as the exact argument types (parameters) for each

operation, plus a functional description of each operation. UML has function

descriptions and sequence diagrams to specify some of the operation details, but

these are beyond the scope of our discussion. Chapter 10 will introduce some of

these diagrams.

Weak entities can be modeled using the construct called qualified association (or

qualified aggregation) in UML; this can represent both the identifying relationship

and the partial key, which is placed in a box attached to the owner class. This is illus-

trated by the DEPENDENT class and its qualified aggregation to EMPLOYEE in

Figure 7.16. The partial key Dependent_name is called the discriminator in UML ter-

minology, since its value distinguishes the objects associated with (related to) the

same EMPLOYEE. Qualified associations are not restricted to modeling weak enti-

ties, and they can be used to model other situations in UML.

This section is not meant to be a complete description of UML class diagrams, but

rather to illustrate one popular type of alternative diagrammatic notation that can

be used for representing ER modeling concepts.

7.9 Relationship Types of Degree

Higher than Two

In Section 7.4.2 we defined the degree of a relationship type as the number of par-

ticipating entity types and called a relationship type of degree two binary and a rela-

tionship type of degree three ternary. In this section, we elaborate on the differences

between binary and higher-degree relationships, when to choose higher-degree ver-

sus binary relationships, and how to specify constraints on higher-degree relation-

ships.

7.9.1

Choosing between Binary and Ternary

(or Higher-Degree) Relationships

The ER diagram notation for a ternary relationship type is shown in Figure 7.17(a),

which displays the schema for the SUPPLY relationship type that was displayed at

the entity set/relationship set or instance level in Figure 7.10. Recall that the rela-

tionship set of SUPPLY is a set of relationship instances (s, j, p), where s is a

SUPPLIER who is currently supplying a PART p to a PROJECT j. In general, a rela-

tionship type R of degree n will have n edges in an ER diagram, one connecting R to

each participating entity type.

7.9

Relationship Types of Degree Higher than Two

229

Sname

Quantity

Proj_name

(a)

SUPPLIER

SUPPLY

PROJECT

Part_no

PART

Sname

Proj_name

(b)

SUPPLIER

SUPPLIES

PROJECT

CAN_SUPPLY

USES

Part_no

PART

(c)

Sname

Quantity

Proj_name

SUPPLIER

SUPPLY

SPJ

PROJECT

Figure 7.17

Ternary relationship types. (a) The SUPPLY

Part_no

relationship. (b) Three binary relationships

not equivalent to SUPPLY. (c) SUPPLY

PART

represented as a weak entity type.

Figure

7.17(b) shows an ER diagram for three binary relationship types

CAN_SUPPLY, USES, and SUPPLIES. In general, a ternary relationship type repre-

sents different information than do three binary relationship types. Consider the

three binary relationship types CAN_SUPPLY, USES, and SUPPLIES. Suppose that

CAN_SUPPLY, between SUPPLIER and PART, includes an instance (s, p) whenever

supplier s can supply part p (to any project); USES, between PROJECT and PART,

includes an instance ( j, p) whenever project j uses part p; and SUPPLIES, between

SUPPLIER and PROJECT, includes an instance (s, j) whenever supplier s supplies

230

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

some part to project j. The existence of three relationship instances (s, p), ( j, p), and

(s, j) in CAN_SUPPLY, USES, and SUPPLIES, respectively, does not necessarily imply

that an instance (s, j, p) exists in the ternary relationship SUPPLY, because the

meaning is different. It is often tricky to decide whether a particular relationship

should be represented as a relationship type of degree n or should be broken down

into several relationship types of smaller degrees. The designer must base this

decision on the semantics or meaning of the particular situation being represented.

The typical solution is to include the ternary relationship plus one or more of the

binary relationships, if they represent different meanings and if all are needed by the

application.

Some database design tools are based on variations of the ER model that permit

only binary relationships. In this case, a ternary relationship such as SUPPLY must

be represented as a weak entity type, with no partial key and with three identifying

relationships. The three participating entity types SUPPLIER, PART, and PROJECT

are together the owner entity types (see Figure 7.17(c)). Hence, an entity in the weak

entity type SUPPLY in Figure 7.17(c) is identified by the combination of its three

owner entities from SUPPLIER, PART, and PROJECT.

It is also possible to represent the ternary relationship as a regular entity type by

introducing an artificial or surrogate key. In this example, a key attribute Supply_id

could be used for the supply entity type, converting it into a regular entity type.

Three binary N:1 relationships relate SUPPLY to the three participating entity types.

Another example is shown in Figure 7.18. The ternary relationship type OFFERS

represents information on instructors offering courses during particular semesters;

hence it includes a relationship instance (i, s, c) whenever INSTRUCTOR i offers

COURSE c during SEMESTER s. The three binary relationship types shown in

Figure 7.18 have the following meanings: CAN_TEACH relates a course to the

instructors who can teach that course, TAUGHT_DURING relates a semester to the

instructors who taught some course during that semester, and OFFERED_DURING

Figure 7.18

Semester

Year

Another example of ternary versus

binary relationship types.

TAUGHT_DURING

Lname

Sem_year

INSTRUCTOR

OFFERS

SEMESTER

CAN_TEACH

OFFERED_DURING

Cnumber

COURSE

7.9

Relationship Types of Degree Higher than Two

231

relates a semester to the courses offered during that semester by any instructor.

These ternary and binary relationships represent different information, but certain

constraints should hold among the relationships. For example, a relationship

instance (i, s, c) should not exist in OFFERS unless an instance (i, s) exists in

TAUGHT_DURING, an instance (s, c) exists in OFFERED_DURING, and an instance (i,

c) exists in CAN_TEACH. However, the reverse is not always true; we may have

instances (i, s), (s, c), and (i, c) in the three binary relationship types with no corre-

sponding instance (i, s, c) in OFFERS. Note that in this example, based on the mean-

ings of the relationships, we can infer the instances of TAUGHT_DURING and

OFFERED_DURING from the instances in OFFERS, but we cannot infer the

instances of CAN_TEACH; therefore, TAUGHT_DURING and OFFERED_DURING are

redundant and can be left out.

Although in general three binary relationships cannot replace a ternary relationship,

they may do so under certain additional constraints. In our example, if the

CAN_TEACH relationship is 1:1 (an instructor can teach one course, and a course

can be taught by only one instructor), then the ternary relationship OFFERS can be

left out because it can be inferred from the three binary relationships CAN_TEACH,

TAUGHT_DURING, and OFFERED_DURING. The schema designer must analyze the

meaning of each specific situation to decide which of the binary and ternary rela-

tionship types are needed.

Notice that it is possible to have a weak entity type with a ternary (or n-ary) identi-

fying relationship type. In this case, the weak entity type can have several owner

entity types. An example is shown in Figure 7.19. This example shows part of a data-

base that keeps track of candidates interviewing for jobs at various companies, and

may be part of an employment agency database, for example. In the requirements, a

candidate can have multiple interviews with the same company (for example, with

different company departments or on separate dates), but a job offer is made based

on one of the interviews. Here, INTERVIEW is represented as a weak entity with two

owners CANDIDATE and COMPANY, and with the partial key Dept_date. An

INTERVIEW entity is uniquely identified by a candidate, a company, and the combi-

nation of the date and department of the interview.

Figure 7.19

Name

Cname

A weak entity type INTERVIEW

with a ternary identifying rela-

CANDIDATE

CCI

COMPANY

tionship type.

Department

Date

Dept_date

INTERVIEW

RESULTS_IN

JOB_OFFER

232

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

7.9.2

Constraints on Ternary (or Higher-Degree)

Relationships

There are two notations for specifying structural constraints on n-ary relationships,

and they specify different constraints. They should thus both be used if it is impor-

tant to fully specify the structural constraints on a ternary or higher-degree rela-

tionship. The first notation is based on the cardinality ratio notation of binary

relationships displayed in Figure 7.2. Here, a 1, M, or N is specified on each partici-

pation arc (both M and N symbols stand for many or any number).¹⁵ Let us illus-

trate this constraint using the SUPPLY relationship in Figure 7.17.

Recall that the relationship set of SUPPLY is a set of relationship instances (s, j, p),

where s is a SUPPLIER, j is a PROJECT, and p is a PART. Suppose that the constraint

exists that for a particular project-part combination, only one supplier will be used

(only one supplier supplies a particular part to a particular project). In this case, we

place 1 on the SUPPLIER participation, and M, N on the PROJECT, PART participa-

tions in Figure 7.17. This specifies the constraint that a particular ( j, p) combination

can appear at most once in the relationship set because each such (PROJECT, PART)

combination uniquely determines a single supplier. Hence, any relationship

instance (s, j, p) is uniquely identified in the relationship set by its ( j, p) combina-

tion, which makes ( j, p) a key for the relationship set. In this notation, the participa-

tions that have a 1 specified on them are not required to be part of the identifying

key for the relationship set.¹⁶ If all three cardinalities are M or N, then the key will

be the combination of all three participants.

The second notation is based on the (min, max) notation displayed in Figure 7.15

for binary relationships. A (min, max) on a participation here specifies that each

entity is related to at least min and at most max relationship instances in the relation-

ship set. These constraints have no bearing on determining the key of an n-ary rela-

tionship, where n > 2,¹⁷ but specify a different type of constraint that places

restrictions on how many relationship instances each entity can participate in.

7.10 Summary

In this chapter we presented the modeling concepts of a high-level conceptual data

model, the Entity-Relationship (ER) model. We started by discussing the role that a

high-level data model plays in the database design process, and then we presented a

sample set of database requirements for the COMPANY database, which is one of the

examples that is used throughout this book. We defined the basic ER model con-

cepts of entities and their attributes. Then we discussed NULL values and presented

¹⁵This notation allows us to determine the key of the relationship relation, as we discuss in Chapter 9.

¹⁶This is also true for cardinality ratios of binary relationships.

¹⁷The (min, max) constraints can determine the keys for binary relationships, though.

7.10 Summary

233

the various types of attributes, which can be nested arbitrarily to produce complex

attributes:

■ Simple or atomic

■ Composite

■ Multivalued

We also briefly discussed stored versus derived attributes. Then we discussed the ER

model concepts at the schema or “intension” level:

■ Entity types and their corresponding entity sets

■ Key attributes of entity types

■ Value sets (domains) of attributes

■ Relationship types and their corresponding relationship sets

■ Participation roles of entity types in relationship types

We presented two methods for specifying the structural constraints on relationship

types. The first method distinguished two types of structural constraints:

■ Cardinality ratios (1:1, 1:N, M:N for binary relationships)

■ Participation constraints (total, partial)

We noted that, alternatively, another method of specifying structural constraints is

to specify minimum and maximum numbers (min, max) on the participation of

each entity type in a relationship type. We discussed weak entity types and the

related concepts of owner entity types, identifying relationship types, and partial

key attributes.

Entity-Relationship schemas can be represented diagrammatically as ER diagrams.

We showed how to design an ER schema for the COMPANY database by first defin-

ing the entity types and their attributes and then refining the design to include rela-

tionship types. We displayed the ER diagram for the COMPANY database schema.

We discussed some of the basic concepts of UML class diagrams and how they relate

to ER modeling concepts. We also described ternary and higher-degree relationship

types in more detail, and discussed the circumstances under which they are distin-

guished from binary relationships.

The ER modeling concepts we have presented thus far—entity types, relationship

types, attributes, keys, and structural constraints—can model many database appli-

cations. However, more complex applications—such as engineering design, medical

information systems, and telecommunications—require additional concepts if we

want to model them with greater accuracy. We discuss some advanced modeling

concepts in Chapter 8 and revisit further advanced data modeling techniques in

Chapter 26.

234

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

Review Questions

7.1.

Discuss the role of a high-level data model in the database design process.

7.2.

List the various cases where use of a NULL value would be appropriate.

7.3.

Define the following terms: entity, attribute, attribute value, relationship

instance, composite attribute, multivalued attribute, derived attribute, complex

attribute, key attribute, and value set (domain).

7.4.

What is an entity type? What is an entity set? Explain the differences among

an entity, an entity type, and an entity set.

7.5.

Explain the difference between an attribute and a value set.

7.6.

What is a relationship type? Explain the differences among a relationship

instance, a relationship type, and a relationship set.

7.7.

What is a participation role? When is it necessary to use role names in the

description of relationship types?

7.8.

Describe the two alternatives for specifying structural constraints on rela-

tionship types. What are the advantages and disadvantages of each?

7.9.

Under what conditions can an attribute of a binary relationship type be

migrated to become an attribute of one of the participating entity types?

7.10.

When we think of relationships as attributes, what are the value sets of these

attributes? What class of data models is based on this concept?

7.11.

What is meant by a recursive relationship type? Give some examples of

recursive relationship types.

7.12.

When is the concept of a weak entity used in data modeling? Define the

terms owner entity type, weak entity type, identifying relationship type, and

partial key.

7.13.

Can an identifying relationship of a weak entity type be of a degree greater

than two? Give examples to illustrate your answer.

7.14.

Discuss the conventions for displaying an ER schema as an ER diagram.

7.15.

Discuss the naming conventions used for ER schema diagrams.

Exercises

7.16. Consider the following set of requirements for a UNIVERSITY database that is

used to keep track of students’ transcripts. This is similar but not identical to

the database shown in Figure 1.2:

a. The university keeps track of each student’s name, student number, Social

Security number, current address and phone number, permanent address

and phone number, birth date, sex, class (freshman, sophomore, …, grad-

uate), major department, minor department (if any), and degree program

Exercises

235

(B.A., B.S., …, Ph.D.). Some user applications need to refer to the city,

state, and ZIP Code of the student’s permanent address and to the stu-

dent’s last name. Both Social Security number and student number have

unique values for each student.

b. Each department is described by a name, department code, office num-

ber, office phone number, and college. Both name and code have unique

values for each department.

c. Each course has a course name, description, course number, number of

semester hours, level, and offering department. The value of the course

number is unique for each course.

d. Each section has an instructor, semester, year, course, and sectioum-

ber. The sectioumber distinguishes sections of the same course that are

taught during the same semester/year; its values are 1, 2, 3, …, up to the

number of sections taught during each semester.

e. A grade report has a student, section, letter grade, and numeric grade (0,

1, 2, 3, or 4).

Design an ER schema for this application, and draw an ER diagram for the

schema. Specify key attributes of each entity type, and structural constraints

on each relationship type. Note any unspecified requirements, and make

appropriate assumptions to make the specification complete.

7.17.

Composite and multivalued attributes can be nested to any number of levels.

Suppose we want to design an attribute for a STUDENT entity type to keep

track of previous college education. Such an attribute will have one entry for

each college previously attended, and each such entry will be composed of

college name, start and end dates, degree entries (degrees awarded at that

college, if any), and transcript entries (courses completed at that college, if

any). Each degree entry contains the degree name and the month and year

the degree was awarded, and each transcript entry contains a course name,

semester, year, and grade. Design an attribute to hold this information. Use

the conventions in Figure 7.5.

7.18.

Show an alternative design for the attribute described in Exercise 7.17 that

uses only entity types (including weak entity types, if needed) and relation-

ship types.

7.19.

Consider the ER diagram in Figure 7.20, which shows a simplified schema

for an airline reservations system. Extract from the ER diagram the require-

ments and constraints that produced this schema. Try to be as precise as pos-

sible in your requirements and constraints specification.

7.20.

In Chapters 1 and 2, we discussed the database environment and database

users. We can consider many entity types to describe such an environment,

such as DBMS, stored database, DBA, and catalog/data dictionary. Try to

specify all the entity types that can fully describe a database system and its

environment; then specify the relationship types among them, and draw an

ER diagram to describe such a general database environment.

236

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

Figure 7.20

An ER diagram for an AIRLINE database schema.

Airport_code

City

State

Name

DEPARTURE_

AIRPORT

Le g_no

AIRPORT

Scheduled_dep_time

Scheduled_arr_time

FLIGHT_LEG

ARRIVAL_

CAN_

AIRPORT

Instances

LEGS

LAND

Number

Type_name

Max_seats

INSTANCE_OF

Airline

FLIGHT

Company

AIRPLANE_

Arr_time

Weekdays

TYPE

DEPARTS

Restrictions

FARES

ARRIVES

Dep_time

Amount

TYPE

Code

FARE

Airplane_id

Total_no_of_seats

No_of_avail_seats

Date

AIRPLANE

ASSIGNED

LEG_INSTANCE

Customer_name

Cphone

Seat_no

RESERVATION

Notes:

A LEG (segment) is a nonstop portion of a flight

SEAT

A LEG_INSTANCE is a particular occurrence

of a LEG on a particular date.

7.21. Design an ER schema for keeping track of information about votes taken in

the U.S. House of Representatives during the current two-year congressional

session. The database needs to keep track of each U.S. STATE’s Name (e.g.,

‘Texas’, ‘New York’, ‘California’) and include the Region of the state (whose

domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each

Exercises

237

CONGRESS_PERSON in the House of Representatives is described by his or

her Name, plus the District represented, the Start_date when the congressper-

son was first elected, and the political Party to which he or she belongs

(whose domain is {‘Republican’, ‘Democrat’, ‘Independent’, ‘Other’}). The

database keeps track of each BILL (i.e., proposed law), including the

Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed

(whose domain is {‘Yes’, ‘No’}), and the Sponsor (the congressperson(s) who

sponsored—that is, proposed—the bill). The database also keeps track of

how each congressperson voted on each bill (domain of Vote attribute is

{‘Yes’, ‘No’, ‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this applica-

tion. State clearly any assumptions you make.

7.22.

A database is being constructed to keep track of the teams and games of a

sports league. A team has a number of players, not all of whom participate in

each game. It is desired to keep track of the players participating in each

game for each team, the positions they played in that game, and the result of

the game. Design an ER schema diagram for this application, stating any

assumptions you make. Choose your favorite sport (e.g., soccer, baseball,

football).

7.23.

Consider the ER diagram shown in Figure 7.21 for part of a BANK database.

Each bank can have multiple branches, and each branch can have multiple

accounts and loans.

a. List the strong (nonweak) entity types in the ER diagram.

b. Is there a weak entity type? If so, give its name, partial key, and identifying

relationship.

c. What constraints do the partial key and the identifying relationship of the

weak entity type specify in this diagram?

d. List the names of all relationship types, and specify the (min, max) con-

straint on each participation of an entity type in a relationship type.

Justify your choices.

e. List concisely the user requirements that led to this ER schema design.

f. Suppose that every customer must have at least one account but is

restricted to at most two loans at a time, and that a bank branch cannot

have more than 1,000 loans. How does this show up on the (min, max)

constraints?

7.24.

Consider the ER diagram in Figure 7.22. Assume that an employee may work

in up to two departments or may not be assigned to any department. Assume

that each department must have one and may have up to three phone num-

bers. Supply (min, max) constraints on this diagram. State clearly any addi-

tional assumptions you make. Under what conditions would the relationship

HAS_PHONE be redundant in this example?

7.25.

Consider the ER diagram in Figure 7.23. Assume that a course may or may

not use a textbook, but that a text by definition is a book that is used in some

course. A course may not use more than five books. Instructors teach from

238

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

BANK_BRANCH

BANK

BRANCHES

Code

Name

Addr

Branch_no

ACCTS

LOANS

Acct_no

Balance

Loan_no

Amount

ACCOUNT

Type

LOAN

Type

A_C

L_C

Ssn

Name

Figure 7.21

An ER diagram for a BANK

Phone

CUSTOMER

Addr

database schema.

EMPLOYEE

WORKS_IN

DEPARTMENT

HAS_PHONE

CONTAINS

Figure 7.22

Part of an ER diagram

for a COMPANY data-

PHONE

base.

Exercises

239

INSTRUCTOR

TEACHES

COURSE

USES

Figure 7.23

Part of an ER diagram

for a COURSES data-

TEXT

base.

two to four courses. Supply (min, max) constraints on this diagram. State

clearly any additional assumptions you make. If we add the relationship

ADOPTS, to indicate the textbook(s) that an instructor uses for a course,

should it be a binary relationship between INSTRUCTOR and TEXT, or a ter-

nary relationship between all three entity types? What (min, max) con-

straints would you put on it? Why?

7.26.

Consider an entity type SECTION in a UNIVERSITY database, which describes

the section offerings of courses. The attributes of SECTION are

Section_number, Semester, Year, Course_number, Instructor, Room_no (where

section is taught), Building (where section is taught), Weekdays (domain is

the possible combinations of weekdays in which a section can be offered

{‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time peri-

ods during which sections are offered {‘9-9:50 A.M.’, ‘10-10:50 A.M.’, …,

‘3:30-4:50 P.M.’, ‘5:30-6:20 P.M.’, and so on}). Assume that Section_number is

unique for each course within a particular semester/year combination (that

is, if a course is offered multiple times during a particular semester, its sec-

tion offerings are numbered 1, 2, 3, and so on). There are several composite

keys for section, and some attributes are components of more than one key.

Identify three composite keys, and show how they can be represented in an

ER schema diagram.

7.27.

Cardinality ratios often dictate the detailed design of a database. The cardi-

nality ratio depends on the real-world meaning of the entity types involved

and is defined by the specific application. For the following binary relation-

ships, suggest cardinality ratios based on the common-sense meaning of the

entity types. Clearly state any assumptions you make.

Entity 1

Cardinality Ratio

Entity 2

1. STUDENT

______________

SOCIAL_SECURITY_CARD

2. STUDENT

______________

TEACHER

3. CLASSROOM

______________

WALL

240

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

4. COUNTRY

______________

CURRENT_PRESIDENT

5. COURSE

______________

TEXTBOOK

6. ITEM (that can

be found in an

order)

______________

ORDER

7. STUDENT

______________

CLASS

8. CLASS

______________

INSTRUCTOR

9. INSTRUCTOR

______________

OFFICE

10. EBAY_AUCTION

_ITEM

______________

EBAY_BID

7.28. Consider the ER schema for the MOVIES database in Figure 7.24.

Assume that MOVIES is a populated database. ACTOR is used as a generic

term and includes actresses. Given the constraints shown in the ER schema,

respond to the following statements with True, False, or Maybe. Assign a

response of Maybe to statements that, while not explicitly shown to be True,

cannot be proven False based on the schema as shown. Justify each answer.

Figure 7.24

An ER diagram for a MOVIES

database schema.

PERFORMS_IN

MOVIE

ACTOR

LEAD_ROLE

ALSO_A_

DIRECTOR

ACTOR_

PRODUCER

DIRECTOR

DIRECTS

PRODUCER

PRODUCES

Laboratory Exercises

241

a. There are no actors in this database that have been io movies.

b. There are some actors who have acted in more than ten movies.

c. Some actors have done a lead role in multiple movies.

d. A movie can have only a maximum of two lead actors.

e. Every director has been an actor in some movie.

f. No producer has ever been an actor.

g. A producer cannot be an actor in some other movie.

h. There are movies with more than a dozen actors.

i. Some producers have been a director as well.

j. Most movies have one director and one producer.

k. Some movies have one director but several producers.

l. There are some actors who have done a lead role, directed a movie, and

produced some movie.

m. No movie has a director who also acted in that movie.

7.29.

Given the ER schema for the MOVIES database in Figure 7.24, draw an

instance diagram using three movies that have been released recently. Draw

instances of each entity type: MOVIES, ACTORS, PRODUCERS, DIRECTORS

involved; make up instances of the relationships as they exist in reality for

those movies.

7.30.

Illustrate the UML Diagram for Exercise 7.16. Your UML design should

observe the following requirements:

a. A student should have the ability to compute his/her GPA and add or

drop majors and minors.

b. Each department should be to able add or delete courses and hire or ter-

minate faculty.

c. Each instructor should be able to assign or change a student’s grade for a

course.

Note: Some of these functions may be spread over multiple classes.

Laboratory Exercises

7.31. Consider the UNIVERSITY database described in Exercise 7.16. Build the ER

schema for this database using a data modeling tool such as ERwin or

Rational Rose.

7.32. Consider a MAIL_ORDER database in which employees take orders for parts

from customers. The data requirements are summarized as follows:

■ The mail order company has employees, each identified by a unique

employee number, first and last name, and Zip Code.

■ Each customer of the company is identified by a unique customer num-

ber, first and last name, and Zip Code.

242

Chapter 7 Data Modeling Using the Entity-Relationship (ER) Model

■ Each part sold by the company is identified by a unique part number, a

part name, price, and quantity in stock.

■ Each order placed by a customer is taken by an employee and is given a

unique order number. Each order contains specified quantities of one or

more parts. Each order has a date of receipt as well as an expected ship

date. The actual ship date is also recorded.

Design an Entity-Relationship diagram for the mail order database and

build the design using a data modeling tool such as ERwin or Rational Rose.

7.33.

Consider a MOVIE database in which data is recorded about the movie indus-

try. The data requirements are summarized as follows:

■ Each movie is identified by title and year of release. Each movie has a

length in minutes. Each has a production company, and each is classified

under one or more genres (such as horror, action, drama, and so forth).

Each movie has one or more directors and one or more actors appear in

it. Each movie also has a plot outline. Finally, each movie has zero or more

quotable quotes, each of which is spoken by a particular actor appearing

in the movie.

■ Actors are identified by name and date of birth and appear in one or more

movies. Each actor has a role in the movie.

■ Directors are also identified by name and date of birth and direct one or

more movies. It is possible for a director to act in a movie (including one

that he or she may also direct).

■ Production companies are identified by name and each has an address. A

production company produces one or more movies.

Design an Entity-Relationship diagram for the movie database and enter the

design using a data modeling tool such as ERwin or Rational Rose.

7.34.

Consider a CONFERENCE_REVIEW database in which researchers submit

their research papers for consideration. Reviews by reviewers are recorded

for use in the paper selection process. The database system caters primarily

to reviewers who record answers to evaluation questions for each paper they

review and make recommendations regarding whether to accept or reject the

paper. The data requirements are summarized as follows:

■ Authors of papers are uniquely identified by e-mail id. First and last

names are also recorded.

■ Each paper is assigned a unique identifier by the system and is described

by a title, abstract, and the name of the electronic file containing the

paper.

■ A paper may have multiple authors, but one of the authors is designated

as the contact author.

■ Reviewers of papers are uniquely identified by e-mail address. Each

reviewer’s first name, last name, phone number, affiliation, and topics of

interest are also recorded.

Selected Bibliography

243

■ Each paper is assigned between two and four reviewers. A reviewer rates

each paper assigned to him or her on a scale of 1 to 10 in four categories:

technical merit, readability, originality, and relevance to the conference.

Finally, each reviewer provides an overall recommendation regarding

each paper.

■ Each review contains two types of written comments: one to be seen by

the review committee only and the other as feedback to the author(s).

Design an Entity-Relationship diagram for the CONFERENCE_REVIEW

database and build the design using a data modeling tool such as ERwin or

Rational Rose.

7.35. Consider the ER diagram for the AIRLINE database shown in Figure 7.20.

Build this design using a data modeling tool such as ERwin or Rational Rose.

Selected Bibliography

The Entity-Relationship model was introduced by Chen (1976), and related work

appears in Schmidt and Swenson (1975), Wiederhold and Elmasri (1979), and

Senko (1975). Since then, numerous modifications to the ER model have been sug-

gested. We have incorporated some of these in our presentation. Structural con-

straints on relationships are discussed in Abrial (1974), Elmasri and Wiederhold

(1980), and Lenzerini and Santucci (1983). Multivalued and composite attributes

are incorporated in the ER model in Elmasri et al. (1985). Although we did not dis-

cuss languages for the ER model and its extensions, there have been several propos-

als for such languages. Elmasri and Wiederhold (1981) proposed the GORDAS

query language for the ER model. Another ER query language was proposed by

Markowitz and Raz (1983). Senko (1980) presented a query language for Senko’s

DIAM model. A formal set of operations called the ER algebra was presented by

Parent and Spaccapietra (1985). Gogolla and Hohenstein (1991) presented another

formal language for the ER model. Campbell et al. (1985) presented a set of ER

operations and showed that they are relationally complete. A conference for the dis-

semination of research results related to the ER model has been held regularly since

1979. The conference, now known as the International Conference on Conceptual

Modeling, has been held in Los Angeles (ER 1979, ER 1983, ER 1997), Washington,

D.C. (ER 1981), Chicago (ER 1985), Dijon, France (ER 1986), New York City (ER

1987), Rome (ER 1988), Toronto (ER 1989), Lausanne, Switzerland (ER 1990), San

Mateo, California (ER 1991), Karlsruhe, Germany (ER 1992), Arlington, Texas (ER

1993), Manchester, England (ER 1994), Brisbane, Australia (ER 1995), Cottbus,

Germany (ER 1996), Singapore (ER 1998), Paris, France (ER 1999), Salt Lake City,

Utah (ER 2000), Yokohama, Japan (ER 2001), Tampere, Finland (ER 2002),

Chicago, Illinois (ER 2003), Shanghai, China (ER 2004), Klagenfurt, Austria (ER

2005), Tucson, Arizona (ER 2006), Auckland, New Zealand (ER 2007), Barcelona,

Catalonia, Spain (ER 2008), and Gramado, RS, Brazil (ER 2009). The 2010 confer-

ence is to be held in Vancouver, BC, Canada.

This page intentionally left blank

chapter

The Enhanced Entity-Relationship

(EER) Model

he ER modeling concepts discussed in Chapter 7

are sufficient for representing many database

schemas for traditional database applications, which include many data-processing

applications in business and industry. Since the late 1970s, however, designers of

database applications have tried to design more accurate database schemas that

reflect the data properties and constraints more precisely. This was particularly

important for newer applications of database technology, such as databases for

engineering design and manufacturing (CAD/CAM),¹ telecommunications, com-

plex software systems, and Geographic Information Systems (GIS), among many

other applications. These types of databases have more complex requirements than

do the more traditional applications. This led to the development of additional

semantic data modeling concepts that were incorporated into conceptual data mod-

els such as the ER model. Various semantic data models have been proposed in the

literature. Many of these concepts were also developed independently in related

areas of computer science, such as the knowledge representation area of artificial

intelligence and the object modeling area in software engineering.

In this chapter, we describe features that have been proposed for semantic data mod-

els, and show how the ER model can be enhanced to include these concepts, leading

to the Enhanced ER (EER) model.² We start in Section 8.1 by incorporating the con-

cepts of class/subclass relationships and type inheritance into the ER model. Then, in

Section 8.2, we add the concepts of specialization and generalization. Section 8.3

¹CAD/CAM stands for computer-aided design/computer-aided manufacturing.

²EER has also been used to stand for Extended ER model.

245

246

Chapter 8 The Enhanced Entity-Relationship (EER) Model

discusses the various types of constraints on specialization/generalization, and

Section 8.4 shows how the UNION construct can be modeled by including the con-

cept of category in the EER model. Section 8.5 gives a sample UNIVERSITY database

schema in the EER model and summarizes the EER model concepts by giving formal

definitions. We will use the terms object and entity interchangeably in this chapter,

because many of these concepts are commonly used in object-oriented models.

We present the UML class diagram notation for representing specialization and gen-

eralization in Section 8.6, and briefly compare these with EER notation and con-

cepts. This serves as an example of alternative notation, and is a continuation of

Section 7.8, which presented basic UML class diagram notation that corresponds to

the basic ER model. In Section 8.7, we discuss the fundamental abstractions that are

used as the basis of many semantic data models. Section 8.8 summarizes the chapter.

For a detailed introduction to conceptual modeling, Chapter 8 should be consid-

ered a continuation of Chapter 7. However, if only a basic introduction to ER mod-

eling is desired, this chapter may be omitted. Alternatively, the reader may choose to

skip some or all of the later sections of this chapter (Sections 8.4 through 8.8).

8.1 Subclasses, Superclasses, and Inheritance

The EER model includes all the modeling concepts of the ER model that were pre-

sented in Chapter 7. In addition, it includes the concepts of subclass and

superclass and the related concepts of specialization and generalization (see

Sections 8.2 and 8.3). Another concept included in the EER model is that of a

category or union type (see Section 8.4), which is used to represent a collection of

objects (entities) that is the union of objects of different entity types. Associated

with these concepts is the important mechanism of attribute and relationship

inheritance. Unfortunately, no standard terminology exists for these concepts, so

we use the most common terminology. Alternative terminology is given in foot-

notes. We also describe a diagrammatic technique for displaying these concepts

when they arise in an EER schema. We call the resulting schema diagrams

enhanced ER or EER diagrams.

The first Enhanced ER (EER) model concept we take up is that of a subtype or

subclass of an entity type. As we discussed in Chapter 7, an entity type is used to

represent both a type of entity and the entity set or collection of entities of that type

that exist in the database. For example, the entity type EMPLOYEE describes the type

(that is, the attributes and relationships) of each employee entity, and also refers to

the current set of EMPLOYEE entities in the COMPANY database. In many cases an

entity type has numerous subgroupings or subtypes of its entities that are meaning-

ful and need to be represented explicitly because of their significance to the database

application. For example, the entities that are members of the EMPLOYEE entity

type may be distinguished further into SECRETARY, ENGINEER, MANAGER,

TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, and so on. The set of

entities in each of the latter groupings is a subset of the entities that belong to the

EMPLOYEE entity set, meaning that every entity that is a member of one of these

8.1

Subclasses, Superclasses, and Inheritance

247

subgroupings is also an employee. We call each of these subgroupings a subclass or

subtype of the EMPLOYEE entity type, and the EMPLOYEE entity type is called the

superclass or supertype for each of these subclasses. Figure 8.1 shows how to repre-

sent these concepts diagramatically in EER diagrams. (The circle notation in Figure

8.1 will be explained in Section 8.2.)

We call the relationship between a superclass and any one of its subclasses a

superclass/subclass or supertype/subtype or simply class/subclass relationship.³

In our previous example, EMPLOYEE/SECRETARY and EMPLOYEE/TECHNICIAN are

two class/subclass relationships. Notice that a member entity of the subclass repre-

sents the same real-world entity as some member of the superclass; for example, a

SECRETARY entity ‘Joan Logano’ is also the EMPLOYEE ‘Joan Logano.’ Hence, the

subclass member is the same as the entity in the superclass, but in a distinct specific

role. When we implement a superclass/subclass relationship in the database system,

however, we may represent a member of the subclass as a distinct database object—

say, a distinct record that is related via the key attribute to its superclass entity. In

Section 9.2, we discuss various options for representing superclass/subclass rela-

tionships in relational databases.

Figure 8.1

Fname

Minit

Lname

EER diagram

notation to represent

Name

Ssn

Birth_date

Add ress

subclasses and

specialization.

EMPLOYEE

Typing_speed

Tgrade

Eng_type

Pay_scale

SECRETARY

TECHNICIAN

ENGINEER

MANAGER

Salary

HOURLY_EMPLOYEE

SALARIED_EMPLOYEE

Three specializations of EMPLOYEE:

{SECRETARY, TECHNICIAN, ENGINEER}

MANAGES

BELONGS_TO

{MANAGER}

{HOURLY_EMPLOYEE, SALARIED_EMPLOYEE}

PROJECT

TRADE_UNION

³A class/subclass relationship is often called an IS-A (or IS-AN) relationship because of the way we

refer to the concept. We say a SECRETARY is an EMPLOYEE, a TECHNICIAN is an EMPLOYEE, and

so on.

248

Chapter 8 The Enhanced Entity-Relationship (EER) Model

An entity cannot exist in the database merely by being a member of a subclass; it

must also be a member of the superclass. Such an entity can be included optionally

as a member of any number of subclasses. For example, a salaried employee who is

also an engineer belongs to the two subclasses ENGINEER and

SALARIED_EMPLOYEE of the EMPLOYEE entity type. However, it is not necessary

that every entity in a superclass is a member of some subclass.

An important concept associated with subclasses (subtypes) is that of type inheri-

tance. Recall that the type of an entity is defined by the attributes it possesses and

the relationship types in which it participates. Because an entity in the subclass rep-

resents the same real-world entity from the superclass, it should possess values for

its specific attributes as well as values of its attributes as a member of the superclass.

We say that an entity that is a member of a subclass inherits all the attributes of the

entity as a member of the superclass. The entity also inherits all the relationships in

which the superclass participates. Notice that a subclass, with its own specific (or

local) attributes and relationships together with all the attributes and relationships

it inherits from the superclass, can be considered an entity type in its own right.⁴

8.2 Specialization and Generalization

8.2.1

Specialization

Specialization is the process of defining a set of subclasses of an entity type; this

entity type is called the superclass of the specialization. The set of subclasses that

forms a specialization is defined on the basis of some distinguishing characteristic

of the entities in the superclass. For example, the set of subclasses {SECRETARY,

ENGINEER, TECHNICIAN} is a specialization of the superclass EMPLOYEE that dis-

tinguishes among employee entities based on the job type of each employee entity.

We may have several specializations of the same entity type based on different dis-

tinguishing characteristics. For example, another specialization of the EMPLOYEE

entity type may yield the set of subclasses

{SALARIED_EMPLOYEE,

HOURLY_EMPLOYEE}; this specialization distinguishes among employees based on

the method of pay.

Figure 8.1 shows how we represent a specialization diagrammatically in an EER dia-

gram. The subclasses that define a specialization are attached by lines to a circle that

represents the specialization, which is connected in turn to the superclass. The

subset symbol on each line connecting a subclass to the circle indicates the direction

of the superclass/subclass relationship.⁵ Attributes that apply only to entities of a

particular subclass—such as TypingSpeed of SECRETARY—are attached to the rec-

tangle representing that subclass. These are called specific attributes (or local

⁴In some object-oriented programming languages, a common restriction is that an entity (or object) has

only one type. This is generally too restrictive for conceptual database modeling.

⁵There are many alternative notations for specialization; we present the UML notation in Section 8.6 and

other proposed notations in Appendix A.

8.2

Specialization and Generalization

249

attributes) of the subclass. Similarly, a subclass can participate in specific relation-

ship types, such as the HOURLY_EMPLOYEE subclass participating in the

BELONGS_TO relationship in Figure 8.1. We will explain the d symbol in the circles

in Figure 8.1 and additional EER diagram notation shortly.

Figure

8.2

shows a few entity instances that belong to subclasses of the

{SECRETARY, ENGINEER, TECHNICIAN} specialization. Again, notice that an entity

that belongs to a subclass represents the same real-world entity as the entity con-

nected to it in the EMPLOYEE superclass, even though the same entity is shown

twice; for example, e₁ is shown in both EMPLOYEE and SECRETARY in Figure 8.2. As

the figure suggests, a superclass/subclass relationship such as EMPLOYEE/

SECRETARY somewhat resembles a 1:1 relationship at the instance level (see Figure

7.12). The main difference is that in a 1:1 relationship two distinct entities are

related, whereas in a superclass/subclass relationship the entity in the subclass is the

same real-world entity as the entity in the superclass but is playing a specialized

role—for example, an EMPLOYEE specialized in the role of SECRETARY, or an

EMPLOYEE specialized in the role of TECHNICIAN.

SECRETARY

Figure 8.2

Instances of a specialization.

e₁

e₄

e₅

EMPLOYEE

e₁

e₂

ENGINEER

e₃

e₄

e₂

e₇

e₅

e₆

e₇

e₈

TECHNICIAN

e₃

e₈

250

Chapter 8 The Enhanced Entity-Relationship (EER) Model

There are two main reasons for including class/subclass relationships and specializa-

tions in a data model. The first is that certain attributes may apply to some but not all

entities of the superclass. A subclass is defined in order to group the entities to which

these attributes apply. The members of the subclass may still share the majority of

their attributes with the other members of the superclass. For example, in Figure 8.1

the SECRETARY subclass has the specific attribute Typing_speed, whereas the

ENGINEER subclass has the specific attribute Eng_type, but SECRETARY and

ENGINEER share their other inherited attributes from the EMPLOYEE entity type.

The second reason for using subclasses is that some relationship types may be par-

ticipated in only by entities that are members of the subclass. For example, if only

HOURLY_EMPLOYEES can belong to a trade union, we can represent that fact by

creating the subclass HOURLY_EMPLOYEE of EMPLOYEE and relating the subclass

to an entity type TRADE_UNION via the BELONGS_TO relationship type, as illus-

trated in Figure 8.1.

In summary, the specialization process allows us to do the following:

■ Define a set of subclasses of an entity type

■ Establish additional specific attributes with each subclass

■ Establish additional specific relationship types between each subclass and

other entity types or other subclasses

8.2.2

Generalization

We can think of a reverse process of abstraction in which we suppress the differences

among several entity types, identify their common features, and generalize them

into a single superclass of which the original entity types are special subclasses. For

example, consider the entity types CAR and TRUCK shown in Figure 8.3(a). Because

they have several common attributes, they can be generalized into the entity type

VEHICLE, as shown in Figure 8.3(b). Both CAR and TRUCK are now subclasses of the

generalized superclass VEHICLE. We use the term generalization to refer to the

process of defining a generalized entity type from the given entity types.

Notice that the generalization process can be viewed as being functionally the inverse

of the specialization process. Hence, in Figure 8.3 we can view {CAR, TRUCK} as a

specialization of VEHICLE, rather than viewing VEHICLE as a generalization of CAR

and TRUCK. Similarly, in Figure 8.1 we can view EMPLOYEE as a generalization of

SECRETARY, TECHNICIAN, and ENGINEER. A diagrammatic notation to distinguish

between generalization and specialization is used in some design methodologies. An

arrow pointing to the generalized superclass represents a generalization, whereas

arrows pointing to the specialized subclasses represent a specialization. We will not

use this notation because the decision as to which process is followed in a particular

situation is often subjective. Appendix A gives some of the suggested alternative dia-

grammatic notations for schema diagrams and class diagrams.

So far we have introduced the concepts of subclasses and superclass/subclass rela-

tionships, as well as the specialization and generalization processes. In general, a

8.3

Constraints and Characteristics of Specialization and Generalization Hierarchies

251

(a)

No_of_passengers

No_of_axles

Max_speed

Tonnage

CAR

Price

TRUCK

Vehicle_id

License_plate_no

(b)

Vehicle_id

Price

License_plate_no

VEHICLE

No_of_passengers

No_of_axles

Max_speed

Tonnage

CAR

TRUCK

Figure 8.3

Generalization. (a) Two entity types, CAR and TRUCK. (b)

Generalizing CAR and TRUCK into the superclass VEHICLE.

superclass or subclass represents a collection of entities of the same type and hence

also describes an entity type; that is why superclasses and subclasses are all shown in

rectangles in EER diagrams, like entity types. Next, we discuss the properties of spe-

cializations and generalizations in more detail.

8.3 Constraints and Characteristics

of Specialization and Generalization

Hierarchies

First, we discuss constraints that apply to a single specialization or a single general-

ization. For brevity, our discussion refers only to specialization even though it

applies to both specialization and generalization. Then, we discuss differences

between specialization/generalization lattices (multiple inheritance) and hierarchies

(single inheritance), and elaborate on the differences between the specialization and

generalization processes during conceptual database schema design.

8.3.1

Constraints on Specialization and Generalization

In general, we may have several specializations defined on the same entity type (or

superclass), as shown in Figure 8.1. In such a case, entities may belong to subclasses

252

Chapter 8 The Enhanced Entity-Relationship (EER) Model

in each of the specializations. However, a specialization may also consist of a single

subclass only, such as the {MANAGER} specialization in Figure 8.1; in such a case, we

do not use the circle notation.

In some specializations we can determine exactly the entities that will become

members of each subclass by placing a condition on the value of some attribute of

the superclass. Such subclasses are called predicate-defined (or condition-defined)

subclasses. For example, if the EMPLOYEE entity type has an attribute Job_type, as

shown in Figure 8.4, we can specify the condition of membership in the

SECRETARY subclass by the condition (Job_type = ‘Secretary’), which we call the

defining predicate of the subclass. This condition is a constraint specifying that

exactly those entities of the EMPLOYEE entity type whose attribute value for

Job_type is ‘Secretary’ belong to the subclass. We display a predicate-defined subclass

by writing the predicate conditioext to the line that connects the subclass to the

specialization circle.

If all subclasses in a specialization have their membership condition on the same

attribute of the superclass, the specialization itself is called an attribute-defined spe-

cialization, and the attribute is called the defining attribute of the specialization.⁶ In

this case, all the entities with the same value for the attribute belong to the same sub-

class. We display an attribute-defined specialization by placing the defining attribute

name next to the arc from the circle to the superclass, as shown in Figure 8.4.

When we do not have a condition for determining membership in a subclass, the

subclass is called user-defined. Membership in such a subclass is determined by the

database users when they apply the operation to add an entity to the subclass; hence,

membership is specified individually for each entity by the user, not by any condition

that may be evaluated automatically.

Figure 8.4

Fname

Minit

Lname

EER diagram notation

for an attribute-defined

Name

Ssn

Birth_date

Address

Job_type

specialization on

Job_type.

EMPLOYEE

Job_type

‘Secretary’

‘Engineer’

Typing_speed

Tgrade

Eng_type

‘Technician’

SECRETARY

TECHNICIAN

ENGINEER

⁶Such an attribute is called a discriminator in UML terminology.

8.3

Constraints and Characteristics of Specialization and Generalization Hierarchies

253

Two other constraints may apply to a specialization. The first is the disjointness (or

disjointedness) constraint, which specifies that the subclasses of the specialization

must be disjoint. This means that an entity can be a member of at most one of the

subclasses of the specialization. A specialization that is attribute-defined implies the

disjointness constraint (if the attribute used to define the membership predicate is

single-valued). Figure 8.4 illustrates this case, where the d in the circle stands for

disjoint. The d notation also applies to user-defined subclasses of a specialization

that must be disjoint, as illustrated by the specialization {HOURLY_EMPLOYEE,

SALARIED_EMPLOYEE} in Figure 8.1. If the subclasses are not constrained to be dis-

joint, their sets of entities may be overlapping; that is, the same (real-world) entity

may be a member of more than one subclass of the specialization. This case, which

is the default, is displayed by placing an o in the circle, as shown in Figure 8.5.

The second constraint on specialization is called the completeness (or totalness)

constraint, which may be total or partial. A total specialization constraint specifies

that every entity in the superclass must be a member of at least one subclass in the

specialization. For example, if every EMPLOYEE must be either an

HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization

{HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 8.1 is a total specialization

of EMPLOYEE. This is shown in EER diagrams by using a double line to connect the

superclass to the circle. A single line is used to display a partial specialization,

which allows an entity not to belong to any of the subclasses. For example, if some

EMPLOYEE entities do not belong to any of the subclasses

{SECRETARY,

ENGINEER, TECHNICIAN} in Figures 8.1 and 8.4, then that specialization is partial.⁷

Notice that the disjointness and completeness constraints are independent. Hence,

we have the following four possible constraints on specialization:

■ Disjoint, total

■ Disjoint, partial

■ Overlapping, total

■ Overlapping, partial

Figure 8.5

Part_no

Description

EER diagram notation

for an overlapping

Manufacture_date

PART

(nondisjoint)

specialization.

Batch_no

Supplier_name

Drawing_no

List_price

MANUFACTURED_PART

PURCHASED_PART

⁷The notation of using single or double lines is similar to that for partial or total participation of an entity

type in a relationship type, as described in Chapter 7.

254

Chapter 8 The Enhanced Entity-Relationship (EER) Model

Of course, the correct constraint is determined from the real-world meaning that

applies to each specialization. In general, a superclass that was identified through

the generalization process usually is total, because the superclass is derived from the

subclasses and hence contains only the entities that are in the subclasses.

Certain insertion and deletion rules apply to specialization (and generalization) as a

consequence of the constraints specified earlier. Some of these rules are as follows:

■ Deleting an entity from a superclass implies that it is automatically deleted

from all the subclasses to which it belongs.

■ Inserting an entity in a superclass implies that the entity is mandatorily

inserted in all predicate-defined (or attribute-defined) subclasses for which

the entity satisfies the defining predicate.

■ Inserting an entity in a superclass of a total specialization implies that the

entity is mandatorily inserted in at least one of the subclasses of the special-

ization.

The reader is encouraged to make a complete list of rules for insertions and dele-

tions for the various types of specializations.

8.3.2

Specialization and Generalization Hierarchies

and Lattices

A subclass itself may have further subclasses specified on it, forming a hierarchy or a

lattice of specializations. For example, in Figure 8.6 ENGINEER is a subclass of

EMPLOYEE and is also a superclass of ENGINEERING_MANAGER; this represents

the real-world constraint that every engineering manager is required to be an engi-

neer. A specialization hierarchy has the constraint that every subclass participates

as a subclass in only one class/subclass relationship; that is, each subclass has only

Figure 8.6

A specialization lattice with shared subclass

ENGINEERING_MANAGER.

EMPLOYEE

SECRETARY

TECHNICIAN

ENGINEER

MANAGER

HOURLY_EMPLOYEE

SALARIED_EMPLOYEE

ENGINEERING_MANAGER

8.3

Constraints and Characteristics of Specialization and Generalization Hierarchies

255

one parent, which results in a tree structure or strict hierarchy. In contrast, for a

specialization lattice, a subclass can be a subclass in more than one class/subclass

relationship. Hence, Figure 8.6 is a lattice.

Figure 8.7 shows another specialization lattice of more than one level. This may be

part of a conceptual schema for a UNIVERSITY database. Notice that this arrange-

ment would have been a hierarchy except for the STUDENT_ASSISTANT subclass,

which is a subclass in two distinct class/subclass relationships.

The requirements for the part of the UNIVERSITY database shown in Figure 8.7 are

the following:

1. The database keeps track of three types of persons: employees, alumni, and

students. A person can belong to one, two, or all three of these types. Each

person has a name, SSN, sex, address, and birth date.

2. Every employee has a salary, and there are three types of employees: faculty,

staff, and student assistants. Each employee belongs to exactly one of these

types. For each alumnus, a record of the degree or degrees that he or she

Figure 8.7

Name

Sex

Address

A specialization lattice

with multiple inheritance

Ssn

PERSON

Birth_date

for a UNIVERSITY

database.

Salary

Major_dept

EMPLOYEE

ALUMNUS

STUDENT

Degrees

ear

Degree

Major

Percent_time

STAFF

FACULTY

STUDENT_

GRADUATE_

UNDERGRADUATE_

ASSISTANT

STUDENT

Position

Rank

Degree_program

Class

Project

Course

RESEARCH_ASSISTANT

TEACHING_ASSISTANT

256

Chapter 8 The Enhanced Entity-Relationship (EER) Model

earned at the university is kept, including the name of the degree, the year

granted, and the major department. Each student has a major department.

3. Each faculty has a rank, whereas each staff member has a staff position.

Student assistants are classified further as either research assistants or teach-

ing assistants, and the percent of time that they work is recorded in the data-

base. Research assistants have their research project stored, whereas teaching

assistants have the current course they work on.

4. Students are further classified as either graduate or undergraduate, with the

specific attributes degree program (M.S., Ph.D., M.B.A., and so on) for

graduate students and class (freshman, sophomore, and so on) for under-

graduates.

In Figure 8.7, all person entities represented in the database are members of the

PERSON entity type, which is specialized into the subclasses

{EMPLOYEE,

ALUMNUS, STUDENT}. This specialization is overlapping; for example, an alumnus

may also be an employee and may also be a student pursuing an advanced degree.

The subclass STUDENT is the superclass for the specialization

{GRADUATE_STUDENT, UNDERGRADUATE_STUDENT}, while EMPLOYEE is the

superclass for the specialization {STUDENT_ASSISTANT, FACULTY, STAFF}. Notice

that STUDENT_ASSISTANT is also a subclass of STUDENT. Finally,

STUDENT_ASSISTANT is the superclass for the specialization into

{RESEARCH_ASSISTANT, TEACHING_ASSISTANT}.

In such a specialization lattice or hierarchy, a subclass inherits the attributes not

only of its direct superclass, but also of all its predecessor superclasses all the way to

the root of the hierarchy or lattice if necessary. For example, an entity in

GRADUATE_STUDENT inherits all the attributes of that entity as a STUDENT and as

a PERSON. Notice that an entity may exist in several leaf nodes of the hierarchy,

where a leaf node is a class that has no subclasses of its own. For example, a member

of GRADUATE_STUDENT may also be a member of RESEARCH_ASSISTANT.

A subclass with more than one superclass is called a shared subclass, such as

ENGINEERING_MANAGER in Figure 8.6. This leads to the concept known as

multiple inheritance, where the shared subclass ENGINEERING_MANAGER directly

inherits attributes and relationships from multiple classes. Notice that the existence

of at least one shared subclass leads to a lattice (and hence to multiple inheritance);

if no shared subclasses existed, we would have a hierarchy rather than a lattice and

only single inheritance would exist. An important rule related to multiple inheri-

tance can be illustrated by the example of the shared subclass STUDENT_ASSISTANT

in Figure 8.7, which inherits attributes from both EMPLOYEE and STUDENT. Here,

both EMPLOYEE and STUDENT inherit the same attributes from PERSON. The rule

states that if an attribute (or relationship) originating in the same superclass

(PERSON) is inherited more than once via different paths (EMPLOYEE and

STUDENT) in the lattice, then it should be included only once in the shared subclass

(STUDENT_ASSISTANT). Hence, the attributes of PERSON are inherited only once in

the STUDENT_ASSISTANT subclass in Figure 8.7.

8.3

Constraints and Characteristics of Specialization and Generalization Hierarchies

257

It is important to note here that some models and languages are limited to single

inheritance and do not allow multiple inheritance (shared subclasses). It is also

important to note that some models do not allow an entity to have multiple types,

and hence an entity can be a member of only one leaf class.⁸ In such a model, it is

necessary to create additional subclasses as leaf nodes to cover all possible combina-

tions of classes that may have some entity that belongs to all these classes simultane-

ously. For example, in the overlapping specialization of PERSON into {EMPLOYEE,

ALUMNUS, STUDENT} (or {E, A, S} for short), it would be necessary to create seven

subclasses of PERSON in order to cover all possible types of entities: E, A, S, E_A,

E_S, A_S, and E_A_S. Obviously, this can lead to extra complexity.

Although we have used specialization to illustrate our discussion, similar concepts

apply equally to generalization, as we mentioned at the beginning of this section.

Hence, we can also speak of generalization hierarchies and generalization lattices.

8.3.3

Utilizing Specialization and Generalization

in Refining Conceptual Schemas

Now we elaborate on the differences between the specialization and generalization

processes, and how they are used to refine conceptual schemas during conceptual

database design. In the specialization process, we typically start with an entity type

and then define subclasses of the entity type by successive specialization; that is, we

repeatedly define more specific groupings of the entity type. For example, when

designing the specialization lattice in Figure 8.7, we may first specify an entity type

PERSON for a university database. Then we discover that three types of persons will

be represented in the database: university employees, alumni, and students. We cre-

ate the specialization {EMPLOYEE, ALUMNUS, STUDENT} for this purpose and

choose the overlapping constraint, because a person may belong to more than one

of the subclasses. We specialize EMPLOYEE further into

{STAFF, FACULTY,

STUDENT_ASSISTANT}, and specialize STUDENT into {GRADUATE_STUDENT,

UNDERGRADUATE_STUDENT}. Finally, we specialize STUDENT_ASSISTANT into

{RESEARCH_ASSISTANT, TEACHING_ASSISTANT}. This successive specialization

corresponds to a top-down conceptual refinement process during conceptual

schema design. So far, we have a hierarchy; then we realize that

STUDENT_ASSISTANT is a shared subclass, since it is also a subclass of STUDENT,

leading to the lattice.

It is possible to arrive at the same hierarchy or lattice from the other direction. In

such a case, the process involves generalization rather than specialization and corre-

sponds to a bottom-up conceptual synthesis. For example, the database designers

may first discover entity types such as STAFF, FACULTY, ALUMNUS,

GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, RESEARCH_ASSISTANT,

TEACHING_ASSISTANT, and so on; then they generalize {GRADUATE_STUDENT,

⁸In some models, the class is further restricted to be a leaf node in the hierarchy or lattice.

258

Chapter 8 The Enhanced Entity-Relationship (EER) Model

UNDERGRADUATE_STUDENT} into STUDENT; then they generalize

{RESEARCH_ASSISTANT, TEACHING_ASSISTANT} into STUDENT_ASSISTANT; then

they generalize {STAFF, FACULTY, STUDENT_ASSISTANT} into EMPLOYEE; and

finally they generalize {EMPLOYEE, ALUMNUS, STUDENT} into PERSON.

In structural terms, hierarchies or lattices resulting from either process may be iden-

tical; the only difference relates to the manner or order in which the schema super-

classes and subclasses were created during the design process. In practice, it is likely

that neither the generalization process nor the specialization process is followed

strictly, but that a combination of the two processes is employed. New classes are

continually incorporated into a hierarchy or lattice as they become apparent to users

and designers. Notice that the notion of representing data and knowledge by using

superclass/subclass hierarchies and lattices is quite common in knowledge-based sys-

tems and expert systems, which combine database technology with artificial intelli-

gence techniques. For example, frame-based knowledge representation schemes

closely resemble class hierarchies. Specialization is also common in software engi-

neering design methodologies that are based on the object-oriented paradigm.

8.4 Modeling of UNION Types Using Categories

All of the superclass/subclass relationships we have seen thus far have a single super-

class. A shared subclass such as ENGINEERING_MANAGER in the lattice in Figure

8.6 is the subclass in three distinct superclass/subclass relationships, where each of

the three relationships has a single superclass. However, it is sometimes necessary to

represent a single superclass/subclass relationship with more than one superclass,

where the superclasses represent different entity types. In this case, the subclass will

represent a collection of objects that is a subset of the UNION of distinct entity types;

we call such a subclass a union type or a category.⁹

For example, suppose that we have three entity types: PERSON, BANK, and

COMPANY.In a database for motor vehicle registration, an owner of a vehicle can be

a person, a bank (holding a lien on a vehicle), or a company. We need to create a

class (collection of entities) that includes entities of all three types to play the role of

vehicle owner. A category (union type) OWNER that is a subclass of the UNION of the

three entity sets of COMPANY, BANK, and PERSON can be created for this purpose.

We display categories in an EER diagram as shown in Figure 8.8. The superclasses

COMPANY, BANK, and PERSON are connected to the circle with the ∪ symbol,

which stands for the set union operation. An arc with the subset symbol connects the

circle to the (subclass) OWNER category. If a defining predicate is needed, it is dis-

played next to the line from the superclass to which the predicate applies. In Figure

8.8 we have two categories: OWNER, which is a subclass of the union of PERSON,

BANK, and COMPANY; and REGISTERED_VEHICLE, which is a subclass of the union

of CAR and TRUCK.

⁹Our use of the term category is based on the ECR (Entity-Category-Relationship) model (Elmasri et al.

1985).

8.4

Modeling of UNION Types Using Categories

259

Bname

Baddress

BANK

Driver_license_no

Name

Address

Cname

Caddress

Ssn

PERSON

COMPANY

OWNER

Lien_or_regular

OWNS

Purchase_date

License_plate_no

REGISTERED_VEHICLE

Vehicle_id

Cstyle

Tonnage

Cmake

CAR

TRUCK

Tmake

Figure 8.8

Cyear

Tyear

Two categories (union

types): OWNER and

Cmodel

Tmodel

REGISTERED_VEHICLE.

A category has two or more superclasses that may represent distinct entity types,

whereas other superclass/subclass relationships always have a single superclass. To

better understand the difference, we can compare a category, such as OWNER in

Figure 8.8, with the ENGINEERING_MANAGER shared subclass in Figure 8.6. The

latter is a subclass of each of the three superclasses ENGINEER, MANAGER, and

SALARIED_EMPLOYEE, so an entity that is a member of ENGINEERING_MANAGER

must exist in all three. This represents the constraint that an engineering manager

must be an ENGINEER, a MANAGER, and a SALARIED_EMPLOYEE; that is,

ENGINEERING_MANAGER is a subset of the intersection of the three classes (sets of

entities). On the other hand, a category is a subset of the union of its superclasses.

Hence, an entity that is a member of OWNER must exist in only one of the super-

260

Chapter 8 The Enhanced Entity-Relationship (EER) Model

classes. This represents the constraint that an OWNER may be a COMPANY, a BANK,

or a PERSON in Figure 8.8.

Attribute inheritance works more selectively in the case of categories. For example,

in Figure 8.8 each OWNER entity inherits the attributes of a COMPANY, a PERSON,

or a BANK, depending on the superclass to which the entity belongs. On the other

hand, a shared subclass such as ENGINEERING_MANAGER (Figure 8.6) inherits all

the attributes of its superclasses SALARIED_EMPLOYEE, ENGINEER, and

MANAGER.

It is interesting to note the difference between the category REGISTERED_VEHICLE

(Figure 8.8) and the generalized superclass VEHICLE (Figure 8.3(b)). In Figure

8.3(b), every car and every truck is a VEHICLE; but in Figure

8.8, the

REGISTERED_VEHICLE category includes some cars and some trucks but not neces-

sarily all of them (for example, some cars or trucks may not be registered). In gen-

eral, a specialization or generalization such as that in Figure 8.3(b), if it were partial,

would not preclude VEHICLE from containing other types of entities, such as

motorcycles. However, a category such as REGISTERED_VEHICLE in Figure 8.8

implies that only cars and trucks, but not other types of entities, can be members of

REGISTERED_VEHICLE.

A category can be total or partial. A total category holds the union of all entities in

its superclasses, whereas a partial category can hold a subset of the union. A total cat-

egory is represented diagrammatically by a double line connecting the category and

the circle, whereas a partial category is indicated by a single line.

The superclasses of a category may have different key attributes, as demonstrated by

the OWNER category in Figure 8.8, or they may have the same key attribute, as

demonstrated by the REGISTERED_VEHICLE category. Notice that if a category is

total (not partial), it may be represented alternatively as a total specialization (or a

total generalization). In this case, the choice of which representation to use is sub-

jective. If the two classes represent the same type of entities and share numerous

attributes, including the same key attributes, specialization/generalization is pre-

ferred; otherwise, categorization (union type) is more appropriate.

It is important to note that some modeling methodologies do not have union types.

In these models, a union type must be represented in a roundabout way (see Section

9.2).

8.5 A Sample UNIVERSITY EER Schema,

Design Choices, and Formal Definitions

In this section, we first give an example of a database schema in the EER model to

illustrate the use of the various concepts discussed here and in Chapter 7. Then, we

discuss design choices for conceptual schemas, and finally we summarize the EER

model concepts and define them formally in the same manner in which we formally

defined the concepts of the basic ER model in Chapter 7.

8.5

A Sample UNIVERSITY EER Schema, Design Choices, and Formal Definitions

261

8.5.1

The UNIVERSITY Database Example

For our sample database application, consider a UNIVERSITY database that keeps

track of students and their majors, transcripts, and registration as well as of the uni-

versity’s course offerings. The database also keeps track of the sponsored research

projects of faculty and graduate students. This schema is shown in Figure 8.9. A dis-

cussion of the requirements that led to this schema follows.

For each person, the database maintains information on the person’s Name [Name],

Social Security number [Ssn], address [Address], sex [Sex], and birth date [Bdate].

Two subclasses of the PERSON entity type are identified: FACULTY and STUDENT.

Specific attributes of FACULTY are rank [Rank] (assistant, associate, adjunct,

research, visiting, and so on), office [Foffice], office phone [Fphone], and salary

[Salary]. All faculty members are related to the academic department(s) with which

they are affiliated [BELONGS] (a faculty member can be associated with several

departments, so the relationship is M:N). A specific attribute of STUDENT is [Class]

(freshman=1, sophomore=2, …, graduate student=5). Each STUDENT is also related

to his or her major and minor departments (if known) [MAJOR] and [MINOR], to

the course sections he or she is currently attending [REGISTERED], and to the

courses completed [TRANSCRIPT]. Each TRANSCRIPT instance includes the grade

the student received [Grade] in a section of a course.

GRAD_STUDENT is a subclass of STUDENT, with the defining predicate Class = 5.

For each graduate student, we keep a list of previous degrees in a composite, multi-

valued attribute [Degrees]. We also relate the graduate student to a faculty advisor

[ADVISOR] and to a thesis committee [COMMITTEE], if one exists.

An academic department has the attributes name [Dname], telephone [Dphone], and

office number [Office] and is related to the faculty member who is its chairperson

[CHAIRS] and to the college to which it belongs [CD]. Each college has attributes

college name [Cname], office number [Coffice], and the name of its dean [Dean].

A course has attributes course number [C#], course name [Cname], and course

description [Cdesc]. Several sections of each course are offered, with each section

having the attributes sectioumber [Sec#] and the year and quarter in which the

section was offered ([Year] and [Qtr]).¹⁰ Sectioumbers uniquely identify each sec-

tion. The sections being offered during the current quarter are in a subclass

CURRENT_SECTION of SECTION, with the defining predicate Qtr = Current_qtr and

Year = Current_year. Each section is related to the instructor who taught or is teach-

ing it ([TEACH]), if that instructor is in the database.

The category INSTRUCTOR_RESEARCHER is a subset of the union of FACULTY and

GRAD_STUDENT and includes all faculty, as well as graduate students who are sup-

ported by teaching or research. Finally, the entity type GRANT keeps track of

research grants and contracts awarded to the university. Each grant has attributes

grant title [Title], grant number [No], the awarding agency [Agency], and the starting

¹⁰We assume that the quarter system rather than the semester system is used in this university.

262

Chapter 8 The Enhanced Entity-Relationship (EER) Model

Fname Minit

Lname

Ssn

Bdate

Sex

No Street Apt_no City State

Zip

Name

PERSON

Address

Fphone Salary

Class

Foffice

College

Degree Year

Rank

FACULTY

ADVISOR

STUDENT

Degrees

Class=5

COMMITTEE

GRAD_STUDENT

Title

GRANT

Agency

St_date

MINOR

Start

MAJOR

SUPPORT

Time

End

Grade

BELONGS

INSTRUCTOR_RESEARCHER

REGISTERED

CHAIRS

TRANSCRIPT

TEACH

CURRENT_SECTION

Qtr = Current_qtr and

Year = Current_year

SECTION

DEPARTMENT

Qtr

Sec# Year

Dname

Office

Dphone

COLLEGE

COURSE

Figure 8.9

Cname

Coffice

Cdesc

An EER conceptual schema

Dean

Cname

for a UNIVERSITY database.

8.5

A Sample UNIVERSITY EER Schema, Design Choices, and Formal Definitions

263

date [St_date]. A grant is related to one principal investigator [PI] and to all

researchers it supports [SUPPORT]. Each instance of support has as attributes the

starting date of support [Start], the ending date of the support (if known) [End],

and the percentage of time being spent on the project [Time] by the researcher being

supported.

8.5.2

Design Choices for Specialization/Generalization

It is not always easy to choose the most appropriate conceptual design for a database

application. In Section 7.7.3, we presented some of the typical issues that confront a

database designer when choosing among the concepts of entity types, relationship

types, and attributes to represent a particular miniworld situation as an ER schema.

In this section, we discuss design guidelines and choices for the EER concepts of

specialization/generalization and categories (union types).

As we mentioned in Section 7.7.3, conceptual database design should be considered

as an iterative refinement process until the most suitable design is reached. The fol-

lowing guidelines can help to guide the design process for EER concepts:

■

In general, many specializations and subclasses can be defined to make the

conceptual model accurate. However, the drawback is that the design

becomes quite cluttered. It is important to represent only those subclasses

that are deemed necessary to avoid extreme cluttering of the conceptual

schema.

■

If a subclass has few specific (local) attributes and no specific relationships, it

can be merged into the superclass. The specific attributes would hold NULL

values for entities that are not members of the subclass. A type attribute

could specify whether an entity is a member of the subclass.

■

Similarly, if all the subclasses of a specialization/generalization have few spe-

cific attributes and no specific relationships, they can be merged into the

superclass and replaced with one or more type attributes that specify the sub-

class or subclasses that each entity belongs to (see Section 9.2 for how this

criterion applies to relational databases).

■

Union types and categories should generally be avoided unless the situation

definitely warrants this type of construct, which does occur in some practi-

cal situations. If possible, we try to model using specialization/generalization

as discussed at the end of Section 8.4.

■

The choice of disjoint/overlapping and total/partial constraints on special-

ization/generalization is driven by the rules in the miniworld being modeled.

If the requirements do not indicate any particular constraints, the default

would generally be overlapping and partial, since this does not specify any

restrictions on subclass membership.

264

Chapter 8 The Enhanced Entity-Relationship (EER) Model

As an example of applying these guidelines, consider Figure 8.6, where no specific

(local) attributes are shown. We could merge all the subclasses into the EMPLOYEE

entity type, and add the following attributes to EMPLOYEE:

■ An attribute Job_type whose value set {‘Secretary’, ‘Engineer’, ‘Technician’}

would indicate which subclass in the first specialization each employee

belongs to.

■ An attribute Pay_method whose value set {‘Salaried’, ‘Hourly’} would indicate

which subclass in the second specialization each employee belongs to.

■ An attribute Is_a_manager whose value set {‘Yes’, ‘No’} would indicate

whether an individual employee entity is a manager or not.

8.5.3

Formal Definitions for the EER Model Concepts

We now summarize the EER model concepts and give formal definitions. A class¹¹

is a set or collection of entities; this includes any of the EER schema constructs of

group entities, such as entity types, subclasses, superclasses, and categories. A

subclass S is a class whose entities must always be a subset of the entities in another

class, called the superclass C of the superclass/subclass (or IS-A) relationship. We

denote such a relationship by C/S. For such a superclass/subclass relationship, we

must always have

S⊆C

A specialization Z = {S₁, S₂, …, S_n} is a set of subclasses that have the same super-

class G; that is, G/S_i is a superclass/subclass relationship for i = 1, 2, …, n. G is called

a generalized entity type

(or the superclass of the specialization, or a

generalization of the subclasses {S₁, S₂, …, S_n} ). Z is said to be total if we always (at

any point in time) have

∪S_i =G

i=1

Otherwise, Z is said to be partial. Z is said to be disjoint if we always have

S_i ∩ S_j = ∅ (empty set) for i ≠ j

Otherwise, Z is said to be overlapping.

A subclass S of C is said to be predicate-defined if a predicate p on the attributes of

C is used to specify which entities in C are members of S; that is, S = C[p], where

C[p] is the set of entities in C that satisfy p. A subclass that is not defined by a pred-

icate is called user-defined.

A specialization Z (or generalization G) is said to be attribute-defined if a predicate

(A = c_i), where A is an attribute of G and c_i is a constant value from the domain of A,

¹¹The use of the word class here differs from its more common use in object-oriented programming lan-

guages such as C++. In C++, a class is a structured type definition along with its applicable functions

(operations).

8.6

Example of Other Notation: Representing Specialization and Generalization in UML Class Diagrams

265

is used to specify membership in each subclass S_i in Z. Notice that if c_i ≠ c_j for i ≠ j,

and A is a single-valued attribute, then the specialization will be disjoint.

A category T is a class that is a subset of the union of n defining superclasses D₁, D₂,

…, D_n, n > 1, and is formally specified as follows:

T ⊆ (D₁ ∪ D₂ … ∪ D_n)

A predicate p_i on the attributes of D_i can be used to specify the members of each D_i

that are members of T. If a predicate is specified on every D_i, we get

T = (D₁[p₁] ∪ D₂[p₂] … ∪ D_n[p_n])

We should now extend the definition of relationship type given in Chapter 7 by

allowing any class—not only any entity type—to participate in a relationship.

Hence, we should replace the words entity type with class in that definition. The

graphical notation of EER is consistent with ER because all classes are represented

by rectangles.

8.6 Example of Other Notation: Representing

Specialization and Generalization in UML

Class Diagrams

We now discuss the UML notation for generalization/specialization and inheri-

tance. We already presented basic UML class diagram notation and terminology in

Section 7.8. Figure 8.10 illustrates a possible UML class diagram corresponding to

the EER diagram in Figure 8.7. The basic notation for specialization/generalization

(see Figure 8.10) is to connect the subclasses by vertical lines to a horizontal line,

which has a triangle connecting the horizontal line through another vertical line to

the superclass. A blank triangle indicates a specialization/generalization with the

disjoint constraint, and a filled triangle indicates an overlapping constraint. The root

superclass is called the base class, and the subclasses (leaf nodes) are called leaf

classes.

The above discussion and example in Figure 8.10, and the presentation in Section

7.8 gave a brief overview of UML class diagrams and terminology. We focused on

the concepts that are relevant to ER and EER database modeling, rather than those

concepts that are more relevant to software engineering. In UML, there are many

details that we have not discussed because they are outside the scope of this book

and are mainly relevant to software engineering. For example, classes can be of var-

ious types:

■ Abstract classes define attributes and operations but do not have objects cor-

responding to those classes. These are mainly used to specify a set of attrib-

utes and operations that can be inherited.

■ Concrete classes can have objects (entities) instantiated to belong to the

class.

■ Template classes specify a template that can be further used to define other

classes.

266

Chapter 8 The Enhanced Entity-Relationship (EER) Model

PERSON

Name

Ssn

Birth_date

Sex

Address

age

EMPLOYEE

ALUMNUS

DEGREE

STUDENT

Salary

Year

Major_dept

Degree

hire_emp

new_alumnus

change_major

Major

STAFF

FACULTY

STUDENT_ASSISTANT

Position

Rank

Percent_time

hire_staff

promote

hire_student

RESEARCH_

TEACHING_

GRADUATE_

UNDERGRADUATE_

ASSISTANT

STUDENT

Project

Course

Degree_program

Class

change_project

assign_to_course

change_degree_program

change_classification

Figure 8.10

A UML class diagram corresponding to the EER diagram in Figure 8.7,

illustrating UML notation for specialization/generalization.

In database design, we are mainly concerned with specifying concrete classes whose

collections of objects are permanently (or persistently) stored in the database. The

bibliographic notes at the end of this chapter give some references to books that

describe complete details of UML. Additional material related to UML is covered in

Chapter 10.

8.7

Data Abstraction, Knowledge Representation, and Ontology Concepts

267

8.7 Data Abstraction, Knowledge

Representation, and Ontology Concepts

In this section we discuss in general terms some of the modeling concepts that we

described quite specifically in our presentation of the ER and EER models in

Chapter 7 and earlier in this chapter. This terminology is not only used in concep-

tual data modeling but also in artificial intelligence literature when discussing

knowledge representation (KR). This section discusses the similarities and differ-

ences between conceptual modeling and knowledge representation, and introduces

some of the alternative terminology and a few additional concepts.

The goal of KR techniques is to develop concepts for accurately modeling some

domain of knowledge by creating an ontology¹² that describes the concepts of the

domain and how these concepts are interrelated. Such an ontology is used to store

and manipulate knowledge for drawing inferences, making decisions, or answering

questions. The goals of KR are similar to those of semantic data models, but there

are some important similarities and differences between the two disciplines:

■

Both disciplines use an abstraction process to identify common properties

and important aspects of objects in the miniworld (also known as domain of

discourse in KR) while suppressing insignificant differences and unimpor-

tant details.

■

Both disciplines provide concepts, relationships, constraints, operations, and

languages for defining data and representing knowledge.

■

KR is generally broader in scope than semantic data models. Different forms

of knowledge, such as rules (used in inference, deduction, and search),

incomplete and default knowledge, and temporal and spatial knowledge, are

represented in KR schemes. Database models are being expanded to include

some of these concepts (see Chapter 26).

■

KR schemes include reasoning mechanisms that deduce additional facts

from the facts stored in a database. Hence, whereas most current database

systems are limited to answering direct queries, knowledge-based systems

using KR schemes can answer queries that involve inferences over the stored

data. Database technology is being extended with inference mechanisms (see

Section 26.5).

■

Whereas most data models concentrate on the representation of database

schemas, or meta-knowledge, KR schemes often mix up the schemas with

the instances themselves in order to provide flexibility in representing excep-

tions. This often results in inefficiencies when these KR schemes are imple-

mented, especially when compared with databases and when a large amount

of data (facts) needs to be stored.

¹²An ontology is somewhat similar to a conceptual schema, but with more knowledge, rules, and excep-

tions.

268

Chapter 8 The Enhanced Entity-Relationship (EER) Model

We now discuss four abstraction concepts that are used in semantic data models,

such as the EER model as well as in KR schemes: (1) classification and instantiation,

(2) identification, (3) specialization and generalization, and (4) aggregation and

association. The paired concepts of classification and instantiation are inverses of

one another, as are generalization and specialization. The concepts of aggregation

and association are also related. We discuss these abstract concepts and their rela-

tion to the concrete representations used in the EER model to clarify the data

abstraction process and to improve our understanding of the related process of con-

ceptual schema design. We close the section with a brief discussion of ontology,

which is being used widely in recent knowledge representation research.

8.7.1

Classification and Instantiation

The process of classification involves systematically assigning similar objects/enti-

ties to object classes/entity types. We caow describe (in DB) or reason about (in

KR) the classes rather than the individual objects. Collections of objects that share

the same types of attributes, relationships, and constraints are classified into classes

in order to simplify the process of discovering their properties. Instantiation is the

inverse of classification and refers to the generation and specific examination of dis-

tinct objects of a class. An object instance is related to its object class by the IS-AN-

INSTANCE-OF or IS-A-MEMBER-OF relationship. Although EER diagrams do

not display instances, the UML diagrams allow a form of instantiation by permit-

ting the display of individual objects. We did not describe this feature in our intro-

duction to UML class diagrams.

In general, the objects of a class should have a similar type structure. However, some

objects may display properties that differ in some respects from the other objects of

the class; these exception objects also need to be modeled, and KR schemes allow

more varied exceptions than do database models. In addition, certain properties

apply to the class as a whole and not to the individual objects; KR schemes allow

such class properties. UML diagrams also allow specification of class properties.

In the EER model, entities are classified into entity types according to their basic

attributes and relationships. Entities are further classified into subclasses and cate-

gories based on additional similarities and differences (exceptions) among them.

Relationship instances are classified into relationship types. Hence, entity types,

subclasses, categories, and relationship types are the different concepts that are used

for classification in the EER model. The EER model does not provide explicitly for

class properties, but it may be extended to do so. In UML, objects are classified into

classes, and it is possible to display both class properties and individual objects.

Knowledge representation models allow multiple classification schemes in which

one class is an instance of another class (called a meta-class). Notice that this cannot

be represented directly in the EER model, because we have only two levels—classes

and instances. The only relationship among classes in the EER model is a super-

class/subclass relationship, whereas in some KR schemes an additional

class/instance relationship can be represented directly in a class hierarchy. An

instance may itself be another class, allowing multiple-level classification schemes.

8.7

Data Abstraction, Knowledge Representation, and Ontology Concepts

269

8.7.2

Identification

Identification is the abstraction process whereby classes and objects are made

uniquely identifiable by means of some identifier. For example, a class name

uniquely identifies a whole class within a schema. An additional mechanism is nec-

essary for telling distinct object instances apart by means of object identifiers.

Moreover, it is necessary to identify multiple manifestations in the database of the

same real-world object. For example, we may have a tuple <‘Matthew Clarke’,

‘610618’, ‘376-9821’> in a PERSON relation and another tuple <‘301-54-0836’, ‘CS’,

3.8> in a STUDENT relation that happen to represent the same real-world entity.

There is no way to identify the fact that these two database objects (tuples) represent

the same real-world entity unless we make a provision at design time for appropriate

cross-referencing to supply this identification. Hence, identification is needed at

two levels:

■ To distinguish among database objects and classes

■ To identify database objects and to relate them to their real-world counter-

parts

In the EER model, identification of schema constructs is based on a system of

unique names for the constructs in a schema. For example, every class in an EER

schema—whether it is an entity type, a subclass, a category, or a relationship type—

must have a distinct name. The names of attributes of a particular class must also be

distinct. Rules for unambiguously identifying attribute name references in a special-

ization or generalization lattice or hierarchy are needed as well.

At the object level, the values of key attributes are used to distinguish among entities

of a particular entity type. For weak entity types, entities are identified by a combi-

nation of their own partial key values and the entities they are related to in the

owner entity type(s). Relationship instances are identified by some combination of

the entities that they relate to, depending on the cardinality ratio specified.

8.7.3

Specialization and Generalization

Specialization is the process of classifying a class of objects into more specialized

subclasses. Generalization is the inverse process of generalizing several classes into

a higher-level abstract class that includes the objects in all these classes.

Specialization is conceptual refinement, whereas generalization is conceptual syn-

thesis. Subclasses are used in the EER model to represent specialization and general-

ization. We call the relationship between a subclass and its superclass an

IS-A-SUBCLASS-OF relationship, or simply an IS-A relationship. This is the same

as the IS-A relationship discussed earlier in Section 8.5.3.

8.7.4

Aggregation and Association

Aggregation is an abstraction concept for building composite objects from their

component objects. There are three cases where this concept can be related to the

EER model. The first case is the situation in which we aggregate attribute values of

270

Chapter 8 The Enhanced Entity-Relationship (EER) Model

an object to form the whole object. The second case is when we represent an aggre-

gation relationship as an ordinary relationship. The third case, which the EER

model does not provide for explicitly, involves the possibility of combining objects

that are related by a particular relationship instance into a higher-level aggregate

object. This is sometimes useful when the higher-level aggregate object is itself to be

related to another object. We call the relationship between the primitive objects and

their aggregate object IS-A-PART-OF; the inverse is called IS-A-COMPONENT-

OF. UML provides for all three types of aggregation.

The abstraction of association is used to associate objects from several independent

classes. Hence, it is somewhat similar to the second use of aggregation. It is repre-

sented in the EER model by relationship types, and in UML by associations. This

abstract relationship is called IS-ASSOCIATED-WITH.

In order to understand the different uses of aggregation better, consider the ER

schema shown in Figure 8.11(a), which stores information about interviews by job

applicants to various companies. The class COMPANY is an aggregation of the

attributes (or component objects) Cname (company name) and Caddress (company

address), whereas JOB_APPLICANT is an aggregate of Ssn, Name, Address, and Phone.

The relationship attributes Contact_name and Contact_phone represent the name

and phone number of the person in the company who is responsible for the inter-

view. Suppose that some interviews result in job offers, whereas others do not. We

would like to treat INTERVIEW as a class to associate it with JOB_OFFER. The

schema shown in Figure 8.11(b) is incorrect because it requires each interview rela-

tionship instance to have a job offer. The schema shown in Figure 8.11(c) is not

allowed because the ER model does not allow relationships among relationships.

One way to represent this situation is to create a higher-level aggregate class com-

posed of COMPANY, JOB_APPLICANT, and INTERVIEW and to relate this class to

JOB_OFFER, as shown in Figure 8.11(d). Although the EER model as described in

this book does not have this facility, some semantic data models do allow it and call

the resulting object a composite or molecular object. Other models treat entity

types and relationship types uniformly and hence permit relationships among rela-

tionships, as illustrated in Figure 8.11(c).

To represent this situation correctly in the ER model as described here, we need to

create a new weak entity type INTERVIEW, as shown in Figure 8.11(e), and relate it to

JOB_OFFER. Hence, we can always represent these situations correctly in the ER

model by creating additional entity types, although it may be conceptually more

desirable to allow direct representation of aggregation, as in Figure 8.11(d), or to

allow relationships among relationships, as in Figure 8.11(c).

The main structural distinction between aggregation and association is that when

an association instance is deleted, the participating objects may continue to exist.

However, if we support the notion of an aggregate object—for example, a CAR that

is made up of objects ENGINE, CHASSIS, and TIRES—then deleting the aggregate

CAR object amounts to deleting all its component objects.

8.7

Data Abstraction, Knowledge Representation, and Ontology Concepts

271

Figure 8.11

(a)

Contact_name Contact_phone

Aggregation. (a) The relation-

ship type INTERVIEW. (b)

Date

Cname Caddress

Name Ssn Phone Address

Including JOB_OFFER in a

ternary relationship type

(incorrect). (c) Having the

COMPANY

INTERVIEW

JOB_APPLICANT

RESULTS_IN relationship par-

ticipate in other relationships

(not allowed in ER). (d) Using

aggregation and a composite

(b)

(molecular) object (generally

COMPANY

INTERVIEW

JOB_APPLICANT

not allowed in ER but allowed

by some modeling tools). (e)

Correct representation in ER.

JOB_OFFER

(c)

COMPANY

INTERVIEW

JOB_APPLICANT

RESULTS_IN

JOB_OFFER

(d)

COMPANY

INTERVIEW

JOB_APPLICANT

RESULTS_IN

JOB_OFFER

(e)

Cname Caddress

Name Ssn Phone Address

COMPANY

CJI

JOB_APPLICANT

Contact_phone

Contact_name

Date

INTERVIEW

RESULTS_IN

JOB_OFFER

272

Chapter 8 The Enhanced Entity-Relationship (EER) Model

8.7.5

Ontologies and the Semantic Web

In recent years, the amount of computerized data and information available on the

Web has spiraled out of control. Many different models and formats are used. In

addition to the database models that we present in this book, much information is

stored in the form of documents, which have considerably less structure than data-

base information does. One ongoing project that is attempting to allow information

exchange among computers on the Web is called the Semantic Web, which attempts

to create knowledge representation models that are quite general in order to allow

meaningful information exchange and search among machines. The concept of

ontology is considered to be the most promising basis for achieving the goals of the

Semantic Web and is closely related to knowledge representation. In this section, we

give a brief introduction to what ontology is and how it can be used as a basis to

automate information understanding, search, and exchange.

The study of ontologies attempts to describe the structures and relationships that

are possible in reality through some common vocabulary; therefore, it can be con-

sidered as a way to describe the knowledge of a certain community about reality.

Ontology originated in the fields of philosophy and metaphysics. One commonly

used definition of ontology is a specification of a conceptualization.¹³

In this definition, a conceptualization is the set of concepts that are used to repre-

sent the part of reality or knowledge that is of interest to a community of users.

Specification refers to the language and vocabulary terms that are used to specify

the conceptualization. The ontology includes both specification and

conceptualization. For example, the same conceptualization may be specified in two

different languages, giving two separate ontologies. Based on this quite general def-

inition, there is no consensus on what an ontology is exactly. Some possible ways to

describe ontologies are as follows:

■ A thesaurus (or even a dictionary or a glossary of terms) describes the rela-

tionships between words (vocabulary) that represent various concepts.

■ A taxonomy describes how concepts of a particular area of knowledge are

related using structures similar to those used in a specialization or general-

ization.

■ A detailed database schema is considered by some to be an ontology that

describes the concepts (entities and attributes) and relationships of a mini-

world from reality.

■ A logical theory uses concepts from mathematical logic to try to define con-

cepts and their interrelationships.

Usually the concepts used to describe ontologies are quite similar to the concepts we

discussed in conceptual modeling, such as entities, attributes, relationships, special-

izations, and so on. The main difference between an ontology and, say, a database

schema, is that the schema is usually limited to describing a small subset of a mini-

¹³This definition is given in Gruber (1995).

Review Questions

273

world from reality in order to store and manage data. An ontology is usually consid-

ered to be more general in that it attempts to describe a part of reality or a domain

of interest (for example, medical terms, electronic-commerce applications, sports,

and so on) as completely as possible.

8.8 Summary

In this chapter we discussed extensions to the ER model that improve its representa-

tional capabilities. We called the resulting model the enhanced ER or EER model.

We presented the concept of a subclass and its superclass and the related mechanism

of attribute/relationship inheritance. We saw how it is sometimes necessary to create

additional classes of entities, either because of additional specific attributes or

because of specific relationship types. We discussed two main processes for defining

superclass/subclass hierarchies and lattices: specialization and generalization.

Next, we showed how to display these new constructs in an EER diagram. We also

discussed the various types of constraints that may apply to specialization or gener-

alization. The two main constraints are total/partial and disjoint/overlapping. In

addition, a defining predicate for a subclass or a defining attribute for a specializa-

tion may be specified. We discussed the differences between user-defined and

predicate-defined subclasses and between user-defined and attribute-defined spe-

cializations. Finally, we discussed the concept of a category or union type, which is a

subset of the union of two or more classes, and we gave formal definitions of all the

concepts presented.

We introduced some of the notation and terminology of UML for representing spe-

cialization and generalization. In Section 8.7 we briefly discussed the discipline of

knowledge representation and how it is related to semantic data modeling. We also

gave an overview and summary of the types of abstract data representation con-

cepts: classification and instantiation, identification, specialization and generaliza-

tion, and aggregation and association. We saw how EER and UML concepts are

related to each of these.

Review Questions

8.1. What is a subclass? When is a subclass needed in data modeling?

8.2. Define the following terms: superclass of a subclass, superclass/subclass rela-

tionship, IS-A relationship, specialization, generalization, category, specific

(local) attributes, and specific relationships.

8.3. Discuss the mechanism of attribute/relationship inheritance. Why is it use-

ful?

8.4. Discuss user-defined and predicate-defined subclasses, and identify the dif-

ferences between the two.

8.5. Discuss user-defined and attribute-defined specializations, and identify the

differences between the two.

274

Chapter 8 The Enhanced Entity-Relationship (EER) Model

8.6.

Discuss the two main types of constraints on specializations and generaliza-

tions.

8.7.

What is the difference between a specialization hierarchy and a specialization

lattice?

8.8.

What is the difference between specialization and generalization? Why do we

not display this difference in schema diagrams?

8.9.

How does a category differ from a regular shared subclass? What is a cate-

gory used for? Illustrate your answer with examples.

8.10.

For each of the following UML terms (see Sections 7.8 and 8.6), discuss the

corresponding term in the EER model, if any: object, class, association,

aggregation, generalization, multiplicity, attributes, discriminator, link, link

attribute, reflexive association, and qualified association.

8.11.

Discuss the main differences between the notation for EER schema diagrams

and UML class diagrams by comparing how common concepts are repre-

sented in each.

8.12.

List the various data abstraction concepts and the corresponding modeling

concepts in the EER model.

8.13.

What aggregation feature is missing from the EER model? How can the EER

model be further enhanced to support it?

8.14.

What are the main similarities and differences between conceptual database

modeling techniques and knowledge representation techniques?

8.15.

Discuss the similarities and differences between an ontology and a database

schema.

Exercises

8.16. Design an EER schema for a database application that you are interested in.

Specify all constraints that should hold on the database. Make sure that the

schema has at least five entity types, four relationship types, a weak entity

type, a superclass/subclass relationship, a category, and an n-ary (n > 2) rela-

tionship type.

8.17. Consider the BANK ER schema in Figure 7.21, and suppose that it is neces-

sary to keep track of different types of ACCOUNTS (SAVINGS_ACCTS,

CHECKING_ACCTS, …) and LOANS (CAR_LOANS, HOME_LOANS, …).

Suppose that it is also desirable to keep track of each ACCOUNT’s

TRANSACTIONS (deposits, withdrawals, checks, …) and each LOAN’s

PAYMENTS; both of these include the amount, date, and time. Modify the

BANK schema, using ER and EER concepts of specialization and generaliza-

tion. State any assumptions you make about the additional requirements.

Exercises

275

8.18.

The following narrative describes a simplified version of the organization of

Olympic facilities planned for the summer Olympics. Draw an EER diagram

that shows the entity types, attributes, relationships, and specializations for

this application. State any assumptions you make. The Olympic facilities are

divided into sports complexes. Sports complexes are divided into one-sport

and multisport types. Multisport complexes have areas of the complex desig-

nated for each sport with a location indicator (e.g., center, NE corner, and so

on). A complex has a location, chief organizing individual, total occupied

area, and so on. Each complex holds a series of events (e.g., the track stadium

may hold many different races). For each event there is a planned date, dura-

tion, number of participants, number of officials, and so on. A roster of all

officials will be maintained together with the list of events each official will

be involved in. Different equipment is needed for the events (e.g., goal posts,

poles, parallel bars) as well as for maintenance. The two types of facilities

(one-sport and multisport) will have different types of information. For

each type, the number of facilities needed is kept, together with an approxi-

mate budget.

8.19.

Identify all the important concepts represented in the library database case

study described below. In particular, identify the abstractions of classifica-

tion (entity types and relationship types), aggregation, identification, and

specialization/generalization. Specify (min, max) cardinality constraints

whenever possible. List details that will affect the eventual design but that

have no bearing on the conceptual design. List the semantic constraints sep-

arately. Draw an EER diagram of the library database.

Case Study: The Georgia Tech Library (GTL) has approximately 16,000

members, 100,000 titles, and 250,000 volumes (an average of 2.5 copies per

book). About 10 percent of the volumes are out on loan at any one time. The

librarians ensure that the books that members want to borrow are available

when the members want to borrow them. Also, the librarians must know how

many copies of each book are in the library or out on loan at any given time.

A catalog of books is available online that lists books by author, title, and sub-

ject area. For each title in the library, a book description is kept in the catalog

that ranges from one sentence to several pages. The reference librarians want

to be able to access this description when members request information

about a book. Library staff includes chief librarian, departmental associate

librarians, reference librarians, check-out staff, and library assistants.

Books can be checked out for 21 days. Members are allowed to have only five

books out at a time. Members usually return books within three to four

weeks. Most members know that they have one week of grace before a notice

is sent to them, so they try to return books before the grace period ends.

About 5 percent of the members have to be sent reminders to return books.

Most overdue books are returned within a month of the due date.

Approximately 5 percent of the overdue books are either kept or never

returned. The most active members of the library are defined as those who

276

Chapter 8 The Enhanced Entity-Relationship (EER) Model

borrow books at least ten times during the year. The top 1 percent of mem-

bership does 15 percent of the borrowing, and the top 10 percent of the

membership does 40 percent of the borrowing. About 20 percent of the

members are totally inactive in that they are members who never borrow.

To become a member of the library, applicants fill out a form including their

SSN, campus and home mailing addresses, and phone numbers. The librari-

ans issue a numbered, machine-readable card with the member’s photo on

it. This card is good for four years. A month before a card expires, a notice is

sent to a member for renewal. Professors at the institute are considered auto-

matic members. When a new faculty member joins the institute, his or her

information is pulled from the employee records and a library card is mailed

to his or her campus address. Professors are allowed to check out books for

three-month intervals and have a two-week grace period. Renewal notices to

professors are sent to their campus address.

The library does not lend some books, such as reference books, rare books,

and maps. The librarians must differentiate between books that can be lent

and those that cannot be lent. In addition, the librarians have a list of some

books they are interested in acquiring but cannot obtain, such as rare or out-

of-print books and books that were lost or destroyed but have not been

replaced. The librarians must have a system that keeps track of books that

cannot be lent as well as books that they are interested in acquiring. Some

books may have the same title; therefore, the title cannot be used as a means

of identification. Every book is identified by its International Standard Book

Number (ISBN), a unique international code assigned to all books. Two

books with the same title can have different ISBNs if they are in different lan-

guages or have different bindings (hardcover or softcover). Editions of the

same book have different ISBNs.

The proposed database system must be designed to keep track of the mem-

bers, the books, the catalog, and the borrowing activity.

8.20.

Design a database to keep track of information for an art museum. Assume

that the following requirements were collected:

■ The museum has a collection of ART_OBJECTS. Each ART_OBJECT has a

unique Id_no, an Artist (if known), a Year (when it was created, if known),

a Title, and a Description. The art objects are categorized in several ways, as

discussed below.

■ ART_OBJECTS are categorized based on their type. There are three main

types: PAINTING, SCULPTURE, and STATUE, plus another type called

OTHER to accommodate objects that do not fall into one of the three

main types.

■ A PAINTING has a Paint_type (oil, watercolor, etc.), material on which it is

Drawn_on (paper, canvas, wood, etc.), and Style (modern, abstract, etc.).

■ A SCULPTURE or a statue has a Material from which it was created (wood,

stone, etc.), Height, Weight, and Style.

Exercises

277

■ An art object in the OTHER category has a Type (print, photo, etc.) and

Style.

■ ART_OBJECTs are categorized as either PERMANENT_COLLECTION

(objects that are owned by the museum) and BORROWED. Information

captured about objects in the PERMANENT_COLLECTION includes

Date_acquired, Status

(on display, on loan, or stored), and Cost.

Information captured about BORROWED objects includes the Collection

from which it was borrowed, Date_borrowed, and Date_returned.

■ Information describing the country or culture of Origin (Italian, Egyptian,

American, Indian, and so forth) and Epoch (Renaissance, Modern,

Ancient, and so forth) is captured for each ART_OBJECT.

■ The museum keeps track of ARTIST information, if known: Name,

DateBorn (if known), Date_died (if not living), Country_of_origin, Epoch,

Main_style, and Description. The Name is assumed to be unique.

■ Different EXHIBITIONS occur, each having a Name, Start_date, and

End_date. EXHIBITIONS are related to all the art objects that were on dis-

play during the exhibition.

■ Information is kept on other COLLECTIONS with which the museum

interacts, including Name (unique), Type (museum, personal, etc.),

Description, Address, Phone, and current Contact_person.

Draw an EER schema diagram for this application. Discuss any assumptions

you make, and that justify your EER design choices.

8.21.

Figure 8.12 shows an example of an EER diagram for a small private airport

database that is used to keep track of airplanes, their owners, airport

employees, and pilots. From the requirements for this database, the follow-

ing information was collected: Each AIRPLANE has a registratioumber

[Reg#], is of a particular plane type [OF_TYPE], and is stored in a particular

hangar [STORED_IN]. Each PLANE_TYPE has a model number [Model], a

capacity [Capacity], and a weight [Weight]. Each HANGAR has a number

[Number], a capacity [Capacity], and a location [Location]. The database also

keeps track of the OWNERs of each plane [OWNS] and the EMPLOYEEs who

have maintained the plane [MAINTAIN]. Each relationship instance in OWNS

relates an AIRPLANE to an OWNER and includes the purchase date [Pdate].

Each relationship instance in MAINTAIN relates an EMPLOYEE to a service

record [SERVICE]. Each plane undergoes service many times; hence, it is

related by [PLANE_SERVICE] to a number of SERVICE records. A SERVICE

record includes as attributes the date of maintenance [Date], the number of

hours spent on the work [Hours], and the type of work done [Work_code].

We use a weak entity type [SERVICE] to represent airplane service, because

the airplane registratioumber is used to identify a service record. An

OWNER is either a person or a corporation. Hence, we use a union type (cat-

egory)

[OWNER] that is a subset of the union of corporation

[CORPORATION] and person [PERSON] entity types. Both pilots [PILOT]

and employees [EMPLOYEE] are subclasses of PERSON. Each PILOT has

278

Chapter 8 The Enhanced Entity-Relationship (EER) Model

Sala ry

Shift

Model Capacity

Weight

WORKS_ON

EMPLOYEE

MAINTAIN

PLANE_TYPE

Restr

Lic_num

FLIES

PILOT

OF_TYPE

Date

Workcode

Date/workcode

SERVICE

Hours

Reg#

AIRPLANE

PLANE_SERVICE

Pdate

STORED_IN

OWNS

OWNER

HANGAR

CORPORATION

Ssn

PERSON

Number

Location

Name

Phone

Name

Phone

Capacity

Add ress

Figure 8.12

EER schema for a SMALL_AIRPORT database.

specific attributes license number [Lic_num] and restrictions [Restr]; each

EMPLOYEE has specific attributes salary [Salary] and shift worked [Shift]. All

PERSON entities in the database have data kept on their Social Security

number [Ssn], name [Name], address [Address], and telephone number

[Phone]. For CORPORATION entities, the data kept includes name [Name],

address [Address], and telephone number [Phone]. The database also keeps

track of the types of planes each pilot is authorized to fly [FLIES] and the

types of planes each employee can do maintenance work on [WORKS_ON].

Exercises

279

Show how the SMALL_AIRPORT EER schema in Figure 8.12 may be repre-

sented in UML notation. (Note: We have not discussed how to represent cat-

egories (union types) in UML, so you do not have to map the categories in

this and the following question.)

8.22. Show how the UNIVERSITY EER schema in Figure 8.9 may be represented in

UML notation.

8.23. Consider the entity sets and attributes shown in the table below. Place a

checkmark in one column in each row to indicate the relationship between

the far left and right columns.

a. The left side has a relationship with the right side.

b. The right side is an attribute of the left side.

c. The left side is a specialization of the right side.

d. The left side is a generalization of the right side.

(a) Has a

(b) Has an

(d) Is a

Relationship

Attribute

Specialization

Generalization

Entity Set

with

that is

or Attribute

MOTHER

PERSON

DAUGHTER

MOTHER

STUDENT

PERSON

STUDENT

Student_id

SCHOOL

STUDENT

SCHOOL

CLASS_ROOM

ANIMAL

HORSE

Breed

HORSE

Age

10.

EMPLOYEE

SSN

11.

FURNITURE

CHAIR

12.

CHAIR

Weight

13.

HUMAN

WOMAN

14.

SOLDIER

PERSON

15.

ENEMY_COMBATANT

PERSON

8.24. Draw a UML diagram for storing a played game of chess in a database. You

may look at http://www.chessgames.com for an application similar to what

you are designing. State clearly any assumptions you make in your UML dia-

gram. A sample of assumptions you can make about the scope is as follows:

1. The game of chess is played between two players.

2. The game is played on an 8 × 8 board like the one shown below:

280

Chapter 8 The Enhanced Entity-Relationship (EER) Model

3. The players are assigned a color of black or white at the start of the game.

4. Each player starts with the following pieces (traditionally called chess-

men):

a. king

2 bishops

b. queen

2 knights

2 rooks

8 pawns

5. Every piece has its own initial position.

6. Every piece has its own set of legal moves based on the state of the game.

You do not need to worry about which moves are or are not legal except

for the following issues:

a. A piece may move to an empty square or capture an opposing piece.

b. If a piece is captured, it is removed from the board.

c. If a pawn moves to the last row, it is “promoted” by converting it to

another piece (queen, rook, bishop, or knight).

Note: Some of these functions may be spread over multiple classes.

8.25.

Draw an EER diagram for a game of chess as described in Exercise 8.24.

Focus on persistent storage aspects of the system. For example, the system

would need to retrieve all the moves of every game played in sequential

order.

8.26.

Which of the following EER diagrams is/are incorrect and why? State clearly

any assumptions you make.

Laboratory Exercises

281

8.27.

Consider the following EER diagram that describes the computer systems at

a company. Provide your own attributes and key for each entity type. Supply

max cardinality constraints justifying your choice. Write a complete narra-

tive description of what this EER diagram represents.

INSTALLED

SOLD_WITH

SOFTWARE

COMPUTER

INSTALLED_OS

OPERATING_

SYSTEM

LAPTOP

DESKTOP

OPTIONS

COMPONENT

MEM_OPTIONS

SUPPORTS

ACCESSORY

MEMORY

VIDEO_CARD

SOUND_CARD

KEYBOARD

MOUSE

MONITOR

Laboratory Exercises

8.28. Consider a GRADE_BOOK database in which instructors within an academic department record

points earned by individual students in their classes. The data requirements are summarized as

follows:

■ Each student is identified by a unique identifier, first and last name, and an e-mail address.

■ Each instructor teaches certain courses each term. Each course is identified by a course num-

ber, a sectioumber, and the term in which it is taught. For each course he or she teaches, the

282

Chapter 8 The Enhanced Entity-Relationship (EER) Model

instructor specifies the minimum number of points required in order to

earn letter grades A, B, C, D, and F. For example, 90 points for an A, 80

points for a B, 70 points for a C, and so forth.

■ Students are enrolled in each course taught by the instructor.

■ Each course has a number of grading components (such as midterm

exam, final exam, project, and so forth). Each grading component has a

maximum number of points (such as 100 or 50) and a weight (such as

20% or 10%). The weights of all the grading components of a course usu-

ally total 100.

■ Finally, the instructor records the points earned by each student in each of

the grading components in each of the courses. For example, student

1234 earns 84 points for the midterm exam grading component of the

section 2 course CSc2310 in the fall term of 2009. The midterm exam

grading component may have been defined to have a maximum of 100

points and a weight of 20% of the course grade.

Design an Enhanced Entity-Relationship diagram for the grade book data-

base and build the design using a data modeling tool such as ERwin or

Rational Rose.

8.29.

Consider an ONLINE_AUCTION database system in which members (buyers

and sellers) participate in the sale of items. The data requirements for this

system are summarized as follows:

■ The online site has members, each of whom is identified by a unique

member number and is described by an e-mail address, name, password,

home address, and phone number.

■ A member may be a buyer or a seller. A buyer has a shipping address

recorded in the database. A seller has a bank account number and routing

number recorded in the database.

■ Items are placed by a seller for sale and are identified by a unique item

number assigned by the system. Items are also described by an item title, a

description, starting bid price, bidding increment, the start date of the

auction, and the end date of the auction.

■ Items are also categorized based on a fixed classification hierarchy (for

example, a modem may be classified as COMPUTER→HARDWARE

→MODEM).

■ Buyers make bids for items they are interested in. Bid price and time of

bid is recorded. The bidder at the end of the auction with the highest bid

price is declared the winner and a transaction between buyer and seller

may then proceed.

■ The buyer and seller may record feedback regarding their completed

transactions. Feedback contains a rating of the other party participating

in the transaction (1-10) and a comment.

Laboratory Exercises

283

Design an Enhanced Entity-Relationship diagram for the ONLINE_AUCTION

database and build the design using a data modeling tool such as ERwin or

Rational Rose.

8.30.

Consider a database system for a baseball organization such as the major

leagues. The data requirements are summarized as follows:

■

The personnel involved in the league include players, coaches, managers,

and umpires. Each is identified by a unique personnel id. They are also

described by their first and last names along with the date and place of

birth.

■

Players are further described by other attributes such as their batting ori-

entation (left, right, or switch) and have a lifetime batting average (BA).

■

Within the players group is a subset of players called pitchers. Pitchers

have a lifetime ERA (earned run average) associated with them.

■

Teams are uniquely identified by their names. Teams are also described by

the city in which they are located and the division and league in which

they play (such as Central division of the American League).

■

Teams have one manager, a number of coaches, and a number of players.

■

Games are played between two teams with one designated as the home

team and the other the visiting team on a particular date. The score (runs,

hits, and errors) are recorded for each team. The team with the most runs

is declared the winner of the game.

■

With each finished game, a winning pitcher and a losing pitcher are

recorded. In case there is a save awarded, the save pitcher is also recorded.

■

With each finished game, the number of hits (singles, doubles, triples, and

home runs) obtained by each player is also recorded.

Design an Enhanced Entity-Relationship diagram for the BASEBALL data-

base and enter the design using a data modeling tool such as ERwin or

Rational Rose.

8.31.

Consider the EER diagram for the UNIVERSITY database shown in Figure

8.9. Enter this design using a data modeling tool such as ERwin or Rational

Rose. Make a list of the differences iotation between the diagram in the

text and the corresponding equivalent diagrammatic notation you end up

using with the tool.

8.32.

Consider the EER diagram for the small AIRPORT database shown in Figure

8.12. Build this design using a data modeling tool such as ERwin or Rational

Rose. Be careful as to how you model the category OWNER in this diagram.

(Hint: Consider using CORPORATION_IS_OWNER and PERSON_IS_

OWNER as two distinct relationship types.)

8.33.

Consider the UNIVERSITY database described in Exercise 7.16. You already

developed an ER schema for this database using a data modeling tool such as

284

Chapter 8 The Enhanced Entity-Relationship (EER) Model

ERwin or Rational Rose in Lab Exercise 7.31. Modify this diagram by classi-

fying COURSES as either UNDERGRAD_COURSES or GRAD_COURSES

and INSTRUCTORS as either JUNIOR_PROFESSORS or

SENIOR_PROFESSORS. Include appropriate attributes for these new entity

types. Then establish relationships indicating that junior instructors teach

undergraduate courses while senior instructors teach graduate courses.

Selected Bibliography

Many papers have proposed conceptual or semantic data models. We give a repre-

sentative list here. One group of papers, including Abrial (1974), Senko’s DIAM

model (1975), the NIAM method (Verheijen and VanBekkum 1982), and Bracchi et

al. (1976), presents semantic models that are based on the concept of binary rela-

tionships. Another group of early papers discusses methods for extending the rela-

tional model to enhance its modeling capabilities. This includes the papers by

Schmid and Swenson (1975), Navathe and Schkolnick (1978), Codd’s RM/T model

(1979), Furtado (1978), and the structural model of Wiederhold and Elmasri

(1979).

The ER model was proposed originally by Chen (1976) and is formalized in Ng

(1981). Since then, numerous extensions of its modeling capabilities have been pro-

posed, as in Scheuermann et al. (1979), Dos Santos et al. (1979), Teorey et al. (1986),

Gogolla and Hohenstein (1991), and the Entity-Category-Relationship (ECR)

model of Elmasri et al. (1985). Smith and Smith (1977) present the concepts of gen-

eralization and aggregation. The semantic data model of Hammer and McLeod

(1981) introduced the concepts of class/subclass lattices, as well as other advanced

modeling concepts.

A survey of semantic data modeling appears in Hull and King (1987). Eick (1991)

discusses design and transformations of conceptual schemas. Analysis of con-

straints for n-ary relationships is given in Soutou (1998). UML is described in detail

in Booch, Rumbaugh, and Jacobson (1999). Fowler and Scott (2000) and Stevens

and Pooley (2000) give concise introductions to UML concepts.

Fensel (2000, 2003) discuss the Semantic Web and application of ontologies.

Uschold and Gruninger (1996) and Gruber (1995) discuss ontologies. The June

2002 issue of Communications of the ACM is devoted to ontology concepts and

applications. Fensel (2003) is a book that discusses ontologies and e-commerce.

chapter

Relational Database

Design by ER- and

EER-to-Relational Mapping

his chapter discusses how to design a relational

database schema based on a conceptual schema

design. Figure 7.1 presented a high-level view of the database design process, and in

this chapter we focus on the logical database design or data model mapping step of

database design. We present the procedures to create a relational schema from an

Entity-Relationship (ER) or an Enhanced ER (EER) schema. Our discussion relates

the constructs of the ER and EER models, presented in Chapters 7 and 8, to the con-

structs of the relational model, presented in Chapters 3 through 6. Many computer-

aided software engineering (CASE) tools are based on the ER or EER models, or

other similar models, as we have discussed in Chapters 7 and 8. Many tools use ER

or EER diagrams or variations to develop the schema graphically, and then convert

it automatically into a relational database schema in the DDL of a specific relational

DBMS by employing algorithms similar to the ones presented in this chapter.

We outline a seven-step algorithm in Section 9.1 to convert the basic ER model con-

structs—entity types (strong and weak), binary relationships (with various struc-

tural constraints), n-ary relationships, and attributes (simple, composite, and

multivalued)—into relations. Then, in Section 9.2, we continue the mapping algo-

rithm by describing how to map EER model constructs—specialization/generaliza-

tion and union types (categories)—into relations. Section 9.3 summarizes the

chapter.

285

Functional gastrointestinal disturbances in the early age children

Diseases of the lips and tongue in children

Diaphragmatic hernia. Disease of mediastinum

IMMUNE PREPARATIONS

Lecture 1

Articulation and occlusion. Biomechanics of movement of the mandible (Vertical, sagittal, transversal movements of the mandible).

Приєднуйся до нас!

Підписатись на новини:

Наші соц мережі

Leave a Reply Cancel reply

Functional gastrointestinal disturbances in the early age children

Diseases of the lips and tongue in children

Diaphragmatic hernia. Disease of mediastinum

IMMUNE PREPARATIONS

Lecture 1

Articulation and occlusion. Biomechanics of movement of the mandible (Vertical, sagittal, transversal movements of the mandible).