Comp 353/453: Database Programming, Corboy
523, 7:00 Thursdays
Week 7, Mar 1
Read in Elmasri & Navathe (EN)
- Chapter 3, The Relational Data Model ...., section 3: Update
Operations
- Chapter 4: Basic SQL
- Chapter 5:
Midterm: March 15
ER diagram for the STUDENT database
Entities:
- student (name, student_number, class, major)
- course (course_name, course_number, credit_hours, department)
- section (section_identifier, semester, year, instructor) // why not course_number?
Relationships:
course----< PREREQUISITE >---- course
section----< IS_OFFERING_OF >---- course
student ----< REGISTERS_FOR >---- section
(min,max) annotation
Instead of labeling lines connecting a relationship to an entity with
1, M, or N, we can also use a (min,max) notation, meaning that each
entity e in the entity set E must participate in at least min entries
of the relationship, and at most max. If min>0, the participation is
total; min=0 means partial participation. The max is denoted N when we
mean it is allowed to be >1.
Note that a 1-N relationship would have the values reversed using the
(min,max) notation:
DEPARTMENT ===1=== employs ===N=== EMPLOYEE
// 1 dept = N employees
DEPARTMENT ---(1,N)--- employs --- (1,1)-----
EMPLOYEE // dept can have 1..N employees
The second line, above, means that a given department can appear
multiple times in the EMPLOYS relationship; ie a department can have
multiple employees. An employee can appear only once; that is, can work
for only a single department. Every employee must appear at least once,
and every department.
Example: Fig 7.15
Why do they say
department ----(4,N)---employs ----(1,1)----employee
?
Manages relationship: put into entity on (1,1) side rather than entity on (0,1) side
(0,1) doesn't say anything about how often the participation can be 0. Consider
MANAGES------- (0,1)-----manager (employee)
SUPERVISION --- (0,1)--- supervisee (employee)
Most employees are not managers. Almost all employees are supervised.
UML diagrams
See Figure 7.16. UML diagrams have space for operations,which
in the world of databases we're not much concerned about. The big boxes
are for entities; relationships have been reduced to boxes that
annotate links. A (min,max) notation is used, but the label goes on the
opposite entity.
UML relationships (actually, ER relationships as well) may either be of
association or of aggregation. The latter implies a
collection, eg of employees into one department.
More on Foreign Keys
Here's the seven-step ER-to-relation algorithm again, slightly simplified:
- create a table for each regular entity
- create a table for each weak entity, adding the key field from the owner entity as a foreign key for the new entity.
- for binary relationships between entities E1 and E2, pick one of
them (eg E1) and add to it a field conntaining the key to E2. Make this
a foreign key in E1.
- for binary 1:N relationships between E1 and E2,
E1---1---R---N---E2, add a column to E2 containing the key of E1. Make
this a foreign key in E2.
- For binary N:M relationships between E1 and E2, create a new
table R consisting of ⟨E1.key, E2.key, R.attributes⟩. Make E1.key and
E2.key foreign keys in R.
- For multivalued attributes of entity E, create a new relation R.
One column of R will be E.key; this should be a foreign key in R.
- ternary and higher-degree relationships: like step 5.
Joins arise in steps 2, 3, 4, 5, 6, and 7, for recovering the original relationships (or attribute sets for 6, or entities for 2). In every case, the join field is a key of one relation and a foreign key in the other.
Also, I said earlier that entity T should not have an attribute that
was another entity of type S; instead, we should create a relationship
R between T and S. If S was at all a candidate for an attribute,
each T would be related to at most one S and so this would have
cardinality constraint T---N---R---1---S. Then, when we did the above
conversion, in step four we would add S's key to T with a foreign key
constraint referring to S.
But suppose we did add S as an entity attribute to T. Then we would end up with the same situation: we would use the key of S as an attribute of T, and create the same foreign-key constraint. So in the end we get the same thing.
Invoice
(I redid this after class to better clarify the core problems)
How shall we model invoices? An invoice is a collection of parts
ordered, each with a quantity. One way is to try to model it as a
binary relationship between CUSTOMER and PART, with attributes date and
quantity. An invoice is thus all the items to the same customer with
the same date.
CUSTOMER---<INVOICE>---PART
date
quantity
An invoice would be uniquely determined by the date, so if Customer c
ordered part P on date D with quantity q we would have ⟨c,p,d,q⟩ ∈
Invoice. Given ⟨c,d⟩ we can look up all the parts p and, for each part,
the quantity.
For a given c and d there might be multiple parts p that were part of
the invoice. We can search the Invoice table for those ⟨c,d⟩, and find
the balance of each record.
Problem: INVOICE is not
actually a "relationship set" for entities Customer and Part, as
defined in E&N §7.4.1; a relationship would have to be a subset of
the cross product Customer × Part; we can add attributes, but the ⟨c,p⟩
part is supposed to determine the record. However, the values of c and p do not determine an INVOICE record. The key for INVOICE is the triple ⟨c,p,d⟩; a customer c can order 100 units of d on 2005-12-01 and then 200 more units on 2006-01-27.
What we really have here is that INVOICE is a ternary
relationship between Customer, Part, and a single-attribute entity
Order_Date. Ternary relationships tend to be inefficient. None of the
relationships in the COMPANY database were ternary; in WORKS_ON, a
record was uniquely determined by the essn and the project_num; in
WORKS_FOR, by the ssn and the dept_no.
A much more common approach (which also allows multiple invoices on a single day) is to use a synthetic key invoice_num to create an entity Invoice. This is an instance of a rather general strategy that might be called the synthetic-key trick:
convert a relationship to an entity by assigning a "serial number" to
each tuple in the relationship. In this case the synthetic key has a
natural interpretation: we number each order as it is placed. For the
works_on relationship of the COMPANY database we might use a synthetic
key called Job_Assignment_Num; for the Works_For relationship between
Employees and Departments we might use Job_Association_Num.
Once we make Invoice an entity, we identify its attributes Cust_id and
Order_date. We also have a new relationship Invoice_Item (it could also
be called Part_of_Order), as follows:
Invoice --------- Invoice_Item ---------- Part
quantity
(Actually, Invoice also has a relationship Ordered_By to Customer;
that's N:1 so I've immediately implemented it by adding a Cust_id
attribute to Invoice. We replaced one sort-of-binary relationship
Invoice between Customer and Part with a new entity
Invoice with binary relationships to each of Customer and Part. But
only the relationship with Part is M:N and so needs it own table.)
We implement Invoice_Item as its own table listing invoice numbers,
part numbers and quantities. The primary key is the pair ⟨invoice_num,
part_num⟩; the table also has an attribute for quantity (and perhaps
also for current_price, or for discount). The INVOICE table might look
like this:
Table Invoice
Invoice_num
|
Cust_id
|
Order_date
|
10001
|
201
|
2011-11-17
|
10002
|
251
|
2011-11-17
|
10003
|
201
|
2011-11-25
|
10004
|
287
|
2011-11-25
|
and the table Invoice_Items might look like this:
Table Invoice_Items
Invoice_num
|
Part_num
|
Quantity
|
10001
|
37
|
50
|
10001
|
41
|
100
|
10002
|
83
|
4
|
10003
|
37
|
200
|
10003
|
59
|
100
|
10004
|
37
|
100
|
Bottom line:
- The original INVOICE relationship turned out to be ternary rather than binary
- When we made INVOICE an entity, by using the synthetic-key trick, we just had one strictly-binary table to implement
ER-to-relation mapping of ternary and other higher-degree relationships
Consider the
SUPPLY relationship on a supplier s, project j, and part p. The tuple
⟨s,j,p⟩ is included if supplier s supplies part p for project j.
We might try to model this with three binary relationships,
SUPPLIES(s,j), CAN_SUPPLY(s,p), and USES(j,p). It is true that if
⟨s,j,p⟩ is in SUPPLY, then ⟨s,j⟩ is in SUPPLIES, ⟨s,p⟩ is in
CAN_SUPPLY, and ⟨j,p⟩ is in USES. But the converse is not true
(example). If we build the three binary tables, we cannot reconstruct
the ternary table.
See Fig 7.17.
As for binary relationships, a ternary relationship key is a triple of
keys from each participating entity.
Ternary relationships can be problematic, and so we often include
corresponding binary relationships, sometimes even if they are reundant.
One approach is to model a ternary relationship as a weak entity, with
three identifying relationships (Fig 7.17(c)).
This is usually done only when the underlying ER-modeling tools do not
support ternary relationships. The resultant entity has the requisite
three-attribute key to describe the ternary relationship accurately.
Alternatively, we can give SUPPLY a
synthetic ("surrogate") key, supply_id, and then relate it to SUPPLIER, PROJECT, AND
PART by binary relationships. The synthetic key would uniquely
determine a ⟨s,j,p⟩ triple; we can say this in SQL by saying that
⟨s,j,p⟩ is a secondary key. With a synthetic key we now have an entity
SUPPLY, with key supply_id si, and with three relationships
SUPPLIES3(si, s, j), CAN_SUPPLY3(si,s,p) and USES3(si,j,p). We may still
need a ternary relationship explaining the relationship of all three, but from the entity SUPPLY(supply_id, supplier, project, part) we can now reconstruct the original ternary table.
The next example is OFFERS, for a school database; see Fig 7.18;
⟨i,s,c⟩ belongs to OFFERS if INSTRUCTOR i teaches COURSE c during
SEMESTER s. Again, the binary projections fail to adequately model the
ternary relationship. E&N suggest that if one of the binary
relationships is 1:1 (eg if the CAN_TEACH(i,c) relationship is 1:1),
then this does work, but that is seldom if ever the case.
Actually, in a real school database one would not use OFFERS, one
would
use SECTION. The latter would likely be an entity, complete with
synthetic key section_id. However, SECTIONs are determined by
instructor i, course c, semester s, and timeslot ts, or, alternatively,
by semester and course and section_number or semester and class number (Loyola's mechanism).
Ternary relationships can have cardinality tags, like binary
relationships, but they are not as straightforward. For the SUPPLY
relationship, suppose each project,part pair (j,p) can have only one
supplier. Then we might put a 1 on the SUPPLIER link. But it might also
be the case that each project j uses a unique supplier,part pair (s,p)
(that is, each supplier can supply only one part to each project). We
now have an unrelated 1 on
the PROJECT link.
The (min,max) relationship is more straightforward: E------(m,M)-----R
means that for each e in E, there are at least m tuples involving e in
R, and at most M. But this method cannot describe the case where each project/part combination can have only one supplier.
Translation from higher-degree relationships to SQL table definitions
is done the same way as for binary M:N relationships: we create a table
consisting of columns for keys for each of the participating entities,
and any relationship attributes. The entity keys form the primary key
for the new table, and each entity key has a foreign key constraint
referring back to its defining entity table.
The Enhanced ER (EER) model
In this model, we allow for some forms of inheritance.
Figure 8.1 is a starting point. Note that there are several kinds
of employee; some kinds participate in relationships and some do not.
Note the symbol denoting inheritance; arrows from the parent class to
the child class are more common in OOP design.
Ultimately, we may implement the diagram of Fig 8.1 with an EMPLOYEE
table, and also tables for SECRETARY, TECHNICIAN, ENGINEER, MANAGER,
and HOURLY_EMPLOYEE, each indexed by the ssn and having additional
columns for the subclass-specific attributes.
To be in a subclass, you must also be in the superclass.
Note that some subclasses have subclass-specific attributes, and other
subclasses have subclass-specific participation in relationships (eg Manager and Hourly_Employee).
The circled (d) in Fig 8.1 stands for "disjoint"; one cannot be a SECRETARY and a TECHNICIAN. However, one cansimultaneously
be a SECRETARY, a MANAGER, and an HOURLY_EMPLOYEE (at least as the
relationship is drawn). In practice, it is likely that each of
Secretary, Technician, Manager and Engineer would also belong either to
Hourly_Employee or Salaried_Employee. In
general, membership in multiple subclasses is to be allowed unless
explicitly forbidden with the (d) notation.
The alternative to (d) is
(o), for overlapping. EN's example for overlapping subclasses is in Fig 8.5: the parent class is PART and the subclasses are MANUFACTURED_PART and PURCHASED_PART. Some parts can be both here.
Fig 8.2 example (showing
disjointness). Secretaries, Engineers and Technicians are all
Employees, but everyone belongs to at most one category.
Generalization is the process of realizing that two existing entities,
CAR and TRUCK, are really both instances of VEHICLE. See fig 8.3.
Sometimes subclass membership is determined by a field value or boolean expression
involving the parent class (eg jobtype = engineer). Note that such
"tag" fields are frowned upon in classic OOP in, say, Java. This
arrangement is also called attribute-defined
subclassing (or specialization). If the value of a single attribute
determines the subclass, this necessarily leads to disjoint subclasses.
This is illustrated in Fig 8.4. In other examples,
subclass membership represents a form of new data; these are user-defined subclasses. As new subclass records are inserted into the database, the appropriate subclass must also be indicated.
Besides disjoint/overlapping, subclasses may be described as total or
partial. Total means that every member of the base class must be in
some subclass (ie that the base class is abstract
in java notation). Partial means that base-class-only objects may
exist. In the Fig 8.1 example, every employee is either salaried or
hourly, so the right-hand subclass is total. The double line is used to
denote this. Note that this has nothing to do with any of the other subclass reationships.
Multiple inheritance means that we may end up with a lattice of relationships: see Fig 8.6 and Fig 8.7.
If multiple inheritance is involved, the classes will not be disjoint.
A common OOP issue with multiple inheritance -- resolving method or
attribute names when the same name is used in more than one parent
class -- is usually handled by requiring attribute names to be unique.
Union types
Sometimes the best way to model a Vehicle type is simply as a union of existing types Car and Truck. See Fig 8.8 for two examples.
Union types generally mean that the designer has not taken advantage of
any common attributes. It is particularly helpful to identify a primary
key that can be moved to the base class.
EER-to-Relations mapping
Given a parent class C with subclasses S1, S2, ..., Sm, here are some options for defining relations:
A. Create a table for representing C, and separate tables for each Si.
Each Si will include a column representing the corresponding C data.
For example, C might be the Employees table, with key ssn; we might
have tables for Secretary, Technician and Engineer also with keys ssn.
Multiple inheritance can be handled by having someone in the Employees, Technician and Engineer tables.
B. Create a separate table for each Si, including in each table all the
common attributes. This only works if subclassing is total; that is, if
every member of the parent class is in some subclass (why?). It becomes
inefficient of the Si are not disjoint.
C. Create a single table including all attributes of C and all the Si, and an additional type attribute indicating to which Si the record belongs. For example, we might have fields
fname, lname, ssn, address, type (secy, tech, eng), typing_speed, Tgrade, Eng_type, Eng_degree, year
The value of the type attribute
determines which of the remaining attributes are actually used.
Disjoint subclasses are necessary here, and space may not be used
efficiently. Multiple inheritance is not supported.
D. Like C, but instead use m Boolean attributes to indicate membership in each Si:
fname, lname, ssn, address, is_secy, typing_speed, is_tech, Tgrade, is_eng, Eng_type, Eng_degree, year
This mechanism can handle multiple inheritance reasonably well.
Second look at PHP PDO and LAMP (or WAMP)