Database week 6

Comp 353/453: Database Programming, Corboy 523, 7:00 Thursdays

Week 6, Feb 23

Read in Elmasri & Navathe (EN)

Chapter 3, The Relational Data Model ...., section 3: Update Operations
Chapter 4: Basic SQL
Chapter 5:

Midterm: March 15

Homework 2 updated rules:

Due Sunday, Feb 26, 2012 (was Feb 24)

All submissions must be in text format, WITHOUT UNICODE (especially Unicode quotation marks)! Use only standard ASCII quotation marks ' and ", as needed. Be aware that if you paste your work into an MS Word file, quotes may be changed to Unicode.

All SQL queries should be "copy-and-paste ready"; that is, any extraneous marks (such as "->" or other prompts) should be edited out. Leading spaces are ok; however, tabs have strange consequences and you should try to avoid them.

When a query is requested, your answers should be in the form of a SINGLE query; do not retrieve a value with one query and then manually plug that value into a second query.

First look at PHP PDO and LAMP (or WAMP)

Try removing the ";" from the include statement in pdo1.php to admire the elegance and precision of the resultant error message.

A brief word on auto-commit

To commit your SQL updates is to write the changes to the permanent database; in this sense, it is like save. So far we've assumed that there is a commit operation performed after every insert or update; this situation is called auto-commit. Generally, auto-commit mode is an attribute of your database connection; if auto-commit is true then a commit is performed after every SQL statement that potentially alters the database.

The alternative to auto-commit is to execute a group of updates and then explicitly invoke the commit operation at the end, committing all the updates together. A group of SQL statements between consecutive commits is then called a transaction; all the statements of the transaction are committed together. Usually, though, we want a stronger assurance: that either all the statements of the transaction succeed, or none of them do (this would be implicit if commits always succeeded, but that is not the case in the real world). This is known as the atomicity requirement, the first part of the ACID test (atomicity, consistency, isolation, durability). The idea is that a transaction should be atomic, that is, indivisible: the individual queries that make it up should be executed as a unit.

Instead of a commit, a user may also issue a rollback, which means to throw away all the actions back to the previous commit, thus discarding the transaction.

We'll stick with auto-commit for a while longer, but be aware of two things:

This is why there is no "save" operation
auto-commit is not universal; sometimes you need manual control
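
To make commit and rollback concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the course's LAMP setup. The account table and the transfer function are made up for illustration. By default, Python's sqlite3 connection is not in auto-commit mode: it opens a transaction before the first UPDATE, and nothing is permanent until commit() is called.

```python
import sqlite3

# In-memory database; the ACCOUNT table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()  # make the starting balances permanent


def balances(conn):
    """Return the balances ordered by account id."""
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    (remaining,) = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (src,)
    ).fetchone()
    if remaining < 0:
        conn.rollback()  # discard both updates, back to the previous commit
        return False
    conn.commit()  # write both updates together
    return True
```

Calling transfer(conn, 1, 2, 30) commits both updates; calling transfer(conn, 1, 2, 500) afterwards rolls both back, so neither balance changes.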

Entity-Relationship modeling

Here's a summary for the construction of entities:

Look for the "concrete" objects in the problem domain
List the attributes of each entity.
Break compound attributes down into atomic attributes
Attributes can, at this stage, be multivalued
Indicate the (single-attribute) key for each entity
Do not use other entities as attributes; model this instead at the relationship stage
This may leave some entities ("weak" entities) without a complete key. Just mark them as such
Weak entities will be tied to some other entity through the defining relationship.

Relationships

Initially we arrive at Fig 7.8, with four entities: DEPARTMENT, PROJECT, EMPLOYEE, DEPENDENT. Note that Works_on here is shown as an EMPLOYEE attribute; it could also be represented as a PROJECT attribute. How are we representing department membership? Who works on what? Who is in charge of what projects?

Note some of the attributes in figure 7.8 refer to other entities. These are our first relationships; these will likely end up translated into foreign key constraints.

Formally, a relationship is a set of ordered tuples ⟨e1,e2,...,en⟩ where each ei is a member of entity Ei. Some entities here may simply be attributes (eg the hours attribute of the WORKS_ON relationship ⟨employee,project,hours⟩).

The tuples in a relationship must each have a clear meaning to the application. Relationship names are usually verbs, and should make sense "left to right" (and sometimes top to bottom). That is, we would prefer the relationship name supervises because it fits in with
SUPERVISOR----- supervises ------EMPLOYEE
We could also use
EMPLOYEE ----- reports_to ------ SUPERVISOR

Most relationships are binary (possibly with added attributes); ternary and higher-degree relationships are less common.

At this stage, we may model a relationship as a (typically multivalued) entity attribute; consider again how we modeled WORKS_ON in Figure 7.8.

When a relationship involves multiple entities, we can assign a role name to each entity. Commonly this is just the name of the entity (eg EMPLOYEE), but in relationships between an entity and itself (so-called recursive relationships), we have to use different names. Consider the example of the SUPERVISES relationship.

Example: fig 7.11; note that the righthand SUPERVISION oval contains references to pairs of entities in the lefthand EMPLOYEE oval.

For entities, it is often the case that we elect to use synthetic keys: arbitrarily generated "ID numbers". This makes sense for departments and employees. Relationships, however, typically have a natural key consisting of one primary key from each entity; using synthetic keys (eg order numbers) should stand out. A good example of this is the GRADE_REPORT table, indexed by student_number and section_identifier (and with attribute grade).

How should we model SECTION in the school database? We did model it as an entity, but could we model it as a ternary relationship between course, semester, and instructor? No, if we allow an instructor to teach two sections of the same course in the same semester.
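
The objection can be checked directly. In the sketch below (Python's sqlite3; the table layout and the sample values are assumptions, not the school database from the text), SECTION is treated as a ternary relationship, so its key is the triple of entity keys. A second section with the same course, semester and instructor is then rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SECTION modeled as a ternary relationship: the key is the triple of entity keys.
conn.execute("""
    CREATE TABLE section (
        course     TEXT NOT NULL,
        semester   TEXT NOT NULL,
        instructor TEXT NOT NULL,
        PRIMARY KEY (course, semester, instructor)
    )
""")
row = ("COMP 353", "Spring 2012", "Smith")  # hypothetical values
conn.execute("INSERT INTO section VALUES (?, ?, ?)", row)

try:
    # A second section of the same course, same semester, same instructor.
    conn.execute("INSERT INTO section VALUES (?, ?, ?)", row)
    second_section_allowed = True
except sqlite3.IntegrityError:
    second_section_allowed = False

section_count = conn.execute("SELECT COUNT(*) FROM section").fetchone()[0]
```

Giving SECTION its own synthetic section_identifier, that is, making it an entity, removes the conflict.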

What about an INVOICE? This consists of a number of ITEMs, each with quantity, ordered by a single CUSTOMER. We can create a relationship ORDERS between CUSTOMER and ITEM, but an invoice is more than that. If a customer places multiple orders on the same day, the customer likely expects them to remain different. So instead we would have an entity for INVOICE, with attributes invoice_number (synthetic), and date, and customer, and then create a relationship ORDERS between INVOICE and ITEM, with attributes for price and quantity:

invoice	item	price	quantity
1002	37	$5	6
1002	59	$3.45	2
1003	101	$1300	1
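
Here is one way the INVOICE entity and the ORDERS relationship might look as tables, sketched with Python's sqlite3. The column names, the date and the customer number are assumptions; prices are stored in cents to avoid floating-point rounding, and the ITEM table is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_number INTEGER PRIMARY KEY,   -- synthetic key
        invoice_date   TEXT NOT NULL,
        customer       INTEGER NOT NULL
    );
    CREATE TABLE orders (
        invoice     INTEGER NOT NULL REFERENCES invoice(invoice_number),
        item        INTEGER NOT NULL,         -- would reference an ITEM table
        price_cents INTEGER NOT NULL,
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (invoice, item)           -- one key from each participant
    );
""")
# Two invoices for the same (hypothetical) customer on the same day stay distinct.
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(1002, "2012-02-23", 7), (1003, "2012-02-23", 7)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1002, 37, 500, 6), (1002, 59, 345, 2), (1003, 101, 130000, 1)])

# Total owed per invoice, in cents.
totals = conn.execute("""
    SELECT invoice, SUM(price_cents * quantity)
    FROM orders GROUP BY invoice ORDER BY invoice
""").fetchall()
```

Note that the key of ORDERS is the natural pair (invoice, item), while INVOICE itself gets the synthetic invoice_number.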

Cardinality

Binary relationships can be classified as 1:1, 1:N, N:1, or M:N. In the WORKS_FOR relationship, between DEPARTMENT and EMPLOYEE, this is 1:N. Each employee works for 1 department, but a department can have multiple employees. (Again, the 1 here in 1:N represents a constraint; the N represents no constraint. It is not actually required that all departments have multiple employees.)

The MANAGER relationship is 1:1 (though see the note): every dept has one manager and vice-versa. This is a 1-1 relationship between EMPLOYEE and DEPARTMENT. Note that most employees are not managers; this does not change the fact that no employee manages two departments. See Fig 7.12 for a diagram representing this.

Note: the fact that the MANAGER relationship is 1:1 expresses a business rule: no employee manages more than one department, and no department has two managers. The latter is pretty universal; the former, while common, is not.

Many relationships are 1:N (one-to-many):

    DEPARTMENT ----1--- employs ----N----- EMPLOYEE (or employee works_for department)
    EMPLOYEE -----1----- supervises ----N------EMPLOYEE (boss is on left side)
    DEPARTMENT ----1---- controls-----N------PROJECT

Think of "1 department = N employees"; the 1 goes on the side that the other entity can have only 1 of. The 1 goes on the "larger" unit: a department is made of N employees, a boss supervises N employees, a department controls N projects.

See Fig 7.9.

The supervises relationship is "recursive" (a better word, used in the UML community, is "reflexive"). See figure 7.11 for a diagram.

The WORKS_ON relationship is M:N.
Similarly, the enroll relationship is M:N:
STUDENT -----M----- enrolls ----N----SECTION
A section may have several students; each student may enroll in several sections.

See fig 7.13.
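
The rule for which side gets the 1 can be stated as a small Python function. This is only a sketch: it reports the tightest ratio consistent with one particular set of pairs, whereas the real cardinality is a constraint that comes from the business rules, not from the data. The function name and sample data are made up.

```python
def classify(pairs):
    """Classify a set of (left, right) pairs as '1:1', '1:N', 'N:1' or 'M:N'.

    A side is labeled 1 if every entity on the OTHER side is paired with
    at most one entity from that side.
    """
    lefts_of = {}   # right entity -> set of left entities paired with it
    rights_of = {}  # left entity  -> set of right entities paired with it
    for left, right in pairs:
        lefts_of.setdefault(right, set()).add(left)
        rights_of.setdefault(left, set()).add(right)
    left_is_one = all(len(s) == 1 for s in lefts_of.values())
    right_is_one = all(len(s) == 1 for s in rights_of.values())
    if left_is_one and right_is_one:
        return "1:1"
    if left_is_one:
        return "1:N"
    if right_is_one:
        return "N:1"
    return "M:N"


# DEPARTMENT employs EMPLOYEE: each employee has one department.
employs = {("Research", "Smith"), ("Research", "Wong"), ("Admin", "Zelaya")}
# EMPLOYEE works_on PROJECT: no restriction on either side.
works_on = {("Smith", "P1"), ("Smith", "P2"), ("Wong", "P2")}
# EMPLOYEE manages DEPARTMENT.
manages = {("Wong", "Research"), ("Zelaya", "Admin")}
```

With this data, classify(employs) gives "1:N", classify(works_on) gives "M:N", and classify(manages) gives "1:1".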

What do we do if, after we've gotten started, we decide that the location attribute of a DEPARTMENT should be multi-valued? We can model multi-valued attributes as relationships instead:

    DEPARTMENT ----N----is_located_at-----M----LOCATION

Clearly, we would not want this to be 1:M, which would mean that a location could be used by only one department. If we do decide that departments have single locations, we go back to an N:1 relationship:

    DEPARTMENT ----N----is_located_at-----1----LOCATION

Participation constraints on relationships

Suppose every employee must work for some department. Then the WORKS_FOR relationship involves total participation of the EMPLOYEE entity. The MANAGES relationship involves only partial participation of the EMPLOYEE entity, since most employees do not manage a department.

We represent total participation by a double line, and partial by a single line.
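
Once the relationship is stored as a foreign-key column, total participation is typically enforced by declaring that column NOT NULL, while partial participation leaves it nullable. A minimal sketch with Python's sqlite3 follows; the column names follow the usual COMPANY conventions but are assumptions here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE employee (
        ssn   TEXT PRIMARY KEY,
        lname TEXT NOT NULL,
        -- NOT NULL: every employee must take part in WORKS_FOR
        dno   INTEGER NOT NULL REFERENCES department(dnumber)
    );
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee VALUES ('123456789', 'Smith', 5);
""")

try:
    # An employee with no department violates total participation.
    conn.execute("INSERT INTO employee VALUES ('999887777', 'Zelaya', NULL)")
    accepted = True
except sqlite3.IntegrityError:
    accepted = False

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```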

Relationships can have attributes; eg hours of WORKS_ON or grade for the GRADE_REPORT table.

As was described above, entities usually have a single (possibly composite) key; entities are often given a synthetic key (ie an employee_id or student_number). Relationships typically have a key with as many attributes as the degree of the relationship. Synthetic keys are often awkward for these.

The key to a relationship should be a composite of the keys to each entity. Otherwise the relationship is not just about the two entities involved.

Note that synthetic keys work very well for joins.

Now we should be able to go through Figure 7.2 (E&N p 204) in detail. The relationships are supervises, works_for, manages, controls, works_on, and dependents_of. Note that the name "supervision" is awkward; it is not clear who is supervising whom. As a result, the entity links need annotation with the role names "supervisor" and "supervisee". However, such annotation is often a good idea for clarity.

(The figure below was Fig 3.2 in an earlier edition of E&N; it is Fig 7.2 in the 6th edition.)

ER diagram for the COMPANY database

Sometimes, as we rethink things, an attribute can be changed to a relationship, or vice-versa. Sometimes an attribute may be promoted to an entity, particularly if it was used in several other entities, in which case we may also add a relationship to those other entities.

Relationship attributes can sometimes be moved to entities. For a 1:1 relationship, the relationship attribute can be moved to either entity. For a 1:N relationship, the relationship attribute can be moved to the N side. Consider the earlier examples:

    DEPARTMENT ----1--- employs ----N----- EMPLOYEE    attribute: start_date, etc
    EMPLOYEE -----1----- supervises ----N------EMPLOYEE    attribute: review_date
    DEPARTMENT ----1---- controls-----N------PROJECT       attribute: project_budget_num

Sometimes we have entity attributes that need to be translated into relationships. See Section 7.6. We would move manager information from the DEPARTMENT entity to the MANAGES relationship. We started out with manager as an attribute of departments, but later realized that there was a relationship involved because two entities were involved: DEPARTMENT and EMPLOYEE. This suggests the need for a relationship.

We would move controlling-department information from the PROJECT entity to the CONTROLS relationship. We would remove department, supervisor, and works_on from EMPLOYEE. Note that some of these will eventually be added back. At this point, we should have eliminated most multi-valued attributes.

ER diagram for the STUDENT database
Entities: student, course, section

(min,max) annotation

Instead of labeling lines connecting a relationship to an entity with 1, M, or N, we can also use a (min,max) notation, meaning that each entity e in the entity set E must participate in at least min entries of the relationship, and at most max. If min>0, the participation is total; min=0 means partial participation. The max is denoted N when we mean it is allowed to be >1.

Note that a 1-N relationship would have the values reversed using the (min,max) notation:
DEPARTMENT ----1--- employs ----N----- EMPLOYEE
DEPARTMENT ---(1,N)--- employs --- (1,1)----- EMPLOYEE

Example: Fig 7.15
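
The reversal is easier to see by counting. The sketch below (plain Python; function name and data are made up) computes the observed minimum and maximum number of relationship tuples each entity takes part in. A schema would declare (1,N) for DEPARTMENT, with N meaning no upper bound; a data instance can only show an actual maximum.

```python
def min_max(entities, participants):
    """Return the observed (min, max) participation for one entity set.

    entities     -- all members of the entity set
    participants -- one entry per relationship tuple: the member of this
                    entity set that appears in that tuple
    """
    counts = {e: 0 for e in entities}
    for p in participants:
        counts[p] += 1
    return (min(counts.values()), max(counts.values()))


departments = ["Research", "Admin"]
employees = ["Smith", "Wong", "Zelaya"]
employs = [("Research", "Smith"), ("Research", "Wong"), ("Admin", "Zelaya")]

dept_side = min_max(departments, [d for d, _ in employs])  # each department: 1 or more tuples
emp_side = min_max(employees, [e for _, e in employs])     # each employee: exactly 1 tuple
```

Here dept_side is (1, 2) and emp_side is (1, 1): the label next to DEPARTMENT describes how many employs-tuples a department appears in, which is why the 1 and the N swap sides.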

UML diagrams

See Figure 7.16. UML diagrams have space for operations, which in the world of databases we're not much concerned about. The big boxes are for entities; relationships have been reduced to boxes that annotate links. A (min,max) notation is used, but the label goes on the opposite entity.

UML relationships (actually, ER relationships as well) may either be of association or of aggregation. The latter implies a collection, eg of employees into one department.

How do we translate this to tables?

We'll go into more detail later, but for now, note that a 1:1 relationship can be represented as an attribute of either entity. A 1:N relationship can be modeled as an attribute of one of the entities (the entity on the side of the N). M:N relationships must get their own table.

ER-to-relational mapping

How do we build a database schema from an ER diagram?

Step 1: regular entities
We define a table for each non-weak entity. We use all the leaf attributes; composite attributes are represented by their ungrouped components. Keys are also declared. Attributes that were earlier pushed into relationships are not yet included.

Step 2: weak entities
We create a table for each weak entity, adding the keys for the owner entity type (or types) (this would mean employee ssn), and adding a foreign key constraint to the owner-entity table.

We are likely to use the CASCADE option for deletes and updates: if an employee ssn is updated, then the dependent essn must be updated, and if an employee is deleted, then all the dependents are deleted too.
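
A sketch of the weak-entity table with cascading, using Python's sqlite3. The names and ssn values are illustrative, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT NOT NULL);
    CREATE TABLE dependent (
        essn           TEXT NOT NULL,      -- key of the owner entity
        dependent_name TEXT NOT NULL,      -- partial key of the weak entity
        PRIMARY KEY (essn, dependent_name),
        FOREIGN KEY (essn) REFERENCES employee(ssn)
            ON DELETE CASCADE ON UPDATE CASCADE
    );
    INSERT INTO employee VALUES ('123456789', 'Smith'), ('333445555', 'Wong');
    INSERT INTO dependent VALUES ('123456789', 'Alice'), ('333445555', 'Joy');
""")

# Changing an owner's ssn updates the matching essn values.
conn.execute("UPDATE employee SET ssn = '111111111' WHERE ssn = '123456789'")
# Deleting an owner deletes that owner's dependents.
conn.execute("DELETE FROM employee WHERE ssn = '333445555'")

dependents = conn.execute(
    "SELECT essn, dependent_name FROM dependent ORDER BY essn"
).fetchall()
```

After the two statements, the only remaining dependent row is ('111111111', 'Alice').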

Step 3: binary 1:1 relationships
Let S and T be the participating entities to 1:1 relationship R. We pick one of the two -- say S -- and add to S a column that represents the primary key of T, and all the attributes of R.

It is better to choose as S the entity that has total (or at least closer to total) participation in R. For example, the manages relationship between departments and employees is 1:1, but is total only for DEPARTMENT, and is nowhere near total for EMPLOYEE. Thus, we add a column manager to DEPARTMENT. Adding a column manages to EMPLOYEE would also work, but it would be NULL for most employees.

We also add a foreign key constraint to S, on the new attribute, referring to the primary key of T.

One alternative is to merge S and T into a single table; this makes sense only if both have total participation in R. This means that S and T each have the same number of records, and each record s in S corresponds to exactly one t in T.

A third alternative is to set up a table R containing <sk,tk> key pairs.
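
The first approach might look like this for the manages relationship, sketched with Python's sqlite3. The column names mgr_ssn and mgr_start_date and all values are assumptions. The UNIQUE constraint is what keeps the relationship 1:1 rather than N:1, and NOT NULL expresses DEPARTMENT's total participation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT NOT NULL);
    CREATE TABLE department (
        dnumber        INTEGER PRIMARY KEY,
        dname          TEXT NOT NULL,
        mgr_ssn        TEXT NOT NULL UNIQUE REFERENCES employee(ssn),
        mgr_start_date TEXT                 -- an attribute of the relationship
    );
    INSERT INTO employee VALUES ('333445555', 'Wong'), ('987654321', 'Wallace');
    INSERT INTO department VALUES (5, 'Research', '333445555', '1988-05-22');
""")


def try_insert(sql):
    """Run one INSERT; report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Wong already manages Research, so a second department is rejected.
same_manager_twice = try_insert(
    "INSERT INTO department VALUES (4, 'Administration', '333445555', '1995-01-01')")
different_manager = try_insert(
    "INSERT INTO department VALUES (4, 'Administration', '987654321', '1995-01-01')")
```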

Step 4: binary 1:N relationships
Let us suppose S---N---R---1---T. We now add T's key to S as an attribute with foreign-key constraint. We must add T's key to S; we cannot do it the other way around. In the relationship
DEPARTMENT ----1--- employs ----N----- EMPLOYEE
we would have S be EMPLOYEE; we would put a dno column in EMPLOYEE (why can't we add an essn column to DEPARTMENT?)

An alternative is the <sk,tk> keypair table. This might be more efficient if only a few s in S participate in the relationship; otherwise we would have many NULLs in the T-column of S.
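
A sketch of the foreign-key approach with Python's sqlite3 (table contents are illustrative). Each employee row holds exactly one dno, which is all a 1:N relationship needs; a single essn column in DEPARTMENT could hold only one employee per department. A join on the key and foreign key recovers the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE employee (
        ssn   TEXT PRIMARY KEY,
        lname TEXT NOT NULL,
        dno   INTEGER REFERENCES department(dnumber)   -- T's key added to S
    );
    INSERT INTO department VALUES (4, 'Administration'), (5, 'Research');
    INSERT INTO employee VALUES ('123456789', 'Smith', 5),
                                ('333445555', 'Wong', 5),
                                ('999887777', 'Zelaya', 4);
""")

# Recover the employs relationship by joining key to foreign key.
employs = conn.execute("""
    SELECT d.dname, e.lname
    FROM department d JOIN employee e ON e.dno = d.dnumber
    ORDER BY d.dname, e.lname
""").fetchall()

try:
    # The foreign key constraint rejects a department that does not exist.
    conn.execute("INSERT INTO employee VALUES ('888665555', 'Borg', 99)")
    bad_dno_accepted = True
except sqlite3.IntegrityError:
    bad_dno_accepted = False
```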

Step 5: binary M:N relationships
Here we must create a table of tuples including the key of S (sk), the key of T (tk), and any attributes of R; we cannot push the data into either S or T. Call the new table R as well (note that E&N call it S). The sk column of R should have a foreign key constraint referring to the key column of S, and the tk column of R should similarly have a foreign key constraint to the key column of T.

The WORKS_ON table is a canonical example; so is the GRADE_REPORT table.

Again, we would likely use the CASCADE option for deletion or update of records in the participating entities S & T.
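
A sketch of WORKS_ON as its own table, using Python's sqlite3 with illustrative data. The primary key is the pair of entity keys, each column is a foreign key, and hours is the relationship attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT NOT NULL);
    CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT NOT NULL);
    CREATE TABLE works_on (
        essn  TEXT NOT NULL REFERENCES employee(ssn)
              ON DELETE CASCADE ON UPDATE CASCADE,
        pno   INTEGER NOT NULL REFERENCES project(pnumber)
              ON DELETE CASCADE ON UPDATE CASCADE,
        hours REAL,
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO employee VALUES ('123456789', 'Smith'), ('333445555', 'Wong');
    INSERT INTO project VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on VALUES ('123456789', 1, 32.5),
                                ('123456789', 2, 7.5),
                                ('333445555', 2, 10.0);
""")

# Two joins recover who works on what, and for how many hours.
assignments = conn.execute("""
    SELECT e.lname, p.pname, w.hours
    FROM employee e
    JOIN works_on w ON w.essn = e.ssn
    JOIN project p ON p.pnumber = w.pno
    ORDER BY e.lname, p.pname
""").fetchall()

# Deleting a project cascades to its WORKS_ON rows.
conn.execute("DELETE FROM project WHERE pnumber = 2")
remaining = conn.execute("SELECT essn, pno FROM works_on").fetchall()
```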

Step 6: multivalued attributes
If we have any left, they must be moved into their own tables. For example, if employees can have several qualifications (eg degrees or certifications), we would create a table QUALIFICATION with two columns: essn and qualification. The DEPT_LOCATIONS table is similar. Again, we would have an appropriate foreign key constraint back to the original table.
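
A sketch of the DEPT_LOCATIONS approach with Python's sqlite3 (column names and locations are illustrative): one row per value of the multivalued attribute, keyed by the pair (dnumber, dlocation), with a foreign key back to DEPARTMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE dept_locations (
        dnumber   INTEGER NOT NULL REFERENCES department(dnumber),
        dlocation TEXT NOT NULL,
        PRIMARY KEY (dnumber, dlocation)
    );
    INSERT INTO department VALUES (1, 'Headquarters'), (5, 'Research');
    INSERT INTO dept_locations VALUES (1, 'Houston'), (5, 'Bellaire'), (5, 'Houston');
""")

# Rebuild the multivalued attribute: department name -> list of locations.
locations = {}
for dname, dlocation in conn.execute("""
    SELECT d.dname, l.dlocation
    FROM department d JOIN dept_locations l ON l.dnumber = d.dnumber
    ORDER BY d.dname, l.dlocation
"""):
    locations.setdefault(dname, []).append(dlocation)
```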

Step 7: higher-degree relationships
These are handled like binary M:N relationships.

More on Foreign Keys

Here's the seven-step ER-to-relation algorithm again, slightly simplified:

create a table for each regular entity
create a table for each weak entity, adding the key field from the owner entity as a foreign key for the new entity.
for binary 1:1 relationships between entities E1 and E2, pick one of them (eg E1) and add to it a field containing the key to E2. Make this a foreign key in E1.
for binary 1:N relationships between E1 and E2, E1---1---R---N---E2, add a column to E2 containing the key of E1. Make this a foreign key in E2.
For binary N:M relationships between E1 and E2, create a new table R consisting of ⟨E1.key, E2.key, R.attributes⟩. Make E1.key and E2.key foreign keys in R.
For multivalued attributes of entity E, create a new relation R. One column of R will be E.key; this should be a foreign key in R.
ternary and higher-degree relationships: like step 5.

Joins arise in steps 2, 3, 4, 5, 6, and 7, for recovering the original relationships (or attribute sets for 6, or entities for 2). In every case, the join field is a key of one relation and a foreign key in the other.

Not all joins are about recovering relations from an ER diagram.

Also, I said earlier that entity T should not have an attribute that was another entity of type S; instead, we should create a relationship R between T and S. If S was at all a candidate for an attribute, each T would be related to at most one S and so this would have cardinality constraint T---N---R---1---S. Then, when we did the above conversion, in step four we would add S's key to T with a foreign key constraint referring to S.

But suppose we did add S as an entity attribute to T. Then we would end up with the same situation: we would use the key of S as an attribute of T, and create the same foreign-key constraint. So in the end we get the same thing.