Comp 353/453: Database Programming, LT 410, 4:15 Tuesdays
Week 5
Tutoring: Faraz Khan, Wed 5:00-7:00 pm, fkhan10@luc.edu.
Midterm: March 22
Triggers
Chapter 4 had two uses of the CHECK statement:
Attribute checks
These are applied whenever a specific attribute is added or updated
record (tuple) checks
These are applied whenever a particular row is updated or added.
You can do more.
Assertions
Assertions are general requirements on the entire database, to be maintained at all times.
create assertion salary_constraint
check ( not exists
(select * from employee e, employee s, department d
where e.salary > s.salary and e.dno = d.dnumber and d.mgr_ssn = s.ssn)
);
The problem with this (aside from the fact that this is not
always a good business idea) is when and where it is applied. After
every database update? But only some updates even potentially result in
violations: changes to employee salaries, and changes of manager.
From the manual:
PostgreSQL does not implement assertions at present.
Triggers
Due to the "expense" of assertions, some databases (eg oracle) support triggers: assertions that are only checked at user-specified times (eg update or insert into a particular table).
create trigger salary_violation
before insert or update of salary, super_ssn on employee
for each row when (new.salary >
(select salary from employee e
where e.ssn = new.super_ssn))
inform_supervisor (New.supervisor_ssn, new.ssn) -- some prepared external procedure
);
In postgres, we'd finish off with
for each row execute procedure inform_supervisor);
That is, we can't have the when clause.
Writing the external procedures is beyond the scope of SQL.
VIEWS
Here's a simple view, that is, a sort of "virtual table". The view is determined by the select statement within. The point, though, is not that we set works_on1 here to be the output of the select, but that as future changes are made to the underlying tables, the view is automatically updated.
create view works_on1 AS
select e.fname, e.lname, p.pname, w.hours
from employee e, project p, works_on w
where e.ssn = w.essn and w.pno= p.pnumber;
We can now do:
select * from works_on1;
select w.lname, sum(w.hours)
from works_on1 w
group by w.lname;
Now let's add an employee:
insert into employee values ('Ralph', null, 'Wiggums', '000000001', '1967-03-15', null, 'M', 22000, '333445555', 5);
insert into works_on values ('000000001', 1, 9.8);
Ralph is there (working on 'ProductX'). To delete:
delete from works_on where essn='000000001';
delete from employee where ssn='000000001';
Here's a typical use of views for a single table: hiding some columns (I've hidden salary for privacy reasons, and address for space):
create view employee1 AS
select fname, minit, lname, ssn, bdate, sex, super_ssn, dno from employee;
Finally, here's another example from the book, demonstrating column renaming and use of aggregation columns:
create view deptinfo (dept_name, employee_count, total_salary) AS
select d.dname, count(*), sum(e.salary)
from department d, EMPLOYEE e
where d.dnumber = e.dno
group by dname;
There are two primary implementation strategies for views:
- query rewriting
- materialization
It is typically a problem to update views, though we can do this for
the employee1 view so long as the missing fields do not have not null constraints (and also that we include the underlying employee table's primary key in the view).
Schema change
We've already seen ALTER TABLE in use: we used it to add foreign key
constraints after some initial values were entered. Another application
is for schema changes as the database needs evolve.
Oracle ROWNUM
Oracle allows the use of the "virtual" column rownum when working with sorted tables. This way one can print the third-highest value, for example (where rownum = 3), or the top 10 values (where rownum <=10).
Relational Algebra
We'll only do a quick look at this. The point is that SQL has a precise mathematical foundation; this allows the query optimizer to rewrite queries (ie to "prove" that alternative forms of some queries are equivalent).
First some set-theoretic notations:
Union: R1 ∪ R2, where R1, R2 are subsets of same cross product. This is logically like OR
Intersection: Logically like AND
Difference: Logical "R1 but NOT R2"
R1 − R2
The basic relational operations are:
PROJECTION: selecting some designated columns, ie taking some vertical slices from a relation. Projection corresponds to the select col1, col2, etc part of the query. It is denoted with π, subscripted with column names: πlname,salary(R).
SELECTION: Here we specify a Boolean condition that selects only some rows of a relation; this corresponds to the where part of a SQL query
- choosing subset of entries
- corresponds to logical operations defining the subset
- Sel(Parts:cost >10.0)
This is denoted by σ, subscripted with the condition: σdno=5 AND salary>=30000(R)
More examlples can be found on pp 147-149 of E&N.
PRODUCT: The product R×S is the set of ordered pairs {<r,s> | r∈R
and s∈S}. The product can be of two tables (relations) as well as
of two domains; in this case, if r=<r1,r2,...,rn> and
s=<s1,s2,...,sk>, then we likely identify the pair <r,s>
with the n+k-tuple <r1,r2,...,rn,s1,s2,...,sk>.
JOIN
The join represents all records from table R and table S where R.colN = S.colM
The traditional notation for this is R⋈S, or, when we wish to make the join condition explicit, R⋈colN=colMS. If the join column has the same name in both R and S, we may just subscript with that column name: R⋈ssnS.
The join can be expressed as Product,Selection,Projection as follows:
πwhatever cols we need to keep (σcolN=colM(R×S))
In other words, the join is not a "fundamental" operation; contrast this with the "outer join" below. As a suggestion for implementation,
the above is not very efficient: we never want to form the entire
product R×S. But this notation lets us restructure the query for later
optimization.
The equijoin is a join where the join condition involves the equality operation; this is the most common.
The natural join is a join
where we delete one of the two identical columns of an equijoin. That
is, if R is <ssn,lname> and S is <ssn, proj>, then the
equijoin is really <ssn,lname,ssn,proj>, where the two ssns are
equal, and the natural join is <ssn,lname,proj>.
Note that the outer join can not be
expressed this way; the outer join of two relations R and S is not
necessarily a subset of R×S (though it may be subset of R×(S ∪ {NULL}).
The outer join is fundamentally about allowing NULL values. If foo is the name of the join column in R, the outer join is of the form
R⋈S ∪ (σfoo∉π_foo(S) (R) × {NULL})
(Explain)
Note the condition here is strictly speaking outside the scope of the Boolean expressions at the bottom of page 147 of E&N.
Division: R ÷ S: As
above, we identify R with a relation T×S; R÷S is then {t∈T| ∀s∈S
<t,s>∈R} ("∀s∈S" means "for all s in S"). In English, which
records in T appear combined with EVERY row of S.
As an example, let E be the employees table, and P be the projects, and
W be the Works_on table, a subset of E×P. Then W ÷ P is the set of
employees who have worked on every project; W ÷ E is the set of
projects worked on by every employee. Note the implicit "closed world"
hypothesis: we're only considering project numbers (or employee
numbers) that actually appear in P and E.
Note the prerequisite that R is a table "extending" that of S: R
has all the columns that S has, and more. The underlying set of columns
of R ÷ S is the set (columns of R) − (columns of S). It is usually
easier to think of division as being of the form T×S ÷ S.
Asking which students got only A's is not quite an example of this (why?)
The book also defines a rename operation; for certain purposes this is important but I classify it as "syntactic sugar".
The book also uses an assignment operator (TEMP ← σcolN=colM(R×S)). However, we can always substitute the original expression in for the TEMP variable.
Query trees
See p 165
examples:
Note that solution is a SEQUENCE of steps
Double-join example:
Select e.lname, p.pname from employee e, works_on w, project p
where e.ssn = w.essn and w.pno = p.pnumber;
Entity-Relationship modeling
This is a variant (actually a predecessor) of object modeling (eg UML
or CRC cards or Booch diagrams). In the latter, everything is an
object. In ER modeling, we will make a distinction between entities (things) and relationships. As a simple example, students and courses are entities; but the enrolled_in table is a relationship. Sections most likely would be modeled as entities too, though there is a relationship to COURSE.
The ER process starts, like most software-engineering projects, with
obtaining requirements from users. What data needs to be kept, what
queries need to be asked, and what business rules do we build in? (For
example, if the DEPARTMENT table has a single column for manager, then
we have just committed to having a single manager for each department.)
After that, we identify the entities.
These should represent physical things, such as employees or parts or
(more abstractly) departments. Note that customer_orders might be
modeled as an entity, but might also be modeled as a relationship.
Entities have attributes, which will later more or less become the fields. For each attribute we have the following types:
- composite v single: a social-security number is a single
attribute; an address (consisting of street, apt, city, state, zip)
would be composite. So would a name.
- single-valued v multi-valued: E&N's examples here are college_degrees and vehicle_color.
- stored v derived: the classic derived attribute is age, derived from birthdate.
We also must decide which attributes can be NULL.
The book uses () to represent sub-attributes of composite attributes, and {} to surround multi-valued attributes.
Traditionally we represent the entity with a rectangular box, and the attributes are little oval tags.
An entity type is our resulting schema for the entity; the entity set is the actual set of entities.
In the diagram, we will underline the key attributes. If a key is composite, say (state,regnum), then we make a composite attribute out of those pieces.
This is a slight problem if the key can be either (state,regnum) or (state,license_plate); how could we best address this?
Note that key attributes really represent constraints.