Comp 353/453: Database Programming, Corboy 523, 7:00 Thursdays
Week 2, Jan 26
Read in Elmasri & Navathe (EN)
- Chapter 3, The Relational Data Model ...., section 3: Update Operations
- Chapter 4: Basic SQL
Homework 1
Due Fri, Feb 10
E&N chapter 4, page 112 exercises 4.7, 4.10abc and 4.12abcd.
For 4.7, note that there are seven referential-integrity (foreign-key) constraints listed:
1.
|
BOOK.Publisher_name |
⟶
|
PUBLISHER.Name |
2.
|
BOOK_AUTHORS.Book_id |
⟶
|
BOOK.Book_id |
3.
|
BOOK_COPIES.Book_id |
⟶
|
BOOK.Book_id |
4.
|
BOOK_COPIES.Branch_id |
⟶
|
LIBRARY_BRANCH.Branch_id |
5.
|
BOOK_LOANS.Book_id |
⟶
|
BOOK.Book_id |
6.
|
BOOK_LOANS.Branch_id |
⟶
|
LIBRARY_BRANCH.Branch_id |
7.
|
BOOK_LOANS.Card_no |
⟶
|
BORROWER.Card_no |
For 4.10 and 4.12, note that your queries should work even if more data
is added to the tables. Also, your answers should be in the form of a
single query; do not retrieve a value with one query and then manually plug that value into a second query.
All SQL should be entered in a format that I can copy and paste directly into a mysql window. In
particular, make sure there are no leading tabs or spaces, and that all
"prompt" characters such as "->" have been stripped out.
NoSQL example
The native database used by the Asterisk phone switch consists of
tables (actually called "families") that are ⟨key,value⟩ pairs. Neither
the key nor the value need be atomic, or for that matter the same type
as other entries. (This is done for simplicity, not because the DB is huge.)
Section 3.2: DB constraints
- inherent constraints
- schema-based (explicit) constraints
- NOT NULL
- primary key
- foreign key
- CHECK
- Triggers
- semantic constraints (business logic)
The COMPANY database
Schema: EN p 71
Data: EN p 72
basic table definitions
table definitions plus data, with ALTER
Spreadsheet
zip file
As stated above, a join is the operation of creating all records merged from two (or
more) tables, where one attribute of one table is required to be equal
to a corresponding attribute of another.
Examples:
University:
- Printing all of each student's grades, by joining the Student_number fields of STUDENT and GRADE_REPORT
- Printing all sections including Course_name, joining COURSE and SECTION on the Course_number field
Company:ind the project, controlling department and department-manager name, etc for every project located in Stafford
- Printing the name and address of all employees who work in the 'Research' dept (Query1 on EN p 100)
- Printing the project number, dept number, and the dept manager's
name, for all projects located in 'Stafford' (Query 2 on EN p 100)
- Printing each employee's name and his or her supervisor's name (Query 8, EN p 101)
A full Cartesian product would be denoted in SQL by, eg,
select * from employee, department;
where there is no WHERE clause establishing a relation between the two tables.
Note on join: it is conceptually somewhat inefficient. Lots of behind-the-scenes optimization makes it fast.
More on keys
A KEY is any set of columns that is guaranteed to uniquely determine a row. Notice that this is really a constraint about
row-insertion.
A table may have several keys. One of them is designated the primary key;
this is the key that the DBMS most likely uses for internal indexing
(and that the database developer thinks is the most important). It is
usually a single attribute, if there exists any such key. Frequently it
is a synthesized key; ie a unique serial number assigned by the DBMS.
Other keys are know as secondary keys or candidate keys.
Any key (primary or not) may be a composite key:
that is, consisting of multiple columns. Typically, tables
representing some kind of "transaction" or relationship between other
tables will have a composite primary key; the same is often true of
tables representing many-one relationships.
In the Company example, the
tables Works_on, Dept_locations, and Dependents all had a composite
primary key. The Works_on table represents "transactions" involving
employees and projects; the Dependents table key is composite because
each employee may have several dependents (and we do not necessarily
know the dependent SSNs), and the dept_locations key is composite
because one department may have several locations.
In the University example, the Grade_Report table had a composite primary key.
Note again that keys are not properties of particular tables, but
rather of the "table schema". They represent design constraints.
Foreign Keys
Key constraints are one kind of constraint. Foreign key constraints are another kind of constraint. What about the use of dno
in table Employees? It should be clear that we probably want all dnos
to refer to real departments, that is, to match an existing dnumber in table Department. This is done through a foreign key constraint: we declare in table Employee that attribute dno is a foreign key: a reference to a key of another table. The declaration looks like
foreign key (dno) references department(dnumber)
Note that the constraint here applies to adding (or updating) records in Employee, and also to deleting records in Department.
Foreign keys are notorious for introducing circularity problems. What
happens if we enforce foreign keys and try to load the database as
originally written? With all tables empty, we can't add any employee
because no dno value we might use would appear in the empty Department
table, and we cannot add a department because the mgr_ssn is a foreign
key referencing Employee.
In principle, there is no reason to require that the foreign key
actually be a key in the other table. In practice, it almost always is;
in database schemas generated through so-called Entity-Relationship
diagrams it always is.
Other constraints
Examples might be that the employee salary is in a given range, or is
less than the supervisor's salary, etc. These non-key constraints can
sometimes be addressed at the DB level, but are often easier to address
at the level of the user interface to the DB; that is, the web
interface can contain the necessary business logic.
Insert, Delete and Update
Examples:
insert into department values ('Sales', 6, '888665555', '2012-01-19');
update department set mgr_ssn = '333445555' where dnumber = 6;
delete from department where dnumber = 6;
See operations in section 3.3:
Inserts: must not violate key constraints or fkey constraints
Deletes: can never violate key constraints, only fkey constraints in another table.
Updates:
of key value; can violate key constraints, or fkey constraints in another table (eg changing dnumber)
of fkey value: can violate fkey constraints
non-key/fkey operations are safe
Demos
(There are more demos below in the Chapter 4 material)
insert into employee values ('ralph', 'j', 'wiggums', '123456798', NULL, NULL, 'M', 9999, '333445555', 7);
delete from employee where ssn = '123456798';
update employee set dno = 107 where ssn = '333445555'; // originally dno = 5 here
update employee set dno = 5 where ssn = '333445555';
Lists in Relational DBs
Suppose I want a table of ⟨instructor, list-of-classes⟩ and want to enter ⟨pld, [317,343,353]⟩
If you look through your oracle manual (let alone your MySQL manual), you won't find it.
Lists are non-atomic and are a problem.
The Relational way:
Create table CLASSES_TAUGHT: ⟨instructor, class⟩
Add entries
⟨pld, 317⟩
⟨pld, 343⟩
⟨pld, 353⟩
⟨sam, 101⟩
⟨sam, 202⟩
⟨sam, 303⟩
etc
What is the key here?
Lists are sometimes said to be multiple-valued entries; that is, in the
first case the value of the Courses column for row "pld" consists of
the multiple values 317, 343, and 353.
This doesn't make them any less problematic for relational databases.
Elimination of lists is basically the process of putting a table into FIRST NORMAL FORM (1NF).
A few complaints about "personal" DBs, like MS Access:
- no transaction-processing across network; instead, the whole DB is copied
- no table locking; allows for inconsistent updates
- client failure does not result in unlocking
- no log to ensure atomicity in a bunch of transactions (debit from account1, credit to account2)
Some of these may have been addressed by now.
Objects
By now, everyone does object-oriented programming. The basic relational model doesn't include that. Do we need it?
Some people think we do, and so there are "object-oriented" databases.
But note that method calls are tricky; SQL isn't about that.
Furthermore, we can usually simulate
the data extension portion of an object by adding a table. Suppose we
have a table PERSONS, and want to create a subobject STUDENTS, with
additional student-specific fields MAJOR and ENROLL_DATE. We can
accomplish this by creating a table STUDENTS, and filling it with
records of the form
⟨person_id, major, enroll_date⟩
where the person_id field is a reference to the remaining data about
the student in the PERSONS table. Mathematically, this STUDENTS table
looks like PERSONS × MAJORS × DATE, but with only a key to the PERSONS
table.
If you have a mysql window, and want to load up a file of SQL statements, use
source filename
If you have a file mydata.text consisting of tab-separated fields, with \N for NULL entries, you can load it as follows:
LOAD DATA LOCAL INFILE '/home/pld/453/mydata.text' INTO TABLE mytable;
You might also need LINES TERMINATED BY '\r\n'
Note that using "null" or "NULL" in place of \N does NOT have the desired effect. Also, '\n' is newline, not the same as '\N'
SQL (E&N chapter 4)
Scenarios where we use the join:
1. The "extension" case: table2 in
some sense extends information that could have been put in table1
except for redundancies. For each row in table1, find the unique
matching row in table2. The join column in this case is likely to be a
key in table2, and declared as a foreign key in table1. Example:
select e.fname, e.lname, d.dname from employee e, department d where e.dno = d.dnumber;
2. The "relationship" case: table2
defines some relationship; it has a dual-column key and the join column
is a key in table1 and one of the key columns in table2. Example:
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn;
Actually, this case is exactly the same as the first, with table1 and
table2 reversed. Except that we tend to think of the employee table as
representing things (people), and the works_on table as representing relationships (who works on what).
We can also have multiple matches in table2, or zero matches. Multiple
matches occur in the second example (technically, multiple rows in
table1 in the first example match the row in table2 with dnumber=5).
create table: office example, with and without changes to allow foreign keys
We can also create a new table from an old one (or ones):
create table emp_names as select fname,lname,ssn from employee;
select * from emp_names;
drop table emp_names;
SQL constraints: Section 4.2
constraint specification
constraints can be given names; there is some debate as to whether this is helpful.
Four constraints:
- column type
- not null
- primary key
- foreign key
Foreign key constraints: what happens if we insert the dname,dnumber
for a department, then add employees, then do one of the following:
- delete the dept record
- update it, changing the dnumber from 4 to 40
From E&N p 95:
create table department(
dname varchar(20),
dnumber int,
mgr_ssn char(9) not null default = '888665555',
mgr_start date,
constraint dept_primary_key primary key (dnumber),
constraint dept_secondary_key unique (dname),
constraint dept_mgr_foreign_key
foreign key (mgr_ssn) references employee(ssn)
on delete set default
on update cascade
);
Named constraints mean that you will be told what constraint is violated. This
is less helpful than it seems. Named constraints also means that constraints can be deleted by name.
But consider the on_delete / on_update clauses. What these mean is that if you
delete an employee e from the employee db who is a dept manager (ie
that employee's ssn is used as department.mgr_ssn for some row), then
the mgr_ssn is set to the default value of 888665555. Furthermore, if
you update an employee entry to correct a department manager's ssn, then that corrected value is cascaded into the appropriate row(s) of the department table.
The SQL select-from-where statement: 4.3
select columns from tables where boolean condition selecting rows
The tricky part is that boolean conditions can involve joins.
Also, note that SQL is a "nonimperative" language: it has
(essentially) no assignment operator. (Ok, you can save tables as
intermediate results, but you should not do that.)
Insert
insert into tablename values (f1val, f2val, ... , fnval);
You can also name columns:
insert into employee(fname, lname, ssn, dno) values ('john', 'smith', 345678912, 4);
Finally, instead of values you can provide a select query that returns
a set of rows from some existing table, with matching "type signature"
(matching column types).
Inserts can violate all four constraints.
insert into employee(fname,lname,ssn,dno)
values ('robert', 'Hatcher', '456789123', 2);
insert into employee(fname,lname,dno)
values ('robert', 'Hatcher', 5);
Update
Basic form:
update employee
set salary = salary*1.10
where dno = 5;
Updates can also violate all four constraints, though the not null
constraint can be violated only by setting a column to null, and the
primary key constraint can only be violated by updating the primary
key. Foreign key constraints are relatively easy to violate, though.
Delete
delete from tablename where boolean condition selecting rows
Here, we can only violate foreign key constraints, eg by deleting from table department the department of some existing worker.
SQL data types
- INT (etc)
- VARCHAR(n)
- Boolean
- DATE (& TIME, etc)
- blobs
SQL examples
Some queries are modified to fit the Office database, from those of Ramakrishnan & Gehrke 2002; the others are from E&N.
Find all employees with salary >= 30000
select * from employee where salary >= 30000;
select e.fname, e.lname, e.salary from employee e where e.salary >=30000;
Note my use of the e name. I recommend this style for readabililty. You can think of e as a variable that ranges over all the rows, though it looks syntactically more like a table variable.
Query 2 of E&N
For every project located in Stafford, list the project number, the
controlling department number, and the department manager's lname,
address, bdate.
Solution as written:
select pnumber, dnum, lname, address, bdate
from project, department, employee
where dnum = dnumber and mgr_ssn = ssn and plocation = 'Stafford';
Note that this is a dual-join example.
Compare with the version with names:
select p.pnumber, dnum, e.lname, e.address, e.bdate
from project p, department d, employee e
where p.dnum = d.dnumber and d.mgr_ssn = e.ssn and p.plocation = 'Stafford';
Find the list of supervisor ssns (that is, omit duplicates)
select distinct e.super_ssn from employee e;
Looking at the output, it might be good to add "where e.super_ssn is not null"
Find all employees who have worked on project 2
(Note that in our works_on table, no employee works on the same project
more than once. Compare the solution to obtaining a list of employees
who have worked on project 2, where the table is works_on_by_week:
⟨essn, pno, week, hours⟩
select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 2;
Note this is a join example (not our first). We could also have written it:
select e.fname, e.lname from employee e join works_on w on e.ssn = w.essn where w.pno = 2;
Find all employees who have worked >20 hours on a project
select e.fname, e.lname, w.pno, w.hours from employee e, works_on w
where e.ssn = w.essn and w.hours >20;
E&N Query 4: List all project numbers of projects involving employee 'Smith', either as manager or worker
Note that projects are managed by the manager of the project's controlling department.
(select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and e.lname = 'Smith')
union
(select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and e.lname = 'Smith');
Smith, however, hasn't managed any projects. It works better for Wong:
set @NAME = 'Wong';
(select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and e.lname = @NAME)
union
(select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and e.lname = @NAME);
E&N Query 12, Find all employees with address in Houston, TX
To do this, we have to use substrings and the LIKE operator:
select e.fname, e.lname from employee e
where e.address LIKE '%Houston TX%';
Note the uncertainty; if someone put two spaces in, we're in trouble. E&N did use commas; I did not.
Query 12A: find all employees born in the 1950's.
E&N way: too clever by half:
select e.fname, e.lname from employee e where e.bdate LIKE '__5_-__-__';
For one thing, Oracle has a different standard date format than MySQL. (I added dashes to my pattern, hopefully for clarity.)
Better:
select e.fname, e.lname from employee e
where e.bdate between '1950-01-01' and '1959-12-31';
E&N Query 14: Find employees with salary between $30,000 and $40,000
select e.fname, e.lname from employee e
where e.salary between 30000 and 40000;
E&N Query 15: Introduces order by (also a triple-join)
Give a list of employees and the projects they are working on, ordered
by department, and, within departments, ordered alphabetically by
lname, fname.
select d.dname, e.lname, e.fname, p.pname
from department d, employee e, works_on w, project p
where d.dnumber = e.dno and e.ssn = w.essn and w.pno = p.pnumber
order by d.dname, e.lname, e.fname;
Names of employees who have worked on project 2 or project 3:
select e.fname, e.lname , w.pno from employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or w.pno =3);
Alternatively:
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 2)
union
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 3);
Names of employees who have worked on project 2 and project 3. This one is harder! If we change the or above to and, we get nobody.
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 2;
+----------+---------+-----+
| fname | lname | pno |
+----------+---------+-----+
| John | Smith | 2 |
| Franklin | Wong | 2 |
| Joyce | English | 2 |
+----------+---------+-----+
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 3;
+----------+---------+-----+
| fname | lname | pno |
+----------+---------+-----+
| Franklin | Wong | 3 |
| Ramesh | Narayan | 3 |
+----------+---------+-----+
From looking at these, it is clear that Franklin Wong has worked on both.
Attempt 1:
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn and w.pno = 3);
Alas, MySQL does not support the INTERSECT operator. (Also, we should get rid of the w.pno column in the output!
Attempt 2:
select e.fname, e.lname from employee e, works_on w1, works_on w2
where e.ssn=w1.essn and w1.pno = 2 and e.ssn=w2.essn and w2.pno = 3;
This works.
Names of employees who have worked on more than one project
select e.fname, e.lname from employee e, works_on w1, works_on w2
where e.ssn = w1.essn and e.ssn = w2.essn and w1.pno <> w2.pno;
This could use a DISTINCT.
It could also be done by creating a COUNT of the number of projects
worked on by each employee, and then printing those employees for which
COUNT >= 2.