alter table project ADD constraint
FK_project_dept_locations foreign key (plocation) references
dept_locations(dlocation);
alter table project drop foreign key
FK_project_dept_locations;
-- to undo
Consider
our
earlier join example:
select e.fname, e.lname, d.dname from employee e, department d where e.dno = d.dnumber;
As mentioned last week, one can think of e and d as "cursors", or row variables, representing rows in the respective tables employee and department. Linear search through the employee table , as in the SQL example
select e.lname, e.salary from employee e where e.salary >= 30000;
might be represented by the following java-like loop, showing a linear search through the table:
for (e
in employee) {
if (e.salary >= 30000) {
print(e.lname, e.salary)
}
}
Now let's write this for the above join; here we get a nested loop and quadratic runtime (specifically, O(nm), where n=employee.size() and m=department.size())
for (e
in employee) {
for (d in department) {
if (e.dno == d.dnumber) {
print(e.fname, e.lname, d.dname)
}
}
Clearly, for triple joins over large databases, performance is an issue. This is addressed two ways: query optimization, to eliminate "obvious" inefficiencies, and internal indexes. The primary key defines a candidate for indexing, though not all indexing is on key fields.
Without a join, all you can look at is one row at a time of one table.
Without a join, all you can look at is one row at a time of one table. When we join EMPLOYEE and DEPARTMENT, eg
select e.lname, d.dname from employee e, department d where e.dno = d.dnumber;
we are looking at one row at a time of each table, but two tables. The DEPARTMENT table is being used here to "extend" the EMPLOYEE table: each employee record now has additional attributes from DEPARTMENT.
Suppose we want to print a list of all employees and their supervisors. Here's the approach we might take:
select e.fname, e.lname, s.fname, s.lname
from employee e, employee s
where e.super_ssn = s.ssn;
The idea here, in terms of cursors, is for e to traverse linearly the employee table. For each employee, we get e.super_ssn, and then use s to traverse the same table, looking for a row with s.ssn = e.super_ssn. Each record in the result is a combination of two rows of table EMPLOYEE, so a join is necessary.
In terms of table extension, the EMPLOYEE table is being "extended" to
include additional attributes about supervisors besides just super_ssn.
This particular example is much more readable if we relabel the supervisor
columns in the output
select e.fname, e.lname, s.fname AS
super_fname, s.lname AS super_lname
from employee e, employee s
where e.super_ssn = s.ssn;
Note that the AS here is not optional, and plays a rather different role than the AS that can be used in the FROM section (from employee AS e, employee AS s).
In many cases, the where
conditions are either join conditions or are simple Boolean expressions
about a single row. Sometimes, however, the question being asked relates
to other rows, as in "List employees who worked on more than one project";
if we form the join table of employees and their projects we still cannot
answer this row-by-row. Another query might be to list employees who are
part of some set of employees whose salaries add up to exactly 100,000.
The answer last week to the employees-who-worked-on-multiple-projects
query was a "triple" join with inequality as one of the join conditions:
select e.fname, e.lname from employee e,
works_on w1, works_on w2
where e.ssn = w1.essn and e.ssn = w2.essn and w1.pno <> w2.pno;
Most joins (but not this one!) are equijoins; that is,
joins involving an equality relationship. Inequalities can be harder to
understand. Recall that the above version introduces many duplicates, one
for each way of choosing two unequal projects. "Select distinct" is
better.
Actually, this particular example is easier done using a subquery
and the count operator (which we'll cover below):
select e.fname, e.lname from employee e
where (select count(w.pno) from works_on w where w.essn = e.ssn) > 1
In general, WHERE conditions that just select rows of a table satisfying a given Boolean condition are fundamentally trivial. Joins in WHERE conditions are a little less trivial, and WHERE conditions that select a record (or part of a record) depending on other records in the table tend to be the hardest. Example: list all students who got straight A's; this lists the student_id values from grade_report where all of the matching rows have an A in the grade column. This last category often involves grouping, or subqueries, or some other advanced feature.
Basically, it's hard because it is a non-imperative language:
you don't issue a sequence of commands. Instead, you write a single
big expression. It's hard to visualize big expressions, and hard to
test them incrementally. Nested queries do seem to help in this regard.
(Actually, the data types of SQL are tables, and you can in
fact save a table in a "table variable". But this is rare.)
Perhaps because of this non-imperative nature of the language, it is
relatively easy to read several SQL examples and still have trouble
knowing where to start when trying to write one from scratch.
In all the examples presented above, one of the join attributes is the primary key of the table it appears in. Here are most of those examples again, where for each join's equality condition the attribute that is the primary key of its table is in bold.
select e.lname, d.dname from employee e,
department d
where e.dno = d.dnumber;
select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and
e.lname = 'Smith'
select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and
e.lname = 'Smith'
select d.dname, e.lname, e.fname, p.pname
from department d, employee e, works_on w, project p
where d.dnumber = e.dno and e.ssn =
w.essn and w.pno = p.pnumber
select e.fname, e.lname , w.pno from employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or w.pno
=3);
select e.fname, e.lname
from project p, works_on w, employee e
where p.plocation = 'stafford' and p.pnumber =
w.pno and w.essn = e.ssn;
When this is done with two tables, the other table (with join attribute that is not a key) can be thought of as being extended. For each record in the other table, we use the join attribute to look up a single record from the first table (using the primary key).
Note that in most cases only one join-condition
attribute is a primary key; the other though is a foreign key.
Are there any cases where neither of the join's attributes is a key of its
relation? Mathematically this can certainly happen; in fact, a join
condition does not even have to be based on an equality comparison. But
coming up with a meaningful example is a different
thing. Here's the best I could do. Suppose we want a list of all employees
who live in the same town as some department's office. We might want
such a list, for example, for emergency preparedness. We can match the
employee city field with the dept_locations dlocation field; neither of
these is a key. Actually, in the Company database we have to use the
employee address field, and do substring matching. This is what I got:
select distinct e.ssn, e.lname, e.address from
employee e, dept_locations dl
where e.address LIKE concat('%', dl.dlocation, '%');
I used distinct because an employee in Houston would otherwise be listed for both departments 1 and 5 (both of which have offices in Houston).
In the 1970's, the Ingres database project began at Berkeley. Ingres
originally stood for Interactive graphics
retrieval system, though shortly
thereafter the "graphics" part became much less important.
After Ingres (which does still exist) came Postgres, for "post-ingres".
Later, to emphasize its support for SQL, Postgres became PostgreSQL.
Note that it is virtually impossible to pronounce PostgreSQL using the
"sequel" form for SQL; if you want a short form, use "Postgres".
A decade ago, MySQL was the "fast one" and Postgres was the one that
accurately implemented the SQL standard. That's less true now, though
MySQL is still missing intersection (and a few other
things).
Postgres can be found at http://postgresql.org.
After you install the software, you still have to create appropriate
accounts; the postgres account is the only one
installed by default. Here's what I had to do:
1. Log in as user "postgres", and run the psql command.
The "bash>" prompt below is the shell prompt (as user postgres); the
"post=> " prompt is the psql SQL prompt.
bash> sudo -u postgres psql
post=> create user 'pld';
2. From the shell, still as user postgres
bash> createdb pld
Then, as user pld, the command "psql" works. If you created several
databases above, eg "pldcompany", then the command to connect to postgres
using the pldcompany database would be psql pldcompany.
I did not actually figure out how to give myself the right to create new
databases.
Your psql prompt should show you the current database.
SHOW ALL -- shows status variables only
\d
same as mysql show
tables;
\d employee same as mysql describe
employee;
Here's another way to list your tables:
select table_name from information_schema.tables where table_schema='public' and table_type='BASE TABLE';
This now actually works, under postgres, but not in the following form:
old:
(select e.fname, e.lname, w.pno from
employee e, works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname, w.pno from employee
e, works_on w where e.ssn = w.essn and w.pno = 3);
What is the problem?
new:
(select e.fname, e.lname from employee e,
works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn
and w.pno = 3);
Above, we did this as follows:
select e.fname, e.lname from employee e,
works_on w1, works_on w2
where e.ssn=w1.essn and w1.pno = 2 and e.ssn=w2.essn and w2.pno = 3;
The intersect operator itself would have been
trivial to implement in MySQL, but that is only part of the story. It is
trivial to implement the intersect operator in
order to get the correct resuts. It is nontrivial to
implement it in a way that allows the system as a whole to do
reasonable-quality query optimization.
Generally, intersections are easy to implement as joins, and the join
approach may be more efficient. As a general rule, an intersection of the
form
(select a.col1, a.col2 from TableA a)
intersect
(select b.col1, b.col2 from TableB b)
is equivalent to the following (which may be a self-join if TableA and TableB are the same table):
select a.col1, a.col2 from TableA a join TableB b on a.col1 = b.col1 and a.col2 = b.col2
In some cases only one of the two join conditions may be necessary; if
col1 is a primary key, for example, then you can drop the col2 comparison.
The Company query above, where we find employees who have worked on
project 2 and on project 3, is of this general form, though we only needed
the employee table once in the join. TableA corresponds to employee join
works_on w1, and TableB corresponds to employee join works_on w2.