Comp 305-001/488-305: Database Administration
Spring 2016: Tuesdays, 4:15-6:45, Corboy L08
Week 3
Read in Elmasri & Navathe (EN). Chapters are labeled (7th edition)/(6th
edition)
- Chapter 1/1, Databases and Database Users (skim)
- Chapter 2/2, Database System
Concepts an Architecture, Section 1 on Data Models, Schemas and
Instances
- Chapter 5/3, The Relational Data Model and SQL Sections 1
& 2
- Chapter 6/4, Basic SQL
Homework 1
EN7 Chapter 6, p 203 / EN6 chapter 4, page 112: exercises
10abc and 12abcd
These exercises appear in full on Sakai, in the Resources folder.
Due: Thursday, Feb 11, on Sakai
Rules:
Your queries should work even if additional data is added to the tables.
Every table referred to in the FROM clause should be named with a "table
alias", that is:
select e.lname, d.dname
from employee e, department d ...
All answers should be in the form of a single query; do not retrieve a value with one query and
then manually plug that value into a second query. Similarly, data appearing
in the query should be from the exercise itself, and not from a "visual"
lookup.
All SQL should be entered in a format that I can copy and paste directly
into a command window. In particular, make sure:
- All "prompt" characters such as "->" have been stripped out.
- You use regular quotation marks, not special unicode quotation marks.
Unicode-quoted ‛foo’ is not the same as 'foo'. (A sure way to convert
inadvertently to unicode is to paste your work into a Word document; try
installing Notepad++.)
- There are no leading tabs or spaces
From
last week:
1. Figuring out joins: often an ER diagram helps, in which each table's
attributes are listed. One can then draw a line between the joined
attributes.
2. table aliases vs column aliases: very different things
3. Review other queries
4. count(*) and other aggregation functions
This works (well, ran without error) under MySQL:
select lname, max(salary) from employee;
It does not work under postgres, and should not!!
The rule: any non-aggregate attributes (eg lname above) must appear
in a GROUP BY clause. We'll get to that.
4. Review other queries
A few more basic concepts
Database tables are usually "self-describing", in that the table description
and/or relationships to other tables is often embedded in the table
description. At a minimum, columns have names and types.
The table definitions, column names and their types are generally known,
collectively, as the database schema.
Database tables can be changed without the need to recompile programs using
that table: new columns can be added, or entire new tables. Essentially this
is because DBMS queries are interpreted,
and table columns are identified by name
rather than offset. (C programs that access record fields do
need recompilation if fields are changed.)
Databases often support different views,
perhaps for users with different privilege levels. A view may be a subset of
the original set of columns, or it may contain some computed columns in lieu
of the original columns. Excluding columns such as social_security_num
or salary is relatively common.
Transactions are sets of related updates, eg removing money
from one account and adding it to another, or perhaps dropping one class and
adding another (sadly, remarkably many student DBMSs lack the latter
transaction operation). EN defines a transaction to be the result of an
executing program rather than a set of related updates; this latter
definition might include adding multiple new rows to one table.
Transaction processing must satisfy the isolation
property, that transactions appear to execute in isolation from one
another, and the atomicity
property, which says that each transaction is completely executed
or not executed at all. (This is sometimes described as that transactions
must meet the ACID test: Atomicity, Consistency, Isolation, Durability.
However, consistency and durability are relevant to all database updates;
they are not particular to transactions.)
Here is a simple failure of isolation: two transactions are transferring
money to other accounts.
- Transaction 1 is transferring $100 from A to B
- Transaction 2 is transferring $50 from A to C
The total amount of money should always remain $1000.
transaction 1
|
transaction 2
|
Acct A
|
Acct B
|
Acct C
|
|
|
1000
|
0
|
0
|
get value of A:
$1000
|
|
1000
|
0
|
0
|
add $100 to B
|
|
1000
|
100
|
0
|
|
Debit A by
$50
|
950
|
100
|
0
|
Store $1000 - $100 in A
|
|
900
|
100
|
0
|
|
Credit C by $50
|
900
|
100
|
50
|
Multi-user DBMSs need concurrency control.
Concurrency control leads to locks, the other great
performance bottleneck of RDBMSs.
Section 1.6 of EN talks about some of the advantages of having a central
DBMS rather than individual DBMSs maintained by each administrative group.
This is pretty much a settled issue now, though it does mean that the
"natural owners" of data in an organization (eg registration and records at
Loyola, for student registration data) will not in fact own that data.
Relational DBs (EN7 chapter 5 / EN6 chapter 3)
A relation is any set of tuples
The set of all possible tuples is the CROSS PRODUCT of some domains
col1 × col2 × col3 × ... × colN
Example: A = {1,2,3}, B = {x,y} C = {1,2}
A × B
A ×
C
< relation in A × C
<= relation in A × C
DB relations are not defined by rule, but by tabulation!
Given attribute sets A1,
A2, ..., An, a relation
is a subset of the cartesian product A1×A2×...×An;
that is, a set of tuples ⟨a1,a2,...,an⟩
where each ai∈Ai. These tuples may also be called records.
Relations in a DB are represented as tables.
EN also uses the term relation state
to refer to a specific set of records in a table.
STUDENT table, EN p 63
Name
|
SSn
|
Home_phone
|
Address
|
Office_phone
|
Age
|
GPA
|
Benjamine Bayer
|
305-61-2435
|
817-373-1616
|
2918 bluebonnet Lane
|
NULL
|
19
|
3.21
|
Chung-cha Kim
|
381-62-1245
|
817-375-4409
|
125 Kirby Road
|
NULL
|
18
|
2.89
|
Dick Davidson
|
422-11-2320
|
NULL
|
3452 Elgin Road
|
817-749-1253
|
25
|
3.53
|
Rohan Panchal
|
489-22-1100
|
817-376-9821
|
265 Lark Lane
|
817-749-6492
|
28
|
3.93
|
Barbara Benson
|
533-69-1238
|
817-839-8461
|
7384 Fontana Lane
|
NULL
|
19
|
3.25
|
Note the
Also note that some entries are NULL. This means undefined
or not available or not
known; unfortunately, these three options are not
synonymous or interchangeable. NULL values are essential, but they do
introduce some complications. The first is that records with NULL entries
are not in fact elements of A1×A2×...×An;
they are elements of
(A1 ∪ {NULL}) × (A2 ∪ {NULL}) × ...
× (An ∪ {NULL})
EN also gives an alternative definition of a relation, as a set of maps
from the attribute set to the set of attribute values, where the attribute
set is essentially the set of names
of columns. With this approach, a null entry is represented by a partial
map, undefined for some attributes.
Note that we must be careful when comparing null values: if two people have
NULL as their Office_phone, it does not
mean they have the same phone! Worse, we simply do not know if the NULL
means we don't know their phone, or if they simply do not have one, or if
they have no office at all and so the "office_phone" is irrelevant.
Joins
As we saw briefly last week, the join
is the operation of creating all records merged from two (or more) tables,
where one attribute of one table is required to match a corresponding
attribute of another. Usually, but not always, the column-matching is based
on equality of corresponding attributes.
Examples:
University:
- Listing all students in Section 112 (my data)
- Printing all of each student's grades, by joining the Student_number
fields of STUDENT and GRADE_REPORT
- Printing all sections including Course_name, joining COURSE and
SECTION on the Course_number field
The first example we did last week as follows:
select s.name from student s, grade_report
gr
where s.student_number =
gr.student_number and gr.section_identifier = 112;
Company:
- Printing the name and address of all employees who work in the
'Research' dept (Query1 on EN p 100; uses employee and department
tables)
- Printing the project number, dept number, and the dept manager's name,
for all projects located in 'Stafford' (Query 2 on EN p 100; uses
project and department tables)
- Printing each employee's name and his or her supervisor's name (Query
8, EN p 101; uses employee table joined to itself)
Demos of these
A full Cartesian product would be denoted in SQL by, eg,
select * from employee, department;
where there is no WHERE clause establishing a relation between the two
tables.
The join is conceptually somewhat inefficient. Lots of
behind-the-scenes optimization makes it fast.
More on keys
A KEY is any set of columns that is guaranteed to uniquely determine a row.
Primary Key: the key the database developer thinks is most important;
usually a single attribute if there is one
Composite Key: multiple columns (eg the GRADE_REPORT table). Note that there
is no single-column key here. Can also be the primary key.
Secondary Keys: other column combinations that are keys, but not the one
"intended".
Note that keys are not properties of particular tables, but rather of the
"table schema". They represent design constraints.
SQL examples
Some queries are from Ramakrishnan & Gehrke 2002 but are modified to be
appropriate for E&N's Office database; the others are from E&N
directly.
The E&N example numbering is peculiar because the book was radically
restructured with the 6th Edition, and examples kept their numbers from
previous editions.
Find all employees with salary >= 30000
select * from employee where salary >=
30000;
select e.fname, e.lname, e.salary from
employee e where e.salary >=30000;
Note my use of the e table alias. I
recommend this style for readabililty. You can think of e
as a variable that ranges over all the rows, though it looks syntactically
more like it represents a table.
Query 2 of E&N
For every project located in Stafford, list the project number, the
controlling department number, and the department manager's lname, address,
bdate.
Solution as written:
select pnumber, dnum, lname, address, bdate
from project, department, employee
where dnum = dnumber and mgr_ssn = ssn and plocation = 'Stafford';
Note that this is a dual-join
example.
Compare with the version with names:
select p.pnumber, dnum, e.lname, e.address,
e.bdate
from project p, department d, employee e
where p.dnum = d.dnumber and d.mgr_ssn = e.ssn and p.plocation =
'Stafford';
Find the list of supervisor ssns
(that is, omit duplicates)
select distinct
e.super_ssn from employee e;
Looking at the output, it might be good to add "where e.super_ssn is not
null"
Find all employees who have worked on
project 2
(Note that in our works_on table, no employee works on the same project more
than once. Compare the solution to obtaining a list of employees who have
worked on project 2, where the table is works_on_by_week: ⟨essn, pno, week, hours⟩
select e.fname, e.lname from employee e,
works_on w where e.ssn = w.essn and w.pno = 2;
Note this is a join example (not
our first). We could also have written it:
select e.fname, e.lname from employee e join works_on w on
e.ssn = w.essn where w.pno = 2;
Find all employees who have worked >20
hours on a project
select e.fname, e.lname, w.pno, w.hours from
employee e, works_on w
where e.ssn = w.essn and w.hours >20;
A tricky foreign-key constraint
Let's try to create a foreign key constraint to require that any
project.plocation must appear as a dept_locations.dlocation:
alter table project ADD constraint
FK_project_dept_locations foreign key (plocation) references
dept_locations(dlocation);
alter table project drop foreign key
FK_project_dept_locations;
-- to undo
But this fails, because dlocation is not a key of
dept_locations (actually, this is a slight oversimplification; see below).
Foreign-key constraints are already a performance hit; non-key lookups are
in principle quite a bit less efficient. I get:
MySQL: ERROR 1215 (HY000): Cannot add
foreign key constraint [not very helpful!]
Postgres: ERROR: there is no unique constraint matching given keys
for referenced table "dept_locations"
What the Postgres error message means is that dept_locations.dlocation --
the "referenced table" -- is not a key. But the real issue is that
table dept_locations does not have an index on the
dlocation column. The important thing is for the RDBMS to be able to verify
a given FK constraint quickly, by using an index on the parent table. If the
FK reference is to the parent table's primary key, then the index is
automatic as MySQL creates an index on the primary key for every table.
More at http://dev.mysql.com/doc/refman/5.0/en/create-table-foreign-keys.html
To create an index (simplified version), use
create index indexname on tablename(indexcolumn);
create index dloc_index on dept_locations(dlocation);
drop index dloc_index on dept_locations;
At this point, the above "alter table ..." command works under MySQL, though
it still does not work under Postgres. Here is the Postgres
rule:
A foreign key must reference columns that
either are a primary key or form a unique constraint. This means that the
referenced columns always have an index (the one underlying the primary
key or unique constraint); so checks on whether a referencing row has a
match will be efficient.
That is, the actual requirement is that the referenced attribute (or set of
attributes) be a key, though the reason given for this is so that
there is an index (which we could in principle add separately).
There is also one other workaround:
create table legal_locations (
location varchar(15)
primary key
);
insert into legal_locations (select distinct dlocation from
dept_locations);
-- no "values"
alter table project ADD foreign key (plocation) references
legal_locations(location);
alter table dept_locations ADD foreign key (dlocation) references
legal_locations(location);
To remove these:
show create table project;
alter table project DROP foreign key project_ibfk_2;
show create table dept_locations;
alter table dept_locations DROP foreign key dept_locations_ibfk_1;
E&N Query 4: List all project numbers
of projects involving employee 'Smith', either as manager or worker
Note that projects are managed by the manager of the project's controlling
department. One approach is to use the union keyword:
(select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and e.lname = 'Smith')
union
(select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and e.lname = 'Smith');
Smith, however, hasn't managed any projects. It works better for Wong:
set @NAME = 'Wong';
-- MySQL notation
(select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and e.lname = @NAME)
union
(select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and e.lname = @NAME);
Here is a version using a conventional join. Sort of:
select distinct p.pnumber
from project p, department d, employee e, works_on w
WHERE e.lname = @NAME AND (
(d.dnumber = p.dnum and d.mgr_ssn = e.ssn)
OR
(p.pnumber = w.pno and w.essn = e.ssn))
There is in fact something very unconventional about this join: we
can't decide what tables are being joined! The first of the OR clauses in
the where section joins tables project, department
and employee; the second OR clause joins tables project, works_on
and employee. In fact, we cannot do this as a three-table join using the
alternative table1 join table2 on...
syntax.
E&N
Query 12, Find all employees with address in Houston, TX
To do this, we have to use substrings and the LIKE operator:
select e.fname, e.lname from employee e
where e.address LIKE '%Houston TX%';
Note the uncertainty; if someone put two spaces in, we're in trouble.
E&N did use commas in
addresses; I did not.
Query 12A: find all employees born in the
1950's.
E&N way: too clever by half:
select e.fname, e.lname from employee e
where e.bdate LIKE '195_-__-__';
For one thing, Oracle has a different standard date format than MySQL. (I
added dashes to my pattern, hopefully for clarity.)
Better:
select e.fname, e.lname from employee e
where e.bdate between '1950-01-01' and '1959-12-31';
Note that dates are entered as strings, but they are indeed parsed by MySQL
and are stored numerically.
E&N Query 14: Find employees with salary between $30,000 and $40,000
select e.fname, e.lname from employee e
where e.salary between 30000 and 40000;
E&N Query 15: Introduces order by
(also a triple-join)
Give a list of employees and the projects
they are working on, ordered by department, and, within departments,
ordered alphabetically by lname, fname.
select d.dname, e.lname, e.fname, p.pname
from department d, employee e, works_on w, project p
where d.dnumber = e.dno and e.ssn = w.essn and w.pno = p.pnumber
order by d.dname, e.lname, e.fname;
Names of employees who have worked on
project 2 or project 3:
select e.fname, e.lname , w.pno from
employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or
w.pno =3);
Alternatively:
(select e.fname, e.lname, w.pno from
employee e, works_on w where e.ssn = w.essn and w.pno = 2)
union
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn =
w.essn and w.pno = 3);
Names of employees who have worked on project 2 and
project 3. This one is harder! If we change the or
above to and, we get nobody.
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn =
w.essn and w.pno = 2;
+----------+---------+-----+
| fname |
lname | pno |
+----------+---------+-----+
| John |
Smith | 2 |
| Franklin | Wong
| 2 |
| Joyce | English
| 2 |
+----------+---------+-----+
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn =
w.essn and w.pno = 3;
+----------+---------+-----+
| fname |
lname | pno |
+----------+---------+-----+
| Franklin | Wong
| 3 |
| Ramesh | Narayan
| 3 |
+----------+---------+-----+
From looking at these, it is clear that Franklin Wong has worked on both.
Attempt 1:
(select e.fname, e.lname, w.pno from
employee e, works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn =
w.essn and w.pno = 3);
Alas, MySQL 5.6 does not support the INTERSECT operator. (Also, we should
get rid of the w.pno column in the output! Here is the revised query, which
does run under postgres (below)):
(select e.fname, e.lname from employee e,
works_on w where e.ssn = w.essn and w.pno = 2)
intersect
(select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn
and w.pno = 3);
Attempt 2:
select e.fname, e.lname from employee e,
works_on w1, works_on w2
where e.ssn=w1.essn and w1.pno = 2 and e.ssn=w2.essn and w2.pno = 3;
This works.
As a related example, suppose we want names of employees who have worked on
more than one project:
select e.fname, e.lname from employee e,
works_on w1, works_on w2
where e.ssn = w1.essn and e.ssn = w2.essn and w1.pno <> w2.pno;
This could use a DISTINCT.
It could also be done by creating a COUNT of the number of projects worked
on by each employee, and then printing those employees for which COUNT >=
2.
What joins do
Consider
our
earlier join example:
select e.fname, e.lname, d.dname
from employee e, department d where e.dno = d.dnumber;
As
mentioned
last week, one can think of e and d as "cursors", or row variables,
representing rows in the respective tables employee and
department. Linear search through the employee table , as in
the SQL example
select e.lname, e.salary from
employee e where e.salary >= 30000;
might
be
represented by the following java-like loop, showing a linear search through
the
table:
for (e
in employee) {
if (e.salary >= 30000) {
print(e.lname, e.salary)
}
}
Now let's write this for the
above join; here we get a nested loop and quadratic runtime
(specifically, O(nm), where n=employee.size() and
m=department.size())
for (e
in employee) {
for (d in department) {
if (e.dno == d.dnumber) {
print(e.fname, e.lname, d.dname)
}
}
Clearly, for triple joins over large databases, performance is an issue.
This is addressed two ways: query optimization, to
eliminate "obvious" inefficiencies, and internal indexes.
The primary key defines a candidate for indexing, though not all indexing
is on key fields.
Without a join, all you can look at is one row at a time of one
table.
Self-join
Without a join, all you can look at is one row at a time of one
table. When we join EMPLOYEE and DEPARTMENT, eg
select e.lname, d.dname from employee e,
department d where e.dno = d.dnumber;
we are looking at one row at a time of each table, but two tables. The
DEPARTMENT table is being used here to "extend" the EMPLOYEE table: each
employee record now has additional attributes from DEPARTMENT.
Suppose we want to print a list of all employees and their supervisors.
Here's the approach we might take:
select e.fname, e.lname, s.fname, s.lname
from employee e, employee s
where e.super_ssn = s.ssn;
The idea here, in terms of cursors, is for e to traverse linearly the
employee table. For each employee, we get e.super_ssn, and then use s to
traverse the same table, looking for a row with s.ssn = e.super_ssn.
Each record in the result is a combination of two rows
of table EMPLOYEE, so a join is necessary.
In terms of table extension, the EMPLOYEE table is being "extended" to
include additional attributes about supervisors besides just super_ssn.
This particular example is much more readable if we relabel the supervisor
columns in the output
select e.fname, e.lname, s.fname AS
super_fname, s.lname AS super_lname
from employee e, employee s
where e.super_ssn = s.ssn;
Note that the AS here is not optional, and plays a rather different role
than the AS that can be used in the FROM section (from employee AS e,
employee AS s).
WHERE conditions
In many cases, the where
conditions are either join conditions or are simple Boolean expressions
about a single row. Sometimes, however, the question being asked relates
to other rows, as in "List employees who worked on more than one project";
if we form the join table of employees and their projects we still cannot
answer this row-by-row. Another query might be to list employees who are
part of some set of employees whose salaries add up to exactly 100,000.
The answer last week to the employees-who-worked-on-multiple-projects
query was a "triple" join with inequality as one of the join conditions:
select e.fname, e.lname from employee e,
works_on w1, works_on w2
where e.ssn = w1.essn and e.ssn = w2.essn and w1.pno <> w2.pno;
Most joins (but not this one!) are equijoins; that is,
joins involving an equality relationship. Inequalities can be harder to
understand. Recall that the above version introduces many duplicates, one
for each way of choosing two unequal projects. "Select distinct" is
better.
Actually, this particular example is easier done using a subquery
and the count operator (which we'll cover below):
select e.fname, e.lname from employee e
where (select count(w.pno) from works_on w where w.essn = e.ssn) > 1
In general, WHERE conditions that just select rows of a table satisfying
a given Boolean condition are fundamentally trivial. Joins in WHERE
conditions are a little less trivial, and WHERE conditions that select a
record (or part of a record) depending on other records in the table tend
to be the hardest. Example: list all students who got straight A's; this
lists the student_id values from grade_report where all of the matching
rows have an A in the grade column. This last category often involves
grouping, or subqueries, or some other advanced feature.
Why is SQL hard?
Basically, it's hard because it is a non-imperative language:
you don't issue a sequence of commands. Instead, you write a single
big expression. It's hard to visualize big expressions, and hard to
test them incrementally. Nested queries do seem to help in this regard.
(Actually, the data types of SQL are tables, and you can in
fact save a table in a "table variable". But this is rare.)
Perhaps because of this non-imperative nature of the language, it is
relatively easy to read several SQL examples and still have trouble
knowing where to start when trying to write one from scratch.
Examples from above revisited
Find all employees who have worked on project 2
This one involves a single join between employees (to get the names) and the
works_on table. From the works_on table we can easily get the SSNs of the
employees who worked on project 2; we need to join with table employee to
convert SSNs to names.
If we change the query to
Find all employees who have worked on a
project in Stafford
we can do the following. Using the project table, we can get the project
number from the location. Using the works_on table, we can find the
employees (by essn) who have worked on that project. And using the employee
table, we can convert to employee names:
select e.fname, e.lname
from project p, works_on w, employee e
where p.plocation = 'stafford' and p.pnumber = w.pno and w.essn = e.ssn;
Note that only data from table employee is actually part of the select statement.
Query 2: For every project located in Stafford, list the
project number, the controlling department number, and the department
manager's lname, address, bdate.
From the project table we can get the project location, number and
department:
select p.pnumber, p.dnum
from project p where p.plocation='stafford';
To get the department-manager information (all from one table) we
needed two joins: first a join between the project and department
tables (on dnum/dnumber) to get the department mgr_ssn, and then between
department and employee (on mgr_ssn and ssn).
select p.pnumber, p.dnum, e.lname,
e.address, e.bdate
from project p, department d, employee e
where p.dnum = d.dnumber and d.mgr_ssn
= e.ssn and p.plocation = 'Stafford';
Note that no department fields are included in the query results. The
works_on table is not involved.
Query 4: Here we combine both the above: we want all
the projects in which Wong has participated, either as employee or as
manager. For the employee case, we have a join between works_on and
employee. For the manager, we have a join between projects and departments,
to get the dept manager of the controlling department, and then with the
employee table.
We did the same thing with the self-join example above: given an employee e,
we need to look up e.mgr_ssn in the employee table.
Joins again
Whenever you list multiple tables in the from line,
you have a join. Without a join condition between two tables, you will have
N×M records in the result; this is seldom what you want. With more than two
tables, you don't need a join between every pair; you just need a "chain" of
joins uniting everything. In general, for N tables you need N−1 join
conditions.
Join conditions in the where clause serve to join
different tables; they can be identified by the format table1.attribute1 =
table2.attribute2 (sometimes other relational operations than = are
involved, but those ar rare). Every other condition in the where clause has
to do with selecting a subset of the total number of rows, thus discarding
some rows from consideration. It is possible to see join conditions this
way, too, discarding rows from the cross product of the two
tables. But this is not terribly helpful in practice.
All this is one reason some prefer the newer syntax "from table1 join table2 on join-condition".
In all the examples presented above, one of the join attributes
is the primary key of the table it appears in. Here are most of those
examples again, where for each join's equality condition the attribute
that is the primary key of its table is in bold.
select e.lname, d.dname from employee e,
department d
where e.dno = d.dnumber;
select distinct p.pnumber
from project p, department d, employee e
where d.dnumber = p.dnum and d.mgr_ssn = e.ssn and
e.lname = 'Smith'
select distinct p.pnumber
from project p, works_on w, employee e
where p.pnumber = w.pno and w.essn = e.ssn and
e.lname = 'Smith'
select d.dname, e.lname, e.fname, p.pname
from department d, employee e, works_on w, project p
where d.dnumber = e.dno and e.ssn =
w.essn and w.pno = p.pnumber
select e.fname, e.lname , w.pno from employee e, works_on w
where e.ssn = w.essn and (w.pno = 2 or w.pno
=3);
select e.fname, e.lname
from project p, works_on w, employee e
where p.plocation = 'stafford' and p.pnumber =
w.pno and w.essn = e.ssn;
When this is done with two tables, the other table (with join
attribute that is not a key) can be thought of as being extended.
For each record in the other table, we use the join attribute to look up a
single record from the first table (using the primary key).
Note that in most cases only one join-condition
attribute is a primary key; the other though is a foreign key.
Are there any cases where neither of the join's attributes is a key of its
relation? Mathematically this can certainly happen; in fact, a join
condition does not even have to be based on an equality comparison. But
coming up with a meaningful example is a different
thing. Here's the best I could do. Suppose we want a list of all employees
who live in the same town as some department's office. We might want
such a list, for example, for emergency preparedness. We can match the
employee city field with the dept_locations dlocation field; neither of
these is a key. Actually, in the Company database we have to use the
employee address field, and do substring matching, as follows:
select distinct e.ssn, e.lname, e.address from
employee e, dept_locations dl
where e.address LIKE concat('%', dl.dlocation, '%');
I used distinct because an employee in Houston
would otherwise be listed for both departments 1 and 5 (both of which have
offices in Houston).