Intro to SQL



Here is a quick SQL summary sheet

The examples below mostly come from EN7 chapter 6 / EN6 chapter 4.

Note that the numbering of the example queries is the same in EN7 and EN6.




With SQL we can

We will use these two databases

Example of create table:

create table employee2 (                                            -- fewer columns than table employee

      fname    varchar(15) not null,

      lname    varchar(15) not null,

      ssn         char(9)         not null,

      salary   decimal(10,2),

      super_ssn char(9),

      dno      int               not null,

      primary key (ssn),

      foreign key (super_ssn) references employee(ssn),

      foreign key (dno) references department(dnumber)

);

Note the six attributes (fname, lname, ssn, salary, super_ssn and dno), and their types. Four of the attributes have a constraint: not null. This means these fields cannot be left empty. Finally, we have the primary key and two foreign keys; these are also forms of constraint (the second foreign-key constraint refers to another table).

Examples of insert, update and delete; more below:

insert into employee2 values ('peter', 'dordal', '123456789', 29000.01, '012345678', 55);

delete from employee2 where fname='peter';

update employee2 set salary = 1.10 * salary where salary >= 50000;


The SQL select-from-where statement: EN7 §6.3 / EN6 §4.3

    select columns from table/join where boolean condition selecting rows

The tricky part is that queries can involve joins. We will prefer the explicit-join notation in which the join condition is moved to the from clause, but this still leaves one with the challenge of figuring out exactly what tables need to be joined, and on what attributes.

Note that SQL is a "nonimperative" language: it has (essentially) no assignment operator. (Ok, you can save tables as intermediate results, but you should not do that. For now.)

Scenarios where we use the join:

1. The "extension" case: table2 in some sense extends information that could have been put in table1 except for redundancies. For each row in table1, find the unique matching row in table2. The join column in this case is likely to be a key in table2, and declared as a foreign key in table1. Example:

select e.fname, e.lname, d.dname from employee e, department d where e.dno = d.dnumber;

2. The "relationship" case: table2 defines some relationship; it has a dual-column key and the join column is a key in table1 and one of the key columns in table2. Example:
   
select e.fname, e.lname, w.pno from employee e, works_on w where e.ssn = w.essn;

Actually, this case is exactly the same as the first, with table1 and table2 reversed. Except that we tend to think of the employee table as representing things (people), and the works_on table as representing relationships (who works on what).

We can also have multiple matches in table2, or zero matches. Multiple matches occur in the second example (technically, multiple rows in table1 in the first example match the row in table2 with dnumber=5).

We can  create a new table from an old one (or ones):

create table emp_names as select fname,lname,ssn from employee;
select * from emp_names;
drop table emp_names;

Primary Keys

The primary key is a type of constraint: two records cannot be inserted in the database with the same values for the key.

A table can have several keys; the "primary" key is the most natural one and is usually the one by which the table is indexed. If a table represents an entity, or thing, then the primary-key value is how we will identify the thing; for example, an employee above can be identified by their SSN.

Foreign Keys

Key constraints are one kind of constraint. What about the use of dno in table Employees? It should be clear that we want all dnos to refer to real departments, that is, to match an existing dnumber in table Department. This is done through a foreign key constraint: we declare in table Employee that attribute dno is a foreign key: a reference to a key of another table. The declaration looks like
foreign key (dno) references department(dnumber)

We can also give this constraint a name:
        constraint FK_department_employee foreign key (dno) references department(dnumber)

Note that the constraint here applies to adding (or updating) records in Employee, and also to deleting records in Department.

Foreign keys are notorious for introducing circularity problems. What happens if we enforce foreign keys and try to load the database as originally written? With all tables empty, we can't add any employee because no dno value we might use would appear in the empty Department table, and we cannot add a department because the mgr_ssn is a foreign key referencing Employee.

In principle, there is no reason to require that the foreign key actually be a key in the other table. In practice, it almost always is; in database schemas generated through so-called Entity-Relationship diagrams it always is.

Life can be quite frustrating if you forget the circularity problem. Once two tables with a "foreign-key embrace" (each uses the other as a foreign key) are created, they can be difficult to remove. Sometimes one has to resort to dropping the entire database. If I load my file company.brokenalter.text, these all fail:

One thing I can do is

It turns out that the name MySQL assigned to my foreign-key constraint from department.mgr_ssn to employee.ssn is "department_ibfk_1".  Some people like to assign their own names to foreign-key constraints for this reason.

To drop table T, you must first drop all foreign key constraints from other tables to T. After the constraint drop above, we can drop table employee. The MySQL constraint name can be determined from show create table department. Postscript shows all constraints with \d department.

Another MySQL-specific thing is this:



A Look At Constraints

EN7 p 157 / EN6 p 67

Databases involve several kinds of constraints

0. Implicit constraints enforced by the table structure. For example, an employee can be in only one department and have only one supervisor and only one address, because there is a single column for each of these attributes in the employee table.

1. Type constraints on column values

insert into section values ("hello", "foo", "Fall", 2008, "Wiggums");

2. Key constraints: in each table, any given declared key can occur in only one row. This is not a property of a table at a particular moment, but rather a rule that says that a second record with a duplicate key can never be added.

insert into section values (85, "foo", "Fall", 2008, "Wiggums");

3. Foreign-key constraints (also called referential-integrity constraints): Consider again the university database on p 8 of EN. In the SECTION table, the key is Section_identifier. A typical constraint would be that we are not allowed to have a record that has a Course_number value that is not found in the COURSE table. This appears in the create table declaration as follows:

create table section (
    section_identifier  int           primary key not null,
    course_number       varchar(12)   not null,
    semester            varchar(10)   not null,
    year                int,
    instructor          varchar(20),
    foreign key (course_number) references course(course_number)
);

The last line above means that in order to create a record in table section, the course_number attribute must match a pre-existing course_number attribute of a row in table course. The following should fail for the existing DB, as there is no course_number with value "foo":

insert into section values (12345, "foo", "Fall", 2008, "Wiggums");
delete from section where instructor = 'Wiggums'

Not all versions of the file for creating the University DB have all the above foreign key constraint in place. One way to check this in Postgres is with \d section, or in MySQL with describe section; or with show create table section;.

Similarly, in the GRADE_REPORT table, the primary key is ⟨Student_number, Section_identifier⟩. Each of these attributes has a foreign-key constraint:
For a long time in MySQL, foreign-key constraints were not actually enforced. This made certain DB "engines" faster, and in many cases foreign-key constraints are easily implemented in the application logic. However, Oracle eventually introduced the InnoDB engine, which is now the default, and this engine does enforce foreign-key constraints. The Oracle and Postgres DBs have always enforced foreign-key constraints.

It is common practice to give names to foreign-key constraints; this helps identify the source of constraint-related errors; it's also easier to drop and add constraints if they have sensible names. I gave names to the constraints in the university file, using the naming convention

    FK_thisTable_otherTable_otherAttribute

Several naming conventions exist, and there are two problems with the strategy above:

Constraint names are included in SQL like this:

constraint constraint_name foreign key course_number references course(course_number)


Here's a list of the constraints:


Implicit constraints may seem relatively "weak", but they are at the heart of database consistency. For example, suppose we add an address column to the student table in the University database. As long as one student can be represented by only one row in this table (eg because the key for the table is Student_number), we cannot have the same student with two addresses.

Now suppose instead we redesigned the database to include the student name and address directly into the GRADE_REPORT table, along with the Student_number. This is slightly wasteful of space, but that is a minor concern. The more serious problem is that this now allows inconsistency: we can have student 17 have two different addresses in two different records (for two different section_numbers, or even two different names.

This is the data-consistency problem that the relational model was so successful at solving. If tables are designed appropriately, the potential for duplicate entries is simply eliminated.

Here's another view of inconsistency. Suppose we have a table with records like the following::

    Purchase, CustomerName, CustomerAddr
   
This allows us to have two records for two different purchases, one with

        purchase1, Peter, LakeShore

and one with

        purchase2, Peter, WaterTower

Oops! Peter is now getting duplicate mailings (at least if the wrong one is forwarded).
   
To prevent inconsistency, relationships are "factored" (more on this later) into multiple tables so as to prevent this. In this case, we would want a table of CustomerName and CustomerAddr (perhaps also with CustomerID), and a second table with columns Purchase and CustomerID. Now it is not possible to have one customer with two addresses.



There are four kinds of constraints supported directly by SQL:

Foreign key constraints: what happens if we insert the dname,dnumber for a department, then add employees, then do one of the following:
Here is the example from EN7 p 185 / EN6 p 95 (Figure 6.2/4.2), but with differently named constraints

create table department(
    dname varchar(20),
    dnumber int,
    mgr_ssn char(9)  not null default = '888665555',
    mgr_start date,
    constraint dept_primary_key primary key (dnumber),
    constraint dept_secondary_key unique (dname),
    constraint dept_mgr_foreign_key
       foreign key (mgr_ssn) references employee(ssn)
       on delete set default
       on update cascade
);  

Named constraints mean that you will be told what constraint is violated. This is less helpful than it seems. Named constraints also means that constraints can be deleted by name.

But consider the on_delete / on_update clauses. What these mean is that if you delete an employee e from the employee db who is a dept manager (ie that employee's ssn is used as department.mgr_ssn for some row), then the mgr_ssn is set to the default value of 888665555. Furthermore, if you update an employee entry to correct a department manager's ssn, then that corrected value is cascaded into the appropriate row(s) of the department table.



Insert, Delete and Update

Examples:

insert into department values ('Sales', 6, '888665555', '2012-01-19');
update department set mgr_ssn = '333445555' where dnumber = 6;
delete from department where dnumber = 6;

See operations in section EN7 5.3 / EN6 3.3:

Inserts: these must not violate key constraints or fkey constraints

If we are inserting a new employee, we can't reuse an existing ssn and we can't have dno be a nonexistent department

Deletes: can never violate key constraints, only fkey constraints in another table.

We can't delete department 4 in the Department table until all employees of department 4 have been removed from the Employee table

Updates:
    of key value; can violate key constraints, or fkey constraints in another table (eg changing dnumber)
    of fkey value: can violate fkey constraints
    non-key/fkey operations are safe

Demos

(There are more demos below in the Chapter 6/4 material)

insert into employee values ('ralph', 'j', 'wiggums', '123456798', NULL, NULL, 'M', 9999, '333445555', 7);
delete from employee where ssn = '123456798';

update employee set dno = 107 where ssn = '333445555';     // originally dno = 5 here
update employee set dno = 5 where ssn = '333445555';



Insert

insert into tablename values (f1val, f2val, ... , fnval);
You can also name columns:
    insert into employee(fname, lname, ssn, dno) values ('john', 'smith', 345678912, 4);

Finally, instead of values you can provide a select query that returns a set of rows from some existing table, with matching "type signature" (matching column types).

Inserts can violate all four constraints.
insert into employee(fname,lname,ssn,dno)
values ('robert', 'Hatcher', '456789123', 2);

insert into employee(fname,lname,dno)
values ('robert', 'Hatcher', 5);

Update
Basic form:
    update employee
    set salary = salary*1.10
    where dno = 5;

Updates can also violate all four constraints, though the not null constraint can be violated only by setting a column to null, and the primary key constraint can only be violated by updating the primary key. Foreign key constraints are relatively easy to violate, though.

Delete
delete from tablename where boolean condition selecting rows

Here, we can only violate foreign key constraints, eg by deleting from table department the department of some existing worker.


SQL data types

The DATE format is yyyy-mm-dd, eg 2013-01-27. This is actually standardized by ISO 8601. The United States usage, mm-dd-yyyy, was apparently used in England as well until the 20th century, when England switched to dd-mm-yyyy for greater consistency with the rest of Europe. See http://www.antimoon.com/forum/t1952.htm.

Dates in MySQL are entered as if they were strings, but they are most certainly not stored that way. Try inserting a record with date '07-04-1980'; MySQL will give up and set the date (silently!) to zero.



Table Joins

Suppose in the university database we want to know the names of everyone in section 112. (A peculiarity of the specific data given as example is that no section has more than one student!) The GRADE_REPORT table has only student numbers; we need to match these up with names from the STUDENT table. This operation, of matching corresponding rows of different tables, is known as the join. Here is the SQL for the query, where the join condition is in bold:

select s.name
from student s join grade_report gr on s.student_number = gr.student_number
where gr.section_identifier = 112;

We are retrieving records from two tables here, but restricting attention to pairs of records that "match up" according to the join condition.

The join can also be done with the following older, implicit-join syntax:

select s.name from student s, grade_report gr
where s.student_number = gr.student_number and gr.section_identifier = 112;

Finally, maybe we want the name and the grade:

select s.name, gr.grade from student s join grade_report gr
on s.student_number = gr.student_number
where gr.section_identifier = 112;

The join operation was once derided as introducing too much inefficiency. Technical advances in the 1980's made this issue less important, but the rise of huge datasets in this century has made this again relevant.


SQL examples

Some queries here are from Ramakrishnan & Gehrke 2002 but are modified to be appropriate for E&N's Company database; the others are from E&N directly.

The E&N example numbering is peculiar because the book was radically restructured with the 6th Edition, and examples kept their numbers from previous editions.

Find all employees with salary >= 30000

select * from employee where salary >= 30000;

select e.fname, e.lname, e.salary from employee e where e.salary >=30000;

Note the use of the e table alias. I recommend this style for readabililty. You can think of e as a variable that ranges over all the rows, though it looks syntactically more like it represents a table.

Query 2 of E&N (EN7 p189 / EN6 p100)

For every project located in Stafford, list the project number, the controlling department number, and the department manager's lname, address, bdate.

Solution as written:

select pnumber, dnum, lname, address, bdate
from project, department, employee
where dnum = dnumber and mgr_ssn = ssn and plocation = 'Stafford';

Note that this is a dual-join example.

Compare with the version with table-alias variables and explicit joins:

select p.pnumber, p.dnum, e.lname, e.address, e.bdate
from project p join department d on p.dnum = d.dnumber join employee e on d.mgr_ssn = e.ssn
where p.plocation = 'Stafford';

Find the list of supervisor ssns (that is, omit duplicates)

select distinct e.super_ssn from employee e;

Looking at the output, it might be good to add "where e.super_ssn is not null"


Find all employees who have worked on project 2

(Note that in our works_on table, no employee works on the same project more than once. Compare the solution to obtaining a list of employees who have worked on project 2, where the table is works_on_by_week: ⟨essn, pno, week, hours⟩

select e.fname, e.lname from employee e join works_on w on e.ssn = w.essn where w.pno = 2;

Note this is a join example (not our first). We could also have written it:

select e.fname, e.lname from employee e, works_on w where e.ssn = w.essn and w.pno = 2;



Find all employees who have worked >20 hours on a project

select e.fname, e.lname, w.pno, w.hours from employee e join works_on w on e.ssn = w.essn
where w.hours >20;



EN Query 4: List all project numbers of projects involving employee 'Smith', either as manager or worker

Note that projects are managed by the manager of the project's controlling department. One approach is to use the union keyword:

(select distinct p.pnumber
from project p join department d on p.dnum=d.dnumber join employee e on d.mgr_ssn = e.ssn
where e.lname = 'Smith')
union
(select distinct p.pnumber
from project p join works_on w on p.pnumber = w.pno join employee e on w.essn = e.ssn
where e.lname = 'Smith');

Smith, however, hasn't managed any projects. It works better for Wong:

set @name = 'Wong';              -- MySQL notation; then use @NAME
\set name '\'Wong\''                 -- Postgres notation; then use :name. There is no terminating semicolon!

Postgres:
(select distinct p.pnumber               
from project p join department d on p.dnum=d.dnumber join employee e on d.mgr_ssn = e.ssn
where e.lname = :name)
union
(select distinct p.pnumber
from project p join works_on w on p.pnumber = w.pno join employee e on w.essn = e.ssn
where e.lname = :name);

MySQL:
(select distinct p.pnumber
from project p join department d on p.dnum=d.dnumber join employee e on d.mgr_ssn = e.ssn
where e.lname = @NAME)
union
(select distinct p.pnumber
from project p join works_on w on p.pnumber = w.pno join employee e on w.essn = e.ssn
where e.lname = @NAME);

Here is a version using a conventional implicit join. Sort of.

select distinct p.pnumber
from project p, department d, employee e, works_on w
WHERE e.lname = @NAME AND (
    (d.dnumber = p.dnum and d.mgr_ssn = e.ssn)
    OR
    (p.pnumber = w.pno and w.essn = e.ssn))

There is in fact something very unconventional about this join: we can't decide what tables are being joined! The first of the OR clauses in the where section joins tables project, department and employee; the second OR clause joins tables project, works_on and employee. In fact, we cannot do this using the explicit-join notation.

EN Query 12, Find all employees with address in Houston, TX

To do this, we have to use substrings and the LIKE operator:

select e.fname, e.lname from employee e
where e.address LIKE '%Houston TX%';

Note the uncertainty; if someone put two spaces in, we're in trouble. E&N did use commas in addresses; I did not.

Query 12A: find all employees born in the 1950's.

EN way: too clever by half:

select e.fname, e.lname from employee e where e.bdate LIKE '195_-__-__';

For one thing, Oracle has a different standard date format than MySQL. (I added dashes to my pattern, hopefully for clarity.)

Better:

select e.fname, e.lname from employee e
where e.bdate between '1950-01-01' and '1959-12-31';

Note that dates are entered as strings, but they are indeed parsed by MySQL and Postgres, and are stored numerically.

EN Query 14: Find employees with salary between $30,000 and $40,000


select e.fname, e.lname from employee e
where e.salary between 30000 and 40000;

EN Query 15: Introduces order by (also a triple-join)

Give a list of employees and the projects they are working on, ordered by department, and, within departments, ordered alphabetically by lname, fname.

select d.dname, e.lname, e.fname, p.pname
from department d join employee e on d.dnumber = e.dno join works_on w on e.ssn = w.essn join project p on w.pno = p.pnumber
order by d.dname, e.lname, e.fname;

select d.dname, e.lname, e.fname, p.pname
from department d, employee e, works_on w, project p
where d.dnumber = e.dno and e.ssn = w.essn and w.pno = p.pnumber
order by d.dname, e.lname, e.fname;




Foreign Keys

Key constraints are one kind of constraint. What about the use of dno in table Employees? It should be clear that we probably want all dnos to refer to real departments, that is, to match an existing dnumber in table Department. This is done through a foreign key constraint: we declare in table Employee that attribute dno is a foreign key: a reference to a key of another table. The declaration looks like

foreign key (dno) references department(dnumber)

We can also give this constraint a name:

        constraint FK_employee_department foreign key (dno) references department(dnumber)

(This is a simpler naming convention from the earlier example; only the parent table name is given.) Note that the constraint here applies to adding (or updating) records in Employee, and also to deleting records in Department.

Foreign keys are notorious for introducing circularity problems. What happens if we enforce foreign keys and try to load the COMPANY database as originally written? With all tables empty, we can't add any employee because no dno value we might use would appear in the empty Department table, and we cannot add a department because the mgr_ssn is a foreign key referencing Employee.

In principle, there is no reason to require that the foreign key actually be a key in the other table. In practice, it almost always is; in database schemas generated through so-called Entity-Relationship diagrams it always is.

Life can be quite frustrating if you forget the circularity problem. Once two tables with a "foreign-key embrace" (each uses the other as a foreign key) are created, they can be difficult to remove. Sometimes one has to resort to dropping the entire database. If I load my file company.brokenalter.text, these all fail:

The last one above, however, fails simply because it is wrong; I shouldn't have used the column name (dno), but rather the constraint name (in this case, department_ibfk_1). Some people like foreign-key constraint names for this reason.

To drop table T, you must first drop all foreign key constraints from other tables to T.

The command

does work. The constraint name can be determined from show create table department.

Another thing that does work (though only for MySQL) is this:

Here is an insert command that should fail due to a foreign-key violation, as there is no department 6 (the delete command right after undoes the addition):

insert into employee values ('ralph', null, 'wiggums', '121212121', null, null, null, null, null, 6);
delete from employee where lname = 'wiggums';

If this succeeds, the employee table probably has foreign key constraints removed. You can see the constraints with the MySQL command

show create table employee;

They can be added back with:

alter table employee ADD foreign key (super_ssn) references employee(ssn);
alter table employee ADD foreign key (dno) references department(dnumber);

They can be added back and given names with the following:

alter table employee ADD constraint FK_employee_employee foreign key (super_ssn) references employee(ssn);
alter table employee ADD constraint FK_employee_department foreign key (dno) references department(dnumber);

(The naming convention here is FK_childtable_parenttable. It is common, but not universal; some add the referenced column in the parent table as well.)


The FK declaration goes into the child table, and includes a reference to a parent table: some column of the child table is restricted to values that appear in the designated column of the parent table. That is, with the second FK constraint above, involving dno, table EMPLOYEE is the child table and table DEPARTMENT is the parent table. (Of course, there is a different FK constraint, on DEPARTMENT.mgr_ssn, making department the child and employee the parent!)