Database week 2

Comp 305-001/488-305: Database Administration

Spring 2016: Tuesdays, 4:15-6:45, Corboy L08

Week 2

Read in Elmasri & Navathe (EN). Chapters are labeled (7th edition)/(6th edition)

Chapter 1/1, Databases and Database Users (skim)
Chapter 2/2, Database System Concepts an Architecture, Section 1 on Data Models, Schemas and Instances
Chapter 5/3, The Relational Data Model and SQL Sections 1 & 2
Chapter 6/4, Basic SQL

A few more basic concepts

Database tables are usually "self-describing", in that the table description and/or relationships to other tables is often embedded in the table description. At a minimum, columns have names and types.

The table definitions, column names and their types are generally known, collectively, as the database schema.

Database tables can be changed without the need to recompile programs using that table: new columns can be added, or entire new tables. Essentially this is because DBMS queries are interpreted, and table columns are identified by name rather than offset. (C programs that access record fields do need recompilation if fields are changed.)

Databases often support different views, perhaps for users with different privilege levels. A view may be a subset of the original set of columns, or it may contain some computed columns in lieu of the original columns. Excluding columns such as social_security_num or salary is relatively common.

Transactions are sets of related updates, eg removing money from one account and adding it to another, or perhaps dropping one class and adding another (sadly, remarkably many student DBMSs lack the latter transaction operation). EN defines a transaction to be the result of an executing program rather than a set of related updates; this latter definition might include adding multiple new rows to one table.

Transaction processing must satisfy the isolation property, that transactions appear to execute in isolation from one another, and the atomicity property, which says that each transaction is completely executed or not executed at all. (This is sometimes described as that transactions must meet the ACID test: Atomicity, Consistency, Isolation, Durability. However, consistency and durability are relevant to all database updates; they are not particular to transactions.)

Here is a simple failure of isolation: two transactions are transferring money to other accounts.

Transaction 1 is transferring $100 from A to B
Transaction 2 is transferring $50 from A to C

The total amount of money should always remain $1000.

transaction 1	transaction 2	Acct A	Acct B	Acct C
		1000	0	0
get value of A: $1000		1000	0	0
add $100 to B		1000	100	0
	Debit A by $50	950	100	0
Store $1000 - $100 in A		900	100	0
	Credit C by $50	900	100	50

Multi-user DBMSs need concurrency control.

Concurrency control leads to locks, the other great performance bottleneck of RDBMSs.

Section 1.6 of EN talks about some of the advantages of having a central DBMS rather than individual DBMSs maintained by each administrative group. This is pretty much a settled issue now, though it does mean that the "natural owners" of data in an organization (eg registration and records at Loyola, for student registration data) will not in fact own that data.

The COMPANY database

Schema: EN7 p 161 / EN6 p 71
Data: EN7 p 162 / EN6 p 72
basic table definitions, with all foreign-key constraints
table definitions plus data, with ALTER (some FK constraints are added later, to allow initial data loading)
Spreadsheet
zip file
pdf view of tables

The tables are (with primary key in bold):
    employee:       name, ssn, bdate, address, sex, salary, super_ssn, dno
    department:     dname, dnumber, mgr_ssn, mgr_start
    dept_locations: dnumber, dlocation
    project:             pname, pnumber, plocation, dnum
    works_on:        essn, pno, hours
    dependent:       essn, dependent_name, sex, bdate, relationship

The university database is here (EN7 p 8 / EN6 p 8)

The university database here has all its foreign-key constraints. There is no constraint "circularity", so this should not be a problem. I did give names to the university FK constraints.

(brief review of create table)

Loading the database

If you have a command-line window, and want to load up a file of SQL statements (say company.alter.text),

use "cd" in the shell window to move to the directory where your files are located
Start Postgres (eg with psql -U myname) or MySQL (eg with \mysql\bin\mysql -u myname -p)
After you've connected to the right database, type the following:

Postgres: \i company.alter.text
mysql: source company.alter.text;

Alternatively, you can paste the entire file into a command window (you will probably need the menu paste command, as CNTL-V is likely to mean something else). It helps if there are no tab characters in the file.

Core relational-database concept:

Divide data into multiple tables (mathematically, RELATIONS) in such a way that there is enough division to enforce consistency and not too much division to cause problems with reassembly

See the sql1 notes on implicit constraints

Major concepts for this course (references are to Parts of EN)

The basic relational-DB model, and SQL (part 2)
DB modeling (part 3)
Normalization & other design theory (part 6)
internal DB structures (part 7)
optimization & tuning (part 8)
concurrency, locking, and transactions (part 9)

We will also cover a fair bit of DB administration, though that isn't always so directly in EN.

Relational DBs (EN7 chapter 5 / EN6 chapter 3)

A relation is any set of tuples
The set of all possible tuples is the CROSS PRODUCT of some domains
        col1 × col2 × col3 × ... × colN
Example: A = {1,2,3}, B = {x,y} C = {1,2}
        A × B
        A × C
        < relation in A × C
        <= relation in A × C

DB relations are not defined by rule, but by tabulation!

Given attribute sets A₁, A₂, ..., A_n, a relation is a subset of the cartesian product A₁×A₂×...×A_n; that is, a set of tuples ⟨a₁,a₂,...,a_n⟩ where each a_i∈A_i. These tuples may also be called records.

Relations in a DB are represented as tables. EN also uses the term relation state to refer to a specific set of records in a table.

STUDENT table, EN p 63

Name	SSn	Home_phone	Address	Office_phone	Age	GPA
Benjamine Bayer	305-61-2435	817-373-1616	2918 bluebonnet Lane	NULL	19	3.21
Chung-cha Kim	381-62-1245	817-375-4409	125 Kirby Road	NULL	18	2.89
Dick Davidson	422-11-2320	NULL	3452 Elgin Road	817-749-1253	25	3.53
Rohan Panchal	489-22-1100	817-376-9821	265 Lark Lane	817-749-6492	28	3.93
Barbara Benson	533-69-1238	817-839-8461	7384 Fontana Lane	NULL	19	3.25

Note the

attributes
tuples

Also note that some entries are NULL. This means undefined or not available or not known; unfortunately, these three options are not synonymous or interchangeable. NULL values are essential, but they do introduce some complications. The first is that records with NULL entries are not in fact elements of A₁×A₂×...×A_n; they are elements of
(A₁ ∪ {NULL}) × (A₂ ∪ {NULL}) × ... × (A_n ∪ {NULL})

EN also gives an alternative definition of a relation, as a set of maps from the attribute set to the set of attribute values, where the attribute set is essentially the set of names of columns. With this approach, a null entry is represented by a partial map, undefined for some attributes.

Note that we must be careful when comparing null values: if two people have NULL as their Office_phone, it does not mean they have the same phone! Worse, we simply do not know if the NULL means we don't know their phone, or if they simply do not have one, or if they have no office at all and so the "office_phone" is irrelevant.

Joins

As we saw briefly last week, the join is the operation of creating all records merged from two (or more) tables, where one attribute of one table is required to match a corresponding attribute of another. Usually, but not always, the column-matching is based on equality of corresponding attributes.

Examples:
University:

Listing all students in Section 112 (my data)
Printing all of each student's grades, by joining the Student_number fields of STUDENT and GRADE_REPORT
Printing all sections including Course_name, joining COURSE and SECTION on the Course_number field

The first example we did last week as follows:

select s.name from student s, grade_report gr
where s.student_number = gr.student_number and gr.section_identifier = 112;

Company:

Printing the name and address of all employees who work in the 'Research' dept (Query1 on EN p 100; uses employee and department tables)
Printing the project number, dept number, and the dept manager's name, for all projects located in 'Stafford' (Query 2 on EN p 100; uses project and department tables)
Printing each employee's name and his or her supervisor's name (Query 8, EN p 101; uses employee table joined to itself)

Demos of these

A full Cartesian product would be denoted in SQL by, eg,

select * from employee, department;

where there is no WHERE clause establishing a relation between the two tables.

The join is conceptually somewhat inefficient. Lots of behind-the-scenes optimization makes it fast.

More on keys

A KEY is any set of columns that is guaranteed to uniquely determine a row.
Primary Key: the key the database developer thinks is most important; usually a single attribute if there is one
Composite Key: multiple columns (eg the GRADE_REPORT table). Note that there is no single-column key here.

Secondary Keys: other column combinations that are keys, but not the one intended

Note that keys are not properties of particular tables, but rather of the "table schema". They represent design constraints.

Foreign Keys

Key constraints are one kind of constraint. What about the use of dno in table Employees? It should be clear that we probably want all dnos to refer to real departments, that is, to match an existing dnumber in table Department. This is done through a foreign key constraint: we declare in table Employee that attribute dno is a foreign key: a reference to a key of another table. The declaration looks like

foreign key (dno) references department(dnumber)

We can also give this constraint a name:

constraint FK_employee_department foreign key (dno) references department(dnumber)

(This is a simpler naming convention from the earlier example; only the parent table name is given.) Note that the constraint here applies to adding (or updating) records in Employee, and also to deleting records in Department.

Foreign keys are notorious for introducing circularity problems. What happens if we enforce foreign keys and try to load the COMPANY database as originally written? With all tables empty, we can't add any employee because no dno value we might use would appear in the empty Department table, and we cannot add a department because the mgr_ssn is a foreign key referencing Employee.

In principle, there is no reason to require that the foreign key actually be a key in the other table. In practice, it almost always is; in database schemas generated through so-called Entity-Relationship diagrams it always is.

Life can be quite frustrating if you forget the circularity problem. Once two tables with a "foreign-key embrace" (each uses the other as a foreign key) are created, they can be difficult to remove. Sometimes one has to resort to dropping the entire database. If I load my file company.brokenalter.text, these all fail:

drop table employee;
drop table department;
alter table employee drop foreign key dno;

The last one above, however, fails simply because it is wrong; I shouldn't have used the column name (dno), but rather the constraint name (in this case, department_ibfk_1). Some people like foreign-key constraint names for this reason.

To drop table T, you must first drop all foreign key constraints from other tables to T.

The command

alter table department drop foreign key department_ibfk_1

does work. The constraint name can be determined from show create table department.

Another thing that does work (though only for MySQL) is this:

set foreign_key_checks=0;

Here is an insert command that should fail due to a foreign-key violation, as there is no department 6 (the delete command right after undoes the addition):

insert into employee values ('ralph', null, 'wiggums', '121212121', null, null, null, null, null, 6);
delete from employee where lname = 'wiggums';

If this succeeds, the employee table probably has foreign key constraints removed. You can see the constraints with the MySQL command

show create table employee;

They can be added back with:

alter table employee ADD foreign key (super_ssn) references employee(ssn);
alter table employee ADD foreign key (dno) references department(dnumber);

They can be added back and given names with the following:

alter table employee ADD constraint FK_employee_employee foreign key (super_ssn) references employee(ssn);
alter table employee ADD constraint FK_employee_department foreign key (dno) references department(dnumber);

(The naming convention here is FK_childtable_parenttable. It is common, but not universal; some add the referenced column in the parent table as well.)

The FK declaration goes into the child table, and includes a reference to a parent table: some column of the child table is restricted to values that appear in the designated column of the parent table. That is, with the second FK constraint above, involving dno, table EMPLOYEE is the child table and table DEPARTMENT is the parent table. (Of course, there is a different FK constraint, on DEPARTMENT.mgr_ssn, making department the child and employee the parent!)

Other constraints

Examples might be that the employee salary is in a given range, or is less than the supervisor's salary, etc. These non-key constraints can sometimes be addressed at the DB level, but are often easier to address at the level of the user interface to the DB; that is, the web interface can contain the necessary business logic.