Comp 305-001/488-305: Database Administration
Spring 2016: Tuesdays, 4:15-6:45, Corboy L08
Week 2
Read in Elmasri & Navathe (EN). Chapters are labeled (7th edition)/(6th
edition)
- Chapter 1/1, Databases and Database Users (skim)
- Chapter 2/2, Database System
Concepts an Architecture, Section 1 on Data Models, Schemas and
Instances
- Chapter 5/3, The Relational Data Model and SQL Sections 1
& 2
- Chapter 6/4, Basic SQL
A few more basic concepts
Database tables are usually "self-describing", in that the table description
and/or relationships to other tables is often embedded in the table
description. At a minimum, columns have names and types.
The table definitions, column names and their types are generally known,
collectively, as the database schema.
Database tables can be changed without the need to recompile programs using
that table: new columns can be added, or entire new tables. Essentially this
is because DBMS queries are interpreted,
and table columns are identified by name
rather than offset. (C programs that access record fields do
need recompilation if fields are changed.)
Databases often support different views,
perhaps for users with different privilege levels. A view may be a subset of
the original set of columns, or it may contain some computed columns in lieu
of the original columns. Excluding columns such as social_security_num
or salary is relatively common.
Transactions are sets of related updates, eg removing money
from one account and adding it to another, or perhaps dropping one class and
adding another (sadly, remarkably many student DBMSs lack the latter
transaction operation). EN defines a transaction to be the result of an
executing program rather than a set of related updates; this latter
definition might include adding multiple new rows to one table.
Transaction processing must satisfy the isolation
property, that transactions appear to execute in isolation from one
another, and the atomicity
property, which says that each transaction is completely executed
or not executed at all. (This is sometimes described as that transactions
must meet the ACID test: Atomicity, Consistency, Isolation, Durability.
However, consistency and durability are relevant to all database updates;
they are not particular to transactions.)
Here is a simple failure of isolation: two transactions are transferring
money to other accounts.
- Transaction 1 is transferring $100 from A to B
- Transaction 2 is transferring $50 from A to C
The total amount of money should always remain $1000.
transaction 1
|
transaction 2
|
Acct A
|
Acct B
|
Acct C
|
|
|
1000
|
0
|
0
|
get value of A:
$1000
|
|
1000
|
0
|
0
|
add $100 to B
|
|
1000
|
100
|
0
|
|
Debit A by
$50
|
950
|
100
|
0
|
Store $1000 - $100 in A
|
|
900
|
100
|
0
|
|
Credit C by $50
|
900
|
100
|
50
|
Multi-user DBMSs need concurrency control.
Concurrency control leads to locks, the other great
performance bottleneck of RDBMSs.
Section 1.6 of EN talks about some of the advantages of having a central
DBMS rather than individual DBMSs maintained by each administrative group.
This is pretty much a settled issue now, though it does mean that the
"natural owners" of data in an organization (eg registration and records at
Loyola, for student registration data) will not in fact own that data.
The COMPANY database
Schema: EN7 p 161 / EN6 p 71
Data: EN7 p 162 / EN6 p 72
basic
table definitions, with all foreign-key constraints
table
definitions plus data, with ALTER (some FK constraints are added
later, to allow initial data loading)
Spreadsheet
zip file
pdf view of
tables
The tables are (with primary key in bold):
employee: name,
ssn, bdate, address, sex, salary, super_ssn, dno
department: dname, dnumber,
mgr_ssn, mgr_start
dept_locations: dnumber, dlocation
project:
pname, pnumber, plocation, dnum
works_on: essn,
pno, hours
dependent: essn,
dependent_name, sex, bdate, relationship
The university
database is here (EN7 p 8 / EN6 p 8)
The university database here has all its foreign-key constraints. There is
no constraint "circularity", so this should not be a problem. I did give
names to the university FK constraints.
(brief review of create table)
Loading the database
If you have a command-line window, and want to load up a file of SQL
statements (say company.alter.text),
- use "cd" in the shell window to move to the directory where your files
are located
- Start Postgres (eg with psql
-U myname) or MySQL (eg with \mysql\bin\mysql
-u myname -p)
- After you've connected to the right database, type the following:
- Postgres: \i company.alter.text
- mysql: source company.alter.text;
Alternatively, you can paste the entire file into a command
window (you will probably need the menu paste command, as CNTL-V is likely
to mean something else). It helps if there are no tab characters in the
file.
Core relational-database concept:
Divide data into multiple tables
(mathematically, RELATIONS) in such a way that there is enough
division to enforce consistency and not too much
division to cause problems with reassembly
See the sql1 notes on implicit constraints
Major concepts for this course (references are to Parts of EN)
- The basic relational-DB model, and SQL (part 2)
- DB modeling (part 3)
- Normalization & other design theory (part 6)
- internal DB structures (part 7)
- optimization & tuning (part 8)
- concurrency, locking, and transactions (part 9)
We will also cover a fair bit of DB administration, though that isn't
always so directly in EN.
Relational DBs (EN7 chapter 5 / EN6 chapter 3)
A relation is any set of tuples
The set of all possible tuples is the CROSS PRODUCT of some domains
col1 × col2 × col3 × ... × colN
Example: A = {1,2,3}, B = {x,y} C = {1,2}
A × B
A ×
C
< relation in A × C
<= relation in A × C
DB relations are not defined by rule, but by tabulation!
Given attribute sets A1,
A2, ..., An, a relation
is a subset of the cartesian product A1×A2×...×An;
that is, a set of tuples ⟨a1,a2,...,an⟩
where each ai∈Ai. These tuples may also be called records.
Relations in a DB are represented as tables.
EN also uses the term relation state
to refer to a specific set of records in a table.
STUDENT table, EN p 63
Name
|
SSn
|
Home_phone
|
Address
|
Office_phone
|
Age
|
GPA
|
Benjamine Bayer
|
305-61-2435
|
817-373-1616
|
2918 bluebonnet Lane
|
NULL
|
19
|
3.21
|
Chung-cha Kim
|
381-62-1245
|
817-375-4409
|
125 Kirby Road
|
NULL
|
18
|
2.89
|
Dick Davidson
|
422-11-2320
|
NULL
|
3452 Elgin Road
|
817-749-1253
|
25
|
3.53
|
Rohan Panchal
|
489-22-1100
|
817-376-9821
|
265 Lark Lane
|
817-749-6492
|
28
|
3.93
|
Barbara Benson
|
533-69-1238
|
817-839-8461
|
7384 Fontana Lane
|
NULL
|
19
|
3.25
|
Note the
Also note that some entries are NULL. This means undefined
or not available or not
known; unfortunately, these three options are not
synonymous or interchangeable. NULL values are essential, but they do
introduce some complications. The first is that records with NULL entries
are not in fact elements of A1×A2×...×An;
they are elements of
(A1 ∪ {NULL}) × (A2 ∪ {NULL}) × ...
× (An ∪ {NULL})
EN also gives an alternative definition of a relation, as a set of maps
from the attribute set to the set of attribute values, where the attribute
set is essentially the set of names
of columns. With this approach, a null entry is represented by a partial
map, undefined for some attributes.
Note that we must be careful when comparing null values: if two people have
NULL as their Office_phone, it does not
mean they have the same phone! Worse, we simply do not know if the NULL
means we don't know their phone, or if they simply do not have one, or if
they have no office at all and so the "office_phone" is irrelevant.
Joins
As we saw briefly last week, the join
is the operation of creating all records merged from two (or more) tables,
where one attribute of one table is required to match a corresponding
attribute of another. Usually, but not always, the column-matching is based
on equality of corresponding attributes.
Examples:
University:
- Listing all students in Section 112 (my data)
- Printing all of each student's grades, by joining the Student_number
fields of STUDENT and GRADE_REPORT
- Printing all sections including Course_name, joining COURSE and
SECTION on the Course_number field
The first example we did last week as follows:
select s.name from student s, grade_report
gr
where s.student_number =
gr.student_number and gr.section_identifier = 112;
Company:
- Printing the name and address of all employees who work in the
'Research' dept (Query1 on EN p 100; uses employee and department
tables)
- Printing the project number, dept number, and the dept manager's name,
for all projects located in 'Stafford' (Query 2 on EN p 100; uses
project and department tables)
- Printing each employee's name and his or her supervisor's name (Query
8, EN p 101; uses employee table joined to itself)
Demos of these
A full Cartesian product would be denoted in SQL by, eg,
select * from employee, department;
where there is no WHERE clause establishing a relation between the two
tables.
The join is conceptually somewhat inefficient. Lots of
behind-the-scenes optimization makes it fast.
More on keys
A KEY is any set of columns that is guaranteed to uniquely determine a row.
Primary Key: the key the database developer thinks is most important;
usually a single attribute if there is one
Composite Key: multiple columns (eg the GRADE_REPORT table). Note that there
is no single-column key here.
Secondary Keys: other column combinations that are keys, but not the one
intended
Note that keys are not properties of particular tables, but rather of the
"table schema". They represent design constraints.
Foreign Keys
Key constraints are one kind of constraint. What about the use of dno
in table Employees? It should be clear that we probably want all dnos to
refer to real departments, that is, to match an existing dnumber
in table Department. This is done through a foreign
key constraint: we declare in table Employee that attribute dno is
a foreign key: a reference to a key
of another table. The declaration looks like
foreign key (dno) references
department(dnumber)
We can also give this constraint a name:
constraint FK_employee_department
foreign key (dno) references department(dnumber)
(This is a simpler naming convention from the earlier example; only the
parent table name is given.) Note that the constraint here applies to adding
(or updating) records in Employee, and also to deleting records in
Department.
Foreign keys are notorious for introducing circularity problems. What
happens if we enforce foreign keys and try to load the COMPANY database as
originally written? With all tables empty, we can't add any employee because
no dno value we might use would appear in the empty Department table, and we
cannot add a department because the mgr_ssn is a foreign key referencing
Employee.
In principle, there is no reason to require that the foreign key actually be
a key in the other table. In practice, it almost always is; in database
schemas generated through so-called Entity-Relationship diagrams it always
is.
Life can be quite frustrating if you forget the circularity problem.
Once two tables with a "foreign-key embrace" (each uses the other as a
foreign key) are created, they can be difficult to remove. Sometimes one has
to resort to dropping the entire database. If I load my file
company.brokenalter.text, these all fail:
- drop table employee;
- drop table department;
- alter table employee drop foreign key dno;
The last one above, however, fails simply because it is wrong;
I shouldn't have used the column name (dno), but rather the constraint
name (in this case, department_ibfk_1). Some people like foreign-key
constraint names for this reason.
To drop table T, you must first drop all foreign key constraints from other
tables to T.
The command
- alter table department drop foreign key department_ibfk_1
does work. The constraint name can be determined from show
create table department.
Another thing that does work (though only for MySQL) is this:
- set foreign_key_checks=0;
Here is an insert command that should fail due to a foreign-key
violation, as there is no department 6 (the delete command right after
undoes the addition):
insert into employee values ('ralph', null,
'wiggums', '121212121', null, null, null, null, null, 6);
delete from employee where lname = 'wiggums';
If this succeeds, the employee table probably has foreign key constraints
removed. You can see the constraints
with the MySQL command
show create table employee;
They can be added back with:
alter table employee ADD foreign key
(super_ssn) references employee(ssn);
alter table employee ADD foreign key (dno) references department(dnumber);
They can be added back and given names with the following:
alter table employee ADD constraint
FK_employee_employee foreign key (super_ssn) references employee(ssn);
alter table employee ADD constraint FK_employee_department foreign key
(dno) references department(dnumber);
(The naming convention here is FK_childtable_parenttable. It is common, but
not universal; some add the referenced column in the parent table as well.)
The FK declaration goes into the child table, and includes
a reference to a parent table: some column of the child
table is restricted to values that appear in the designated column of the
parent table. That is, with the second FK constraint above, involving dno,
table EMPLOYEE is the child table and table DEPARTMENT is the parent table.
(Of course, there is a different FK constraint, on DEPARTMENT.mgr_ssn,
making department the child and employee the parent!)
Other constraints
Examples might be that the employee salary is in a given range, or is less
than the supervisor's salary, etc. These non-key constraints can sometimes
be addressed at the DB level, but are often easier to address at the level
of the user interface to the DB; that is, the web interface can contain the
necessary business logic.