Comp 305/488 Database Administration final exam study guide

Comp 305/488: Database Administration final-exam study guide

The exam will not be open book. Instead, I will give you the notes pages here (~8 page sides if printed normal-sized), and you will also be allowed to bring up to two page sides of your own notes.

Selected solutions will be available on Sakai by April 24.

Topics

SQL
Normalization (ch 15): 2NF, 3NF, BCNF
Hash indexes (17.8)
Indexing, ISAM and B-Tree indexes (ch 18: 18.1, 18.2, 18.3.1)
Familiarity with per-record and per-block cost estimates
SQL algorithms and query optimization (19.3, 19.7)
Indexes and query efficiency
Transactions: ch 21 (esp 21.1, 21.4, 21.5), ch 22.1
MySQL explain
Locking (22.1 only)

Specific examples of NoSQL approaches will not be on the final exam.

Sample problems

Here are some sample problems. It is likely at least one problem will involve writing SQL queries based on some database (like Company) that will be given to you.

Book exercises

Ch 15: 15.21, 15.24*, 15.25, 15.27, 15.28, 15.31

Ch 18: 18.18abcdefgh

Ch 19: 19.13a, 19.14

Ch 21: 21.16, identify recoverable and cascadeless only, 21.17, 21.22

Ch 22: 22.20 (hard?), 22.27

Additional problems by me

1. (a) Insert these into an empty B-tree of order 5 (max 4 values / 5 pointers per block):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

(b) Now insert these values into an empty B-tree in the following order:

1 20 3 18 5 16 7 14 9 12 11 10 13 8 15 6 17 4 19 2

2(a). Suppose we have 2 records per block, and we want to build an ISAM index that has 5 entries per index block. Give the index, assuming the file is sorted by the key field and the values for the key field are the integers from 1 to 40.

3. Suppose we are doing the join of two files A and B; the join condition is A.a = B.b. File A has 10,000 records and B has 20,000 records. Each file has 10 records/block. A is sorted by field a; looking up a value requires ~10 = log₂(1000) block lookups using binary search. B has a hash index; finding a record takes on average 7 block accesses. B is not sorted by attribute b.

(a) Give an estimate of the cost of the join, if we use the approach
for each a in A
lookup a in B using the hash index

(b) Give an estimate of the cost of the join if we use the approach
for each b in B
look up b in A using binary search

4. (a) Here are transaction schedules for T1 and T2. Is this "recoverable"? If so, why? If not, how could it be fixed? Recoverability should be the case whether the commit/abort is a commit (if success) or abort (if failure).

T1	T2
read(x) write(x)
	read(x) write(x)
read(y) write(y) commit/abort
	read(z) write(z) commit/abort

(b). Here are schedules for T1, T2, T3. Who reads from whom? Who precedes whom in the precedence graph?

T1	T2	T3
read(x) write(x)
	read(x)
		read(x) write(x)
	write(x)

(c). Create a schedule of two transactions T1 and T2 that contains a conflict. If you use the tabular style, above, be careful to show (eg with clear horizontal lines) when control switches between T1 and T2.

5. Suppose we have tables

Invoice consisting of ⟨cust_id, invoice_id⟩
Part consisting of ⟨partnum,cost⟩
InvItems consisting of ⟨invoice_id,partnum,quantity⟩.

In each case the key fields are underlined.

Suppose you want to find out the parts on a specific invoice, eg 100678.

select partnum from InvItems where invoice_id = '100678';

(a). What indexes on what tables may help with this?

(b). Can an index on InvItems.partnum help with this query? If so, how? If not, why not?

(c). Suppose you want to find out the total cost for a specific invoice_id, eg invoice_id = '100678'. Write the query, and identify the useful indexes.

(d). Suppose that, in order to track down the source of some unfortunate past confusion, you want to find all invoices in which the cust_id is numerically equal to one of the partnum values:

select i.cust_id, i.invoice_id from Invoice i join InvItems ii where i.invoice_id = ii.invoice_id and i.cust_id = ii.partnum;

What indexes might be useful here? Note that there are two separate join conditions.

6. Suppose we have the following tables:

Student consisting of ⟨student_id,name,major_id⟩
Grade1 consisting of ⟨student_id,score1⟩ for scores on exam 1
Grade2 consisting of ⟨student_id,score2⟩ for scores on exam 2

(a). Describe how an index on Student.student_id would allow a fast join for adding up all the score1 values of students whose major_id is 357.

select sum(g1.score1) from student s, grade1 g1 where s.student_id = g1.student_id and s.major_id = 357;

(b). Describe how an index on Grade1.student_id would allow a fast join for this same query

(c). Describe how an index on Student.major_id could be used for this query.

(d). Now suppose the question is to find all students who got the same score on exam 1 and exam 2. Write the query. In order to do this query reasonably efficiently, do we need an index on both Grade1.student_id and Grade2.student_id? If not, which index is necessary, or will either index work?

7. Suppose we have a triple-join as follows:

select * from employee e, works_on w, department d
where e.ssn = w.essn and e.dno = d.dnumber;

There are two orders for performing the joins.

(a). Assuming we have 25,000 employees, 100,000 works_on records and 100 departments, which join order is likely to be more efficient, and why? (Note: the difference is not large; the answer is not terribly clear-cut.)

(b). Assuming we have a lot more departments, does the order change for the most-efficient join sequence? If so, why; if not, why not.

8. Suppose the transactions of the tables in 4a and 4b were to use two-phase locking (which will change the actual schedules by introducing delays). For each set of transactions, identify the points where the lock and unlock requests are made. At some of these points, a transaction may now have to wait for the requested lock to be granted; identify when these waits start and when they end (and thus the waiting transaction can continue).