The exam will not be open
book. Instead, I will give you the notes
pages here (~8 page sides if printed normal-sized), and you will
also be allowed to bring up to two page sides of your
own notes.
Selected solutions will be available on Sakai by April 24.
Specific examples of NoSQL approaches will not be on the final exam.
Ch 18: 18.18abcdefgh
Ch 19: 19.13a, 19.14
Ch 21: 21.16, identify recoverable and cascadeless only, 21.17, 21.22
Ch 22: 22.20 (hard?), 22.27
1. (a) Insert these into an empty B-tree of order 5 (max 4 values / 5
pointers per block):
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20
(b) Now insert these values into an empty B-tree in the following order:
1 20 3 18 5
16 7 14 9 12 11 10 13 8
15 6 17 4 19 2
2(a). Suppose we have 2 records per block, and we want to build an ISAM
index that has 5 entries per index block. Give the index, assuming the
file is sorted by the key field and the values for the key field are the
integers from 1 to 40.
3. Suppose we are doing the join of two files A and B; the join condition
is A.a = B.b. File A has 10,000 records and B has 20,000 records. Each
file has 10 records/block. A is sorted by field a; looking up a value
requires ~10 = log2(1000) block lookups using binary search. B
has a hash index; finding a record takes on average 7 block accesses. B is
not sorted by attribute b.
(a) Give an estimate of the cost of the join, if we use the approach
for each a in A
lookup a in B using the hash
index
(b) Give an estimate of the cost of the join if we use the approach
for each b in B
look up b in A using binary
search
4. (a) Here are transaction schedules for T1 and T2. Is this
"recoverable"? If so, why? If not, how could it be fixed? Recoverability
should be the case whether the commit/abort is a commit (if success) or
abort (if failure).
T1 |
T2 |
read(x) write(x) |
|
read(x) write(x) |
|
read(y) write(y) commit/abort |
|
read(z) write(z) commit/abort |
(b). Here are schedules for T1, T2, T3. Who reads from whom? Who precedes whom in the precedence graph?
T1 |
T2 |
T3 |
read(x) write(x) |
||
read(x) |
||
read(x) write(x) |
||
write(x) |
(c). Create a schedule of two transactions T1 and T2 that contains a
conflict. If you use the tabular style, above, be careful to show (eg with
clear horizontal lines) when control switches between T1 and T2.
5. Suppose we have tables
In each case the key fields are underlined.
Suppose you want to find out the parts on a specific invoice, eg 100678.
select partnum from InvItems where invoice_id = '100678';
(a). What indexes on what tables may help with this?
(b). Can an index on InvItems.partnum help with this query? If so, how? If not, why not?
(c). Suppose you want to find out the total cost for a specific invoice_id, eg invoice_id = '100678'. Write the query, and identify the useful indexes.
(d). Suppose that, in order to track down the source of some unfortunate past confusion, you want to find all invoices in which the cust_id is numerically equal to one of the partnum values:
select i.cust_id, i.invoice_id from Invoice i join InvItems ii where i.invoice_id = ii.invoice_id and i.cust_id = ii.partnum;
What indexes might be useful here? Note that there are two separate join conditions.
6. Suppose we have the following tables:
(a). Describe how an index on Student.student_id would allow a fast join for adding up all the score1 values of students whose major_id is 357.
select sum(g1.score1) from student s, grade1 g1 where s.student_id = g1.student_id and s.major_id = 357;
(b). Describe how an index on Grade1.student_id would allow a fast join for this same query
(c). Describe how an index on Student.major_id could be used for this query.
(d). Now suppose the question is to find all students who got the same score on exam 1 and exam 2. Write the query. In order to do this query reasonably efficiently, do we need an index on both Grade1.student_id and Grade2.student_id? If not, which index is necessary, or will either index work?
7. Suppose we have a triple-join as follows:
select * from employee e, works_on w,
department d
where e.ssn = w.essn and e.dno = d.dnumber;
There are two orders for performing the joins.
(a). Assuming we have 25,000 employees, 100,000 works_on records and 100 departments, which join order is likely to be more efficient, and why? (Note: the difference is not large; the answer is not terribly clear-cut.)