Comp 353 final-exam study guide selected solutions

Here are solutions to some of the sample problems:

Ch 15: 15.21
LOTS would be in 2NF if we only took the primary key into account; every relation with a single-attribute primary key is in 2NF if we only take the primary key into account. It is not in 3NF: there are other nontrivial dependencies, so some nonprime attributes depend transitively on the key.

15.24*

R=ABCDEFGHIJ and AB⟶C, A⟶DE, B⟶F, F⟶GH, D⟶IJ
Determining the key is nontrivial in principle, but AB determines everything (A determines DE, and hence IJ; B determines F and hence GH)
To make the relation 2NF, we must factor out A⟶DE and B⟶F, as these are dependencies on a key subset. We get:
    ABCGHIJ, ADE, BF
But we probably should have done F⟶GH and D⟶IJ first, as these are "downstream" (transitive) dependencies, at the ends of the chains B⟶F⟶GH and A⟶D⟶IJ. This gives
    ABC, ADE, BF, DIJ, FGH
which looks to me like 3NF (in fact BCNF: in each relation, the only determinant is that relation's key).
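The key argument here can be checked mechanically with an attribute-closure computation. A quick sketch (the encoding of FDs as string pairs is my own, not the textbook's):

```python
def closure(attrs, fds):
    """Closure of a set of attributes: keep adding the RHS of any FD
    whose LHS is already contained in the set, until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The dependencies of 15.24, written as (LHS, RHS) strings
fds = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
print(closure("AB", fds) == set("ABCDEFGHIJ"))   # AB determines everything
```

The same helper verifies the keys claimed in 15.25 and 15.27, by feeding it those problems' dependency lists.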


15.25
Now we have AB⟶C, BD⟶EF, AD⟶GH, A⟶I, and H⟶J. Everything can be determined from ABD. Furthermore, none of A, B, or D appears on the RHS of a functional dependency, so A, B, and D are not determined by anything; the relation is
    ABDCEFGHIJ

Starting downstream again, we get tables
    HJ, AI, ADGH, BDEF, ABC, and then the master table ABD. With no dependencies at all!

15.27
The relation is ABCDE, and the dependencies are AB⟶C, CD⟶E, DE⟶B. From the first two, we can determine that ABD determines C and E, and so is a key. From the third, we see ADE is also a key (DE gives B, and then AB gives C); similarly, ACD is a key (CD gives E, and then DE gives B). Given AB, we can get C, but we can't get D because it's not on the RHS of any dependency, and we can't get E because the only dependency giving E requires D; so AB is not a key.


Ch 18: 18.18abcdefgh
(a) 30+9+9+40+10+8+1+4+4+1 = 116
(b) we can fit four of these in a 512-byte block, so EMPLOYEE has 30,000/4 = 7500 blocks.
(c) This is the index as in Fig 18.1. Each index entry is 9+6=15 bytes (6 for a block pointer), and we can get 34 of these in a block. One index entry is needed per data block, so we have 7500/34 = 221 blocks for the index.
(d) If the file is not ordered by ssn, we need an index entry for every record, not just every block. See Fig 18.4. We now have 30,000/34 = 883 index blocks.
(e) this is not on the exam. I swear.
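The arithmetic for (a)-(d) can be checked in a few lines (the variable names are mine; we take ceilings because a partial block still costs a block):

```python
import math

record = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1    # (a) 116 bytes per record
block = 512
bfr = block // record                                  # 4 records fit per block
file_blocks = 30_000 // bfr                            # (b) 7500 blocks for EMPLOYEE

entry = 9 + 6                                          # ssn key + block pointer = 15 bytes
entries_per_block = block // entry                     # 34 index entries per block
primary = math.ceil(file_blocks / entries_per_block)   # (c) one entry per data block
secondary = math.ceil(30_000 / entries_per_block)      # (d) one entry per record
print(record, bfr, file_blocks, primary, secondary)
```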


Ch 19: 19.14
We can sort 64 blocks at a time, giving 4096/64 = 64 = 2^6 initial runs; with two-way merging, it will take 6 merge passes to finish the sort.
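A sanity check on the numbers, assuming two-way merging as in the solution above:

```python
import math

b, nB = 4096, 64                       # file blocks and buffer blocks
runs = math.ceil(b / nB)               # 64 initial sorted runs of 64 blocks each
passes = math.ceil(math.log2(runs))    # merge passes needed with two-way merging
print(runs, passes)
```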


Ch 21: 21.16, identify recoverable and cascadeless only, 21.17,
21.22
(a) not serializable; T3 does read(x) before T1 does write(x), but T1 does read(x) before T3 does write(x), giving the cycle T3⟶T1⟶T3
(b) not serializable; ditto
(c) serializable as T2, T3, T1
(d) not serializable; T3 reads x before T1 writes it, so T3⟶T1, but also the reverse: T1⟶T3.

1. (a) Insert these into an empty B-tree of order 5 (max 4 values / 5 pointers per block):

       1   2   3   4   5   6  7  8  9  10  11  12  13  14  15  16   17  18  19  20

There are three levels. The root block contains 9; the next-level blocks are [3,6] and [12,15,18] and the leaf blocks are:
    [1,2]  [4,5]  [7,8]  [10,11]  [13,14]  [16,17]  [19,20]

(b). Now insert these values into an empty B-tree in the following order:

       1  20  3  18  5  16  7  14  9  12  11  10 13  8  15  6  17  4  19  2

We start by inserting [1,3,18,20] at the root node. Inserting 5 causes a push-up split: we get root [5] and subnodes [1,3] and [18,20].
We then insert 16 and 7 in the right subnode. Now insertion of 14 causes a push-up split: the root becomes [5,16] with the next level blocks [1,3], [7,14] and [18,20].
Now we insert 9 and 12 into the [7,14] block, and then comes 11, forcing a push-up split. The root is now [5,11,16] with sublevels [1,3], [7,9], [12,14] and [18,20].
Insertions of 10,13,8 and 15 leave us with sublevels [1,3], [7,8,9,10], [12,13,14,15] and [18,20].
Inserting 6 causes a push-up split on the second block: the root becomes [5,8,11,16] with sublevels [1,3], [6,7], [9,10], [12,13,14,15] and [18,20].
We can now add 17, 4, 19 and 2; these cause no push-up splits. The sublevels are now [1,2,3,4], [6,7], [9,10], [12,13,14,15] and [17,18,19,20]
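Both traces can be double-checked with a minimal push-up insertion sketch (my own toy code, not production-quality; it assumes max 4 keys per node, as in the problem):

```python
MAX_KEYS = 4   # order 5 in EN terms: at most 5 pointers / 4 keys per node

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for leaves

def _insert(node, key):
    """Insert key below node; on overflow, split and return (median, right node)."""
    i = 0
    while i < len(node.keys) and node.keys[i] < key:
        i += 1
    if not node.children:                # leaf: place the key in sorted position
        node.keys.insert(i, key)
    else:                                # internal: descend, absorb any pushed-up median
        pushed = _insert(node.children[i], key)
        if pushed:
            median, right = pushed
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) > MAX_KEYS:        # overflow: split at the median and push it up
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1] if node.children else []
        return median, right
    return None

def insert(root, key):
    pushed = _insert(root, key)
    if pushed:                           # the root itself split: the tree gains a level
        median, right = pushed
        return Node([median], [root, right])
    return root

def levels(root):
    """Lists of node key-lists, one list per level, top to bottom."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out
```

Running `insert` over 1..20 in order reproduces the tree in (a); running it over the scrambled order reproduces the final sublevels in (b).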
   
There's a nice animated applet illustrating this at http://slady.net/java/bt/view.php. In the applet, the author refers to a tree with up to 4 values / 5 pointers per block as order 2, where EN calls this order 5 (for EN, the order is the maximum number of pointers per node). Note that the EN order of any tree the applet can draw is always odd: applet order n means up to 2n values and 2n+1 pointers.

2(a). Suppose we have 2 records per block, and we want to build an ISAM index that has 5 entries per index block. Give the index, assuming the file is sorted by the key field and the values for the key field are the integers from 1 to 40.

See Fig 18.6. The first level of the index needs an entry for the anchor record of every block; that is, for the twenty key values 1,3,5,7,9,11,13,15,...,37,39. This takes four blocks, because we have 5 entries per index block. The second-level index thus has four entries, and thus fits in a single block.
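The same ceiling arithmetic, as a quick check (variable names are mine):

```python
import math

records, recs_per_block, fanout = 40, 2, 5
data_blocks = math.ceil(records / recs_per_block)   # 20 data blocks, one anchor each
first_level = math.ceil(data_blocks / fanout)       # first-level index blocks
second_level = math.ceil(first_level / fanout)      # blocks at the top of the index
print(data_blocks, first_level, second_level)
```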

3. Suppose we are doing the join of two files A and B; the join condition is A.a = B.b. File A has 10,000 records and B has 20,000 records. Each file has 10 records/block. A is sorted by field a; looking up a value requires ~10 = log2(1000) block lookups using binary search. B has a hash index; finding a record takes on average 7 block accesses. B is not sorted by attribute b.

(a) Give an estimate of the cost of the join, if we use the approach
       for each a in A
          lookup a in B using the hash index

It takes 1000 block accesses to scan all of A. There are 10,000 records in A, each requiring a lookup in B, and each lookup takes 7 accesses, for a total of 1000 + 70,000 = 71,000 block accesses.

(b) Give an estimate of the cost of the join if we use the approach
       for each b in B
          look up b in A using binary search

This way, there are 2000 block accesses to scan B and 20,000 lookups, and each lookup takes 10 block accesses:
    2000 + 200,000 = 202,000
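Both estimates side by side (a sketch; the figures come straight from the problem statement):

```python
import math

rA, rB = 10_000, 20_000                     # records in A and B
bfr = 10                                     # records per block in both files
bA, bB = rA // bfr, rB // bfr                # 1000 and 2000 blocks
hash_cost = 7                                # avg accesses per hash lookup in B
binsearch_cost = math.ceil(math.log2(bA))    # ~10 accesses per binary search in A

cost_a = bA + rA * hash_cost                 # (a): scan A, probe B's hash index per record
cost_b = bB + rB * binsearch_cost            # (b): scan B, binary-search A per record
print(cost_a, cost_b)
```

The hash-index approach wins by nearly a factor of three, mostly because B has twice as many records and binary search costs more per lookup than hashing.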

4. (a) Here are transaction schedules for T1 and T2. Is this recoverable? If so, why? If not, how could it be fixed? Recoverability should hold whether each commit/abort turns out to be a commit (on success) or an abort (on failure).

T1              T2
read(x)
write(x)
                read(x)
                write(x)
                read(y)
                write(y)
                commit/abort
read(z)
write(z)
commit/abort

This is not recoverable: T2 reads x from T1, but T2 commits before T1 does. If T2 commits and T1 then aborts, T2 has committed on the basis of a value that was rolled back, and a committed transaction cannot be undone. To fix it, delay T2's commit/abort until after T1's.
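One way to check recoverability mechanically (the triple encoding of the schedule is my own): a schedule is recoverable iff every transaction commits only after every transaction it read from has committed.

```python
# Schedule (a) as (transaction, operation, item) triples; "c" marks the commit point
schedule = [
    ("T1", "r", "x"), ("T1", "w", "x"),
    ("T2", "r", "x"), ("T2", "w", "x"),
    ("T2", "r", "y"), ("T2", "w", "y"),
    ("T2", "c", None),
    ("T1", "r", "z"), ("T1", "w", "z"),
    ("T1", "c", None),
]

def is_recoverable(schedule):
    """Recoverable iff every reader commits after the writer it read from."""
    last_writer = {}     # item -> transaction whose write is currently visible
    reads_from = set()   # (reader, writer) pairs
    commit_at = {t: i for i, (t, op, _) in enumerate(schedule) if op == "c"}
    for t, op, item in schedule:
        if op == "r" and item in last_writer and last_writer[item] != t:
            reads_from.add((t, last_writer[item]))
        elif op == "w":
            last_writer[item] = t
    return all(commit_at[reader] > commit_at[writer]
               for reader, writer in reads_from)
```

Here T2 reads x from T1 but commits first, so the check fails; moving T2's commit after T1's makes it pass.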

(b). Here are schedules for T1, T2, T3. Who reads from whom? Who precedes whom in the precedence graph?

T1              T2              T3
read(x)
write(x)
                read(x)
                                read(x)
                                write(x)
                write(x)

T2 and T3 read from T1, so T1 precedes both.
T2 reads x before T3 writes it (T2⟶T3), and T3 writes x before T2's write (T3⟶T2), so T2 and T3 form a cycle in the precedence graph; the schedule is not serializable.
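The precedence graph can be built mechanically from conflict pairs (my own encoding of the schedule as triples):

```python
# The schedule from (b), as (transaction, operation, item) triples
schedule = [
    ("T1", "r", "x"), ("T1", "w", "x"),
    ("T2", "r", "x"),
    ("T3", "r", "x"), ("T3", "w", "x"),
    ("T2", "w", "x"),
]

def precedence_edges(schedule):
    """Edge Ti -> Tj whenever an op of Ti conflicts with a LATER op of Tj:
    same item, different transactions, and at least one of the two is a write."""
    edges = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (oi, oj):
                edges.add((ti, tj))
    return edges

# Contains both ("T2", "T3") and ("T3", "T2"): a cycle, so not serializable
print(precedence_edges(schedule))
```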

(c). Create a schedule of two transactions T1 and T2 that contains a conflict. If you use the tabular style, above, be careful to show (eg with clear horizontal lines) when control switches between T1 and T2.

T2 and T3 in (b) above have a lost-update conflict: T2's final write(x) is based on the value it read before T3's update, so T3's write of x is lost.