Ch 18: 18.18abcdefgh
(a) 30+9+9+40+10+8+1+4+4+1 = 116 bytes per record.
(b) We can fit 512/116 = 4 of these in a 512-byte block, so EMPLOYEE takes 30,000/4 = 7500 blocks.
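A quick check of the arithmetic in (a) and (b); the field sizes are the ones summed above, and I'm assuming the final +1 is the one-byte deletion marker from the problem statement.

    # Record size, blocking factor, and file size for 18.18(a)-(b).
    fields = [30, 9, 9, 40, 10, 8, 1, 4, 4]   # field sizes in bytes
    record_size = sum(fields) + 1             # +1: deletion marker
    bfr = 512 // record_size                  # records per 512-byte block
    blocks = -(-30_000 // bfr)                # ceiling division
    print(record_size, bfr, blocks)           # 116 4 7500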
(c) This is a primary index as in Fig 18.1. Each index entry is 9+6 = 15 bytes
(9 for the key, 6 for a block pointer), and we can fit 512/15 = 34 of these in
a block (rounding down). One index entry is needed per data block, so the
index takes 7500/34 = 221 blocks (rounding up).
(d) If the file is not ordered by ssn, we need an index entry for every record, not just every block. See Fig 18.4. We now have 30,000/34 = 883 index blocks (again rounding up).
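The same style of check for the index sizes in (c) and (d):

    # Index sizing for (c) and (d): 15-byte entries in 512-byte blocks.
    entry_size = 9 + 6                 # 9-byte key + 6-byte block pointer
    fanout = 512 // entry_size         # 34 index entries per block
    primary = -(-7_500 // fanout)      # (c) one entry per data block
    secondary = -(-30_000 // fanout)   # (d) one entry per record
    print(fanout, primary, secondary)  # 34 221 883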
(e) This is not on the exam. I swear.
Ch 19: 19.14
We can sort 64 blocks at a time, so the sort phase produces 4096/64 = 64 = 2^6 sorted runs; merging pairs of runs, it will take 6 merge passes to finish the sort.
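The same count in code, assuming two-way merging (which is what the 2^6 above implies):

    # Initial runs and two-way merge passes for 19.14.
    import math
    runs = 4096 // 64                    # 64 sorted runs after the sort phase
    passes = math.ceil(math.log2(runs))  # passes to merge them pairwise
    print(runs, passes)                  # 64 6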
Ch 21: 21.16 (identify recoverable and cascadeless only), 21.17, 21.22
(a) not serializable; T3 does read(x) before T1 does write(x), but T1 does read(x) before T3 does write(x), so the precedence graph has the cycle T3 ⟶ T1 ⟶ T3
(b) not serializable; ditto
(c) serializable, equivalent to the serial order T2, T3, T1
(d) not serializable; T3 reads x before T1 writes it, so T3⟶T1, but also the reverse: T1⟶T3.
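All four parts come down to whether the precedence graph has a cycle. Here's a small sketch of that test; the edge lists are built by hand from the conflicting operations in each schedule.

    # Conflict-serializability test: a schedule is serializable iff its
    # precedence graph is acyclic (checked with Kahn's topological sort).
    def serializable(txns, edges):
        """Return a serial order of txns, or None if the graph has a cycle."""
        indeg = {t: 0 for t in txns}
        for _, v in edges:
            indeg[v] += 1
        order = []
        ready = [t for t in txns if indeg[t] == 0]
        while ready:
            t = ready.pop()
            order.append(t)
            for u, v in edges:
                if u == t:
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        ready.append(v)
        return order if len(order) == len(txns) else None

    # (d) above: T3 -> T1 and T1 -> T3 form a cycle, so no serial order.
    print(serializable(["T1", "T3"], [("T3", "T1"), ("T1", "T3")]))  # None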
1. (a) Insert these into an empty B-tree of order 5 (max 4 values / 5 pointers per block):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(b) Now insert these values into an empty B-tree in the following order:
1 20 3 18 5 16 7 14 9 12 11 10 13 8 15 6 17 4 19 2
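I won't draw the trees here, but if you want to check your answers, here's a minimal sketch of order-5 B-tree insertion (my own illustration, assuming the usual split-on-overflow rule; it ignores block pointers and disk layout):

    # Order-5 B-tree: at most 4 keys / 5 children per node. Insert at a
    # leaf; when a node overflows to 5 keys, split around the median and
    # push the median up.
    MAX_KEYS = 4

    class Node:
        def __init__(self, keys=None, children=None):
            self.keys = keys or []
            self.children = children or []   # empty list for a leaf

    def insert(root, key):
        """Insert key and return the (possibly new) root."""
        split = _insert(root, key)
        if split:                            # root overflowed: grow a level
            mid, right = split
            return Node([mid], [root, right])
        return root

    def _insert(node, key):
        if not node.children:                # leaf: insert in key order
            node.keys.append(key)
            node.keys.sort()
        else:                                # internal: descend, absorb splits
            i = sum(k < key for k in node.keys)
            split = _insert(node.children[i], key)
            if split:
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) > MAX_KEYS:        # overflow: split at the median
            m = len(node.keys) // 2
            right = Node(node.keys[m + 1:], node.children[m + 1:])
            mid = node.keys[m]
            node.keys, node.children = node.keys[:m], node.children[:m + 1]
            return mid, right
        return None

    root = Node()
    for k in [1, 20, 3, 18, 5, 16, 7, 14, 9, 12, 11, 10, 13, 8, 15, 6, 17, 4, 19, 2]:
        root = insert(root, k)               # the 1(b) insertion order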
2. (a) Suppose we have 2 records per block, and we want to build an ISAM
index that has 5 entries per index block. Give the index, assuming the
file is sorted by the key field and the values for the key field are
the integers from 1 to 40.
See Fig 18.6. The first level of the index needs an entry for the
anchor record of every block; that is, for the twenty key values
1,3,5,7,9,11,13,15,...,37,39. This takes four blocks, because we have 5
entries per index block. The second-level index thus has four entries and
fits in a single block.
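A level-by-level check of the same count:

    # ISAM sizing for 2(a): 40 records, 2 per block, 5 entries per
    # index block; the first level indexes the 20 block anchors.
    import math
    level = 40 // 2                   # 20 data blocks (anchors 1,3,...,39)
    while level > 1:
        level = math.ceil(level / 5)  # blocks needed at the next level up
        print(level)                  # prints 4, then 1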
3. Suppose we are doing the join of two files A and B; the join
condition is A.a = B.b. File A has 10,000 records and B has 20,000
records. Each file has 10 records/block, so A occupies 1000 blocks and B
occupies 2000 blocks. A is sorted by field a; looking up a value takes
log2(1000) ≈ 10 block accesses using binary search. B has a hash index;
finding a record takes on average 7 block accesses. B is not sorted by
attribute b.
(a) Give an estimate of the cost of the join, if we use the approach
for each a in A
lookup a in B using the hash index
Scanning A takes 1000 block accesses. Each of the 10,000 records in A then requires a lookup in B, and each lookup takes 7 accesses, for a total of 1000 + 10,000 × 7 = 71,000 block accesses.
(b) Give an estimate of the cost of the join if we use the approach
for each b in B
look up b in A using binary search
This way, there are 2000 block accesses to scan B and 20,000 lookups in A, each taking 10 block accesses: 2000 + 20,000 × 10 = 202,000 block accesses.
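Both estimates in one place:

    # Join cost estimates for 3(a) and 3(b), using the lookup costs above.
    blocks_A, records_A = 1_000, 10_000
    blocks_B, records_B = 2_000, 20_000
    hash_lookup, binary_search = 7, 10             # block accesses per lookup

    cost_a = blocks_A + records_A * hash_lookup    # scan A, probe B's hash
    cost_b = blocks_B + records_B * binary_search  # scan B, binary-search A
    print(cost_a, cost_b)                          # 71000 202000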
4. (a) Here are transaction schedules for T1 and T2. Is this
recoverable? If so, why? If not, how could it be fixed? The schedule should be
recoverable whether each commit/abort turns out to be a commit (on success) or
an abort (on failure).
T1                              | T2
read(x); write(x)               |
--------------------------------+---------------------------------
                                | read(x); write(x)
--------------------------------+---------------------------------
read(y); write(y); commit/abort |
--------------------------------+---------------------------------
                                | read(z); write(z); commit/abort
This is recoverable: T2 reads x from T1, and T2's commit/abort comes after T1's, so T2 never commits a value read from an uncommitted transaction. If T1 commits and T2 aborts, we do not have to undo anything.
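A sketch of the recoverability check itself, with the schedule above written out as (transaction, action, item) triples and both transactions taken to commit:

    # Recoverable iff whenever Tj reads an item Ti wrote, Tj commits
    # only after Ti has committed.
    def recoverable(ops):
        last_writer = {}              # item -> most recent writer
        reads_from = set()            # (reader, writer) pairs
        committed = set()
        for txn, action, item in ops:
            if action == "write":
                last_writer[item] = txn
            elif action == "read" and item in last_writer:
                w = last_writer[item]
                if w != txn and w not in committed:
                    reads_from.add((txn, w))
            elif action == "commit":
                committed.add(txn)
                if any(w not in committed for r, w in reads_from if r == txn):
                    return False      # committed before someone it read from
        return True

    sched = [("T1", "read", "x"), ("T1", "write", "x"),
             ("T2", "read", "x"), ("T2", "write", "x"),
             ("T1", "read", "y"), ("T1", "write", "y"), ("T1", "commit", None),
             ("T2", "read", "z"), ("T2", "write", "z"), ("T2", "commit", None)]
    print(recoverable(sched))         # True: T1 commits before T2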
(b) Here are schedules for T1, T2, T3. Who reads from whom? Who precedes whom in the precedence graph?
T1                | T2       | T3
read(x); write(x) |          |
------------------+----------+------------------
                  | read(x)  |
------------------+----------+------------------
                  |          | read(x); write(x)
------------------+----------+------------------
                  | write(x) |
T2 and T3 both read from T1: each reads x after T1's write(x).
T2 and T3 form a cycle in the precedence graph: T2 ⟶ T3 because T2 reads x
before T3 writes it, and T3 ⟶ T2 because T3 writes x before T2 writes it.
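Feeding this graph to the serializable() sketch from 21.16 above confirms the cycle:

    # Precedence graph for 4(b), edges read off the conflicts above.
    edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T3"), ("T3", "T2")]
    print(serializable(["T1", "T2", "T3"], edges))  # None: not serializable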
(c) Create a schedule of two transactions T1 and T2 that contains a
conflict. If you use the tabular style, above, be careful to show (e.g.
with clear horizontal lines) when control switches between T1 and T2.
T2 and T3 above have a lost-update conflict; a two-transaction version of the
same idea is sketched below.
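One possible answer (my sketch, not from the original key), in the same tabular style:

T1       | T2
read(x)  |
---------+-------------------
         | read(x); write(x)
---------+-------------------
write(x) |

T1's write(x) is based on the value it read before T2's update, so T2's update is lost; the read/write pairs on x conflict.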