Comp 353/453: Database Programming, LT 410, 4:15 Tuesdays

Week 13

Tutoring: Faraz Khan, Wed 5:00-7:00 pm, fkhan10@luc.edu.




Transaction schedules


Transactions

A transaction is a set of operations, which we can idealize as read(X) and write(X); at the end we either commit (save) the whole set or rollback/abort (discard) it. Generally we hope rollbacks are rare.

Abstract transactions: operations are read(X) and write(X), for X a field/record/block/table.
Fig 21.2: two example transactions.
    T1: read(X), write(X), read(Y), write(Y)
    T2: read(X), write(X)

Ideally, we execute the transactions one at a time, with no overlap. In practice, because of the considerable I/O-wait for disk retrieval, we cannot afford that. Transactions will overlap. We need to figure out how to arrange for this to happen safely. Here are two problems:

Lost-update (write/write) problem: this occurs when one transaction writes a value that another then overwrites. See Fig 21.3a. T1 is transferring N dollars from X to Y, while T2 is adding M dollars to X. T2.write(X) overwrites T1.write(X).
T1                        T2
read(X)
X = X-N
                          read(X)
                          X = X+M
write(X)
read(Y)
                          write(X)
Y = Y+N
write(Y)
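To see the lost update concretely, here is a minimal Python sketch (mine, not the book's) of the Fig 21.3a interleaving; the local variables t1_x and t2_x stand in for each transaction's private workspace:

    X = 100          # the database copy of item X
    N, M = 10, 20    # T1 transfers N out of X; T2 deposits M into X

    t1_x = X         # T1: read(X)
    t1_x = t1_x - N  # T1: X = X-N
    t2_x = X         # T2: read(X) -- still sees 100
    t2_x = t2_x + M  # T2: X = X+M
    X = t1_x         # T1: write(X) -> 90
    X = t2_x         # T2: write(X) -> 120, overwriting T1's write

    print(X)         # 120; either serial order gives 110, so T1's debit is lost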



Dirty-Read (read/write) problem: this occurs when one transaction reads an item that another transaction has written but not yet committed. See Fig 21.3b, where T1 aborts after write(X): T2 has based its own update on a value of X that, after the rollback, never officially existed.
T1                        T2
read(X)
X = X-N
write(X)
                          read(X)
                          X = X+M
                          write(X)
read(Y)
abort/rollback



The system log, or journal, is a master list of all operations performed, used in the event of rollback. Failures leading to rollback may be system crash, lack of disk space, lookup error (eg SQL error), locking problems, or other physical or logical errors.

Items read here can be individual record attributes, entire records, or entire tables. When joining tables, the entire table is read.

The ACID test

ACID = Atomicity (a transaction happens entirely or not at all), Consistency (each transaction, run by itself, preserves database consistency), Isolation (concurrent transactions do not see one another's partial results), and Durability (once committed, a transaction survives later failures).

Note that Durability is arguably implied by Atomicity, or at least the two together are implied by some greater "recovery" rule.

Also note that Consistency is closely related to Isolation: if each transaction preserves consistency, and the transactions appear to execute in isolation, then a series of transactions will be consistency-preserving. Consistency would also be implied by Serializability: that the transactions appear to have been executed in some serial order. Serializability, however, may be a stronger condition than Isolation, if only we had a precise definition of isolation to compare it against.

Moral: beware of choosing your terms to fit a clever acronym.


Transaction schedules

Transaction schedules are lists of read/write operations of each transaction, plus commit/abort. The operations of each individual transaction must be in their original order.

Eventually we will enforce schedules by using locking, but for now we just consider schedules abstractly.

Examples:

    Sa: Fig 21.3a:   r1(X), r2(X), w1(X), r1(Y), w2(X), w1(Y), c1, c2
    Sb: Fig 21.3b:   r1(X), w1(X), r2(X), w2(X), r1(Y), a1

Two operations of a schedule conflict (where the word is to be taken as indicating potential conflict) if:
  1. They belong to different transactions
  2. They access the same data item (eg X)
  3. At least one of the operations is a write()
For example, in Sa, r1(X) and w2(X) conflict. So do r2(X) and w1(X). What does not conflict? r1(X) and r2(X), r1(X) and w1(X).

Conflicting operations are those where interchanging the order can result in a different outcome. A conflict is read-write if it involves a read(X) in one transaction and write(X) in the other, and write-write if it involves two transactions each doing write(X).

A complete schedule is a total order of all the operations, including aborts/commits. (The book allows some partial ordering of irrelevant parts.)

Given a schedule S, the committed projection C(S) is those operations of S that are part of transactions that have committed. This is sometimes useful in analysing schedules when transactions are continuously being added.

We can build a transaction precedence graph based on schedule conflict, given a schedule S.  Let us say Ti ⟶ Tj if Ti and Tj conflict as above, and one of the following occurs:
  1. Ti executes write(X) and later Tj executes read(X)
  2. Ti executes  read(X) and  later Tj executes write(X)
  3. Ti executes write(X) and later Tj executes write(X)
These are the three specific cases of conflict that can occur; note that at this level we distinguish between read-write and write-read.

Demo: build graphs for Fig 21.3(a), (b). Note that the figure determines the schedule.
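For the demo, here is a short Python sketch (notation mine) that computes the precedence-graph edges of a schedule given as (transaction, action, item) triples; the three cases above collapse into "same item, different transactions, at least one write". Applying it to Sa (commits omitted) exposes a cycle:

    Sa = [(1,'r','X'), (2,'r','X'), (1,'w','X'), (1,'r','Y'),
          (2,'w','X'), (1,'w','Y')]

    def precedence_edges(schedule):
        """Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj."""
        edges = set()
        for i, (ti, ai, xi) in enumerate(schedule):
            for (tj, aj, xj) in schedule[i+1:]:
                if ti != tj and xi == xj and 'w' in (ai, aj):
                    edges.add((ti, tj))
        return edges

    print(precedence_edges(Sa))   # {(1, 2), (2, 1)}: a cycle, so Sa is not conflict-serializable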



Recoverable and nonrecoverable schedules

Given two transactions T and T', we say T reads from T', or T depends on T', if T' does a write(X) and later T does a read(X) (this is Case 1 of T'⟶T above). Thus the outcome of T may depend, at least in part, on the earlier T'.

A schedule is recoverable if no transaction T can commit until all T' where T reads from T' have already committed. Otherwise a schedule is nonrecoverable. The point is that if T reads from T', then if T' aborts, T must abort too.  So T waits to commit until T' commits. Given a recoverable schedule, we never have to abort a committed transaction. This is, in practice, an essential rule.

Consider the following modification of Sa above:
    S'a: r1(X), r2(X), w1(X), r1(Y), w2(X), c2, w1(Y), c1.
This is recoverable even though there is a risk of lost update. But now consider
    Sc: r1(X), w1(X), r2(X), r1(Y), w2(X), c2, a1             nonrecoverable: T2 reads X from T1 but T2 commits first
    Sd: r1(X),  w1(X), r2(X), r1(Y), w2(X), w1(Y), c1, c2;    recoverable because T2 waited to commit until T1 did

Suppose T1 aborts in Sd. Then we have Se:

    Se: r1(X),  w1(X), r2(X), r1(Y), w2(X), w1(Y), a1, a2;

Though recoverability implies no committed transaction needs to be rolled back, we still may have cascading rollback. A schedule is said to avoid cascading rollback if every transaction T only reads items that were earlier written by committed transactions. In this case, an abort of some other transaction will not cause T to abort.  How can we achieve this in Sd/Se? By delaying r2(X) until after c1.

There is also a notion of strict schedule, in which transactions do not use X for either read or write until all transactions that wrote X have committed. This is more expensive to require.

strict implies cascadeless implies recoverable, but not the reverse.
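In the same notation as the graph sketch above (with commits and aborts as ('c', None) and ('a', None) entries), here is a simplified recoverability check; it flags a schedule in which some T commits while a transaction it read from is still uncommitted:

    Sc = [(1,'r','X'), (1,'w','X'), (2,'r','X'), (1,'r','Y'),
          (2,'w','X'), (2,'c',None), (1,'a',None)]

    def recoverable(schedule):
        committed = set()
        last_writer = {}            # item -> most recent writer
        reads_from = {}             # T -> set of transactions T read from
        for (t, a, x) in schedule:
            if a == 'w':
                last_writer[x] = t
            elif a == 'r' and x in last_writer and last_writer[x] != t:
                reads_from.setdefault(t, set()).add(last_writer[x])
            elif a == 'c':
                if reads_from.get(t, set()) - committed:
                    return False    # t read from someone not yet committed
                committed.add(t)
        return True

    print(recoverable(Sc))   # False: T2 reads X from T1 but T2 commits first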



Consider the following three transactions:

    T1: X += 10; Y *= 1.1
    T2: Y += 10; Z *= 1.1
    T3: Z += 10; X *= 1.1

What happens if we do these in order T1, T2, T3? We get a consistent result. The other serial orders give somewhat different final values, but any serial execution counts as correct.

What happens if we do all the additions and then all the multiplications? Note that, if we defer the commits, this can be a recoverable schedule. And there are no lost updates. But it is still not equivalent to any serial schedule.
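A quick Python check (initial values mine) bears this out: the six serial orders give six different, but individually acceptable, results, while the additions-first interleaving gives (v+10)*1.1 for all three variables, matching none of them:

    from itertools import permutations

    def T1(v): v['X'] += 10; v['Y'] *= 1.1
    def T2(v): v['Y'] += 10; v['Z'] *= 1.1
    def T3(v): v['Z'] += 10; v['X'] *= 1.1

    init = {'X': 100.0, 'Y': 100.0, 'Z': 100.0}

    for order in permutations([T1, T2, T3]):
        v = dict(init)
        for t in order:
            t(v)
        print([t.__name__ for t in order], v)

    v = dict(init)                                  # all the additions first ...
    v['X'] += 10; v['Y'] += 10; v['Z'] += 10
    v['X'] *= 1.1; v['Y'] *= 1.1; v['Z'] *= 1.1     # ... then all the multiplications
    print('interleaved', v)                         # 121.0 for X, Y and Z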


Serializability

The ideal situation with transactions is that they are executed serially, but that is too inefficient. Instead, we look for schedules that are in some sense equivalent to a serial schedule. In Fig 21.5, (a) and (b) show two serial schedules. Schedule (c) has a lost update; schedule (d) is ok (in a sense to be defined below) because we can move T2 to the end of T1.

A schedule is serializable if it is equivalent to a serial schedule. For equivalence, we can use result equivalence, but that is intractable. Instead we use conflict equivalence. Two schedules are conflict-equivalent if the order of any two conflicting operations is the same in both schedules. We can reorder the other (nonconflicting) parts to get what we need.

Conflict-serializability algorithm

Build the transaction precedence (directed) graph above. If the graph has no cycles, the schedule is conflict-serializable: do a topological sort (ie find a total order of the nodes consistent with the Ti⟶Tj ordering); the original schedule is conflict-equivalent to executing the transactions serially in that order.

The actual algorithm is as follows: first, find all Ti that have no predecessors, ie no Tk with Tk⟶Ti. Put these first, and remove all their Ti⟶Tj links. Repeat until there are no nodes left. If at some stage every remaining Ti has a predecessor, then there must be a cycle.
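A sketch of this in Python, reusing precedence_edges() from the earlier sketch:

    def topo_order(nodes, edges):
        nodes, edges = set(nodes), set(edges)
        order = []
        while nodes:
            ready = {n for n in nodes if all(e[1] != n for e in edges)}
            if not ready:
                return None             # every remaining node has a predecessor: cycle
            order += sorted(ready)      # sorted() is an arbitrary tie-breaker
            nodes -= ready
            edges = {e for e in edges if e[0] not in ready}
        return order

    print(topo_order({1, 2}, {(1, 2), (2, 1)}))   # None: Sa is not conflict-serializable
    print(topo_order({1, 2}, {(1, 2)}))           # [1, 2]: Sb's edges; equivalent to T1 then T2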

Debit-credit transactions

These are transactions built from pieces of the form read(X), X+=N, write(X). We do not have to serialize these! As long as we avoid the lost-update problem, we can interleave increments and decrements from different transactions without harm, because addition is commutative. The schedule below is not conflict-serializable, but it yields the same result: X = X+N−M, Y=Y+M−N

T1                        T2
read(X)
X += N
write(X)
                          read(Y)
                          Y += M
                          write(Y)
read(Y)
Y -= N
write(Y)
                          read(X)
                          X -= M
                          write(X)


Locking

We will only consider relatively simple locks; complex locks tend to be inefficient. A binary lock has two states: locked and unlocked. We can implement a per-field/record/block/table lock with a locking table; at any one time, we are likely to have relatively few locks.

When we lock something, we may have to wait. See Fig 22.1. Ideally, we wait in a queue on the lock.

The actual inner details of lock() and unlock() must be atomic. On multi-processor systems, it is not enough simply to disable interrupts; we must actually involve some kernel locking.

A straightforward improvement on binary locks is read-write locks: write locks imply exclusive access, but several transactions can read-lock the same data value concurrently. Note that if we intend to write X eventually, then we will need to write-lock X before reading X; otherwise we may have lost updates.
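As a sketch of the bookkeeping involved (a real DBMS would put a refused requester into a wait queue, per Fig 22.1; here we simply grant or refuse):

    class LockTable:
        def __init__(self):
            self.locks = {}                 # item -> ('R', set_of_txns) or ('W', txn)

        def read_lock(self, t, x):
            entry = self.locks.get(x)
            if entry is None:
                self.locks[x] = ('R', {t})
                return True
            kind, holder = entry
            if kind == 'R':
                holder.add(t)               # shared: any number of readers
                return True
            return holder == t              # our own write lock already covers reads

        def write_lock(self, t, x):
            entry = self.locks.get(x)
            if entry is None or entry == ('W', t) or entry == ('R', {t}):
                self.locks[x] = ('W', t)    # grant, or upgrade our sole read lock
                return True
            return False                    # conflict: caller must wait

    lt = LockTable()
    print(lt.read_lock('T1', 'X'), lt.read_lock('T2', 'X'))   # True True: both may read
    print(lt.write_lock('T2', 'X'))       # False: T1 still holds a read lock on X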

Two-phase locking

Two-phase locking is a simple protocol in which all locking operations (including upgrades of read-lock(X) to write-lock(X)) precede the first unlock operation. The locking phase is called the expanding phase; the unlock is the shrinking phase.

If all transactions use two-phase locking, then any schedule will be conflict-serializable.

In Fig 22.3(a), the two-phase locking protocol is not followed. The schedule in 22.3(c) is allowed by the locks, but it is not serializable.

Fig 22.4 shows the same transactions using the two-phase locking protocol. Here, however, we have the potential for deadlock.

Two-phase locking comes in flavors. The version above is called basic. We also have conservative 2PL, in which a transaction acquires all of its locks before it begins (no deadlock, but less concurrency), and strict 2PL, in which write-locks are held until after the transaction commits (yielding strict schedules); strict 2PL is the most common in practice.

Deadlock

Locking protocols generally introduce the risk of deadlock:
T1                        T2
write_lock(X)
                          write_lock(Y)
write_lock(Y)
                          write_lock(X)
...                       ...

Perhaps the simplest strategy for deadlock prevention is to lock everything according to one master order. This is impractical for the programmer to enforce; the database can only do it if we ask for all the locks together: write_lock(X,Y,Z).

In practical terms, we also have to deal with deadlock detection, and then decide which transaction will die and which will wait. Note that it soon becomes obvious in practice that deadlock has occurred: any transaction that waits for more than a few seconds on a lock, and is not moving up in the queue for that lock, is likely deadlocked. More formally, we can construct a wait-for graph, and look for cycles.
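Simplifying so that each blocked transaction waits on a single other transaction, the wait-for graph becomes a map that we can walk to look for cycles. A sketch (data structure mine):

    # waits_for[t] = the transaction holding the lock that t wants;
    # transactions that are running do not appear as keys.
    def find_deadlock(waits_for):
        for start in waits_for:
            seen = set()
            t = start
            while t in waits_for:
                if t in seen:
                    return seen     # a cycle (possibly plus a waiting tail)
                seen.add(t)
                t = waits_for[t]
        return None

    # T1 holds X and wants Y; T2 holds Y and wants X:
    print(find_deadlock({'T1': 'T2', 'T2': 'T1'}))   # {'T1', 'T2'}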

Timestamps

An alternative to locks entirely is the use of timestamps. We assign each transaction a timestamp, and then only allow schedules that are equivalent to executing the transactions in timestamp order. To do this, each database item has associated with it a read-timestamp and a write-timestamp: the former is the latest timestamp of all transactions that have read X, and the latter is the latest timestamp of all transactions that have written X. If a transaction T attempts read(X), we compare T's timestamp with write-timestamp(X); if T's timestamp is earlier, a younger transaction has already written X, so we abort T and put it back in the transaction queue with a later timestamp. Similarly, if T attempts write(X) and T's timestamp is earlier than either read-timestamp(X) or write-timestamp(X), we abort T. This protocol can involve cascading rollback, so further constraints are needed to get it to generate cascadeless or strict schedules (or even recoverable schedules, for that matter).
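A sketch of the two checks, with names of my own choosing:

    read_TS, write_TS = {}, {}        # per-item timestamps, initially 0

    def read_item(ts, x):             # ts = requesting transaction's timestamp
        if ts < write_TS.get(x, 0):
            return 'abort'            # a younger transaction has already written x
        read_TS[x] = max(read_TS.get(x, 0), ts)
        return 'ok'

    def write_item(ts, x):
        if ts < read_TS.get(x, 0) or ts < write_TS.get(x, 0):
            return 'abort'            # a younger transaction has already read or written x
        write_TS[x] = ts
        return 'ok'

    print(read_item(2, 'X'))          # ok: the transaction with timestamp 2 reads X
    print(write_item(1, 'X'))         # abort: an older transaction cannot write what a younger one read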


Relax on the CouchDB

Eric Brewer's CAP theorem: consider the following three attributes for large distributed databases:
    Consistency
    Availability
    Partition tolerance
You can only have two of them.

But why do we even care? Why ever build a distributed database?

Because you're tired of being poor, that's why. You want to build an Amazon, an eBay, a Google, not a single-server DB that can process 1000 queries a second when you have 100,000 arriving.

Brewer's talk is here. Gilbert & Lynch's proof is here.

If we insist on a single master DB, we forfeit partition tolerance.

A classic distributed database forfeits availability.

Consider DNS: we don't have consistency (updates can take hours or days to propagate).

One solution is eventual consistency: when connectivity returns, we resolve conflicting transactions. The problem case, of course, is when two transactions involve updates to the same item.

The worst case is when Amazon has one book left, and two people have just bought it.

Strictly speaking, eventual consistency means that if no new updates occur, then eventually all accesses will return the last-updated value, or the winner of the conflicting-update resolution. DNS does this.





PHP programming







Invoice

Consider the following tables:

create table customer (
    custID      integer primary key,
    cname       varchar(20),
    caddress    varchar(30),
    cphone      varchar(20)
);

create table part (
    partnum     integer primary key,
    descr       varchar(20),
    listprice   real,
    inventory   integer
);

create table invoice (
    invoicenum  integer primary key,
    invdate     date,
    custID      integer,
    foreign key (custID) references customer(custID)
);

create table invItem (
    invoicenum  integer,
    partnum     integer,
    quantity    integer,
    discount    real,
    primary key (invoicenum, partnum),
    foreign key (invoicenum) references invoice(invoicenum),
    foreign key (partnum) references part(partnum)
);

The customer and part tables are standard. The invoice table consists of invoice numbers (the key), together with date and customer. The contents of an invoice consists of individual parts, each with a given quantity. The actual items on the invoice are listed in table invItem, with the two-attribute key (invoicenum, partnum). We also allow a discount from list price (eg 15%), on a per-line-of-invoice basis.

What if we want to add up all the parts on an invoice, to get an invoice total? We will look at:
    why we might initially want to avoid SQL
    how to do the math in PHP
    why we don't really want to take the PHP approach
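As a preview of where this ends up, the entire computation fits in one SQL query. A sketch using Python's sqlite3 module, with made-up sample data, and assuming discount is stored as a fraction (0.15 for 15%):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        create table part (partnum integer primary key, descr varchar(20),
                           listprice real, inventory integer);
        create table invItem (invoicenum integer, partnum integer,
                              quantity integer, discount real,
                              primary key (invoicenum, partnum));
        insert into part values (1, 'widget', 10.00, 50), (2, 'gadget', 25.00, 20);
        insert into invItem values (1001, 1, 3, 0.15), (1001, 2, 1, 0.0);
    """)

    # the join and the arithmetic happen in SQL; no per-row loop in PHP (or Python)
    total = conn.execute("""
        select sum(i.quantity * p.listprice * (1 - i.discount))
        from invItem i join part p on i.partnum = p.partnum
        where i.invoicenum = ?
    """, (1001,)).fetchone()[0]

    print(total)    # 3*10.00*0.85 + 1*25.00 = 50.5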


Reservations problem

Suppose we want to create a reservation system for trains (the traditional mode of transportation for the Seat Reservation Problem, which has an extensive literature). Here is some information:

The train stops in Argenta, Brocton, Colp, Dongola, Elsah (4 links).
(Illinois towns, almost certainly NOT on any kind of line)

    seat 1 is reserved A-B
    seat 2 is reserved B-C
    seat 3 is reserved C-D
    seat 4 is reserved D-E
   
Problem: no seats in 1-4 are now available for end-to-end!

Some algorithms:
OPT: get all reservation requests; find the arrangement that seats the longest initial segment.

Worst:
Assign the first seat that has no reservations attached at all; this is what produced the situation above.

First-Fit:
Assign the minimum seat number that is free for the desired interval. (A sketch in code appears below, after the memory-allocation note.)

Best-Fit:
Minimize the size of the empty space containing the seat. We need lots of tie-breaking rules to make this precise.
With five seats, do we prefer _ _ x x _ or _ x x _ _? (x = reserved, _ = vacant)

Worst-Fit:
maximize the empty space left open around a seat, on the theory that we're leaving large blocks for later.

The problem case above is due to worst-fit.

Other algorithms used in practice:

Long-haul-only:
accept reservations only for trips of at least some minimum number of legs (compare the length >= 2 observation below).

Pooling:
have separate seat pools for short-haul and long-haul reservations. In general, K pools where K <= #legs

Usual performance measure: worst-case ratio of # of passengers accommodated using the algorithm to # of passengers that could be accommodated using OPT.

These algorithms are related to the memory-allocation problem, except that we assume
    NO CANCELLATIONS (or few, at any rate)
    RESERVATION MUST BE CONTIGUOUS
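Here is a small Python sketch of first-fit (data structures mine). It shows both that the problem case above leaves no seat free end-to-end, and that first-fit itself would never have created that case:

    # seats[s] = set of legs already reserved on seat s
    def first_fit(seats, lo, hi):
        """Reserve legs lo..hi on the lowest-numbered free seat; None if full."""
        want = set(range(lo, hi + 1))
        for s in sorted(seats):
            if not seats[s] & want:
                seats[s] |= want
                return s
        return None

    # Legs: A-B=1, B-C=2, C-D=3, D-E=4. The problem case above:
    seats = {1: {1}, 2: {2}, 3: {3}, 4: {4}}
    print(first_fit(seats, 1, 4))        # None: no seat is free end-to-end

    # First-fit, given the same four requests, packs them all onto seat 1:
    seats = {s: set() for s in range(1, 5)}
    for lo, hi in [(1, 1), (2, 2), (3, 3), (4, 4)]:
        print(first_fit(seats, lo, hi), end=' ')     # 1 1 1 1
    print(first_fit(seats, 1, 4))        # 2: an end-to-end seat is still available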
   


Suppose there are only two legs on the schedule.

We can prove that first-fit is optimal. Suppose some reservation is blocked. Claim: it is because on one leg or the other, the train is full.

Take away the 2-leg (through) reservations. Under first-fit, the remaining one-leg reservations for leg 1 occupy an initial segment of the remaining seats; similarly for leg 2. Whichever leg had more one-leg reservations must have used ALL the remaining seats, so the blocked request was blocked on a full leg.


Three legs, TWO SEATS

p1: leg 1
p2: leg 1,2
p3: leg 3
p4: leg 2,3

First-fit assigns:
p1: seat 1, leg 1
p2: seat 2, leg 1,2
p3: seat 1, leg 3
p4: no available seats

but best-fit assigns as:

p1: seat 1, leg 1
p2: seat 2, leg 1,2
p3: seat 2, leg 3        tighter fit than seat 1
p4: seat 1, leg 2,3
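Continuing the first_fit sketch above, here is best-fit with the simplest tie-breaker (lowest seat number); on p1-p4 it reproduces the assignment just shown:

    def best_fit(seats, lo, hi, nlegs):
        """Reserve legs lo..hi on the free seat whose open block around them is smallest."""
        want = set(range(lo, hi + 1))
        best = None
        for s in sorted(seats):
            if seats[s] & want:
                continue                 # seat not free on the requested legs
            a, b = lo, hi                # grow to the whole open block containing lo..hi
            while a > 1 and (a - 1) not in seats[s]:
                a -= 1
            while b < nlegs and (b + 1) not in seats[s]:
                b += 1
            size = b - a + 1
            if best is None or size < best[1]:
                best = (s, size)
        if best is None:
            return None
        seats[best[0]] |= want
        return best[0]

    # p1 = leg 1, p2 = legs 1-2, p3 = leg 3, p4 = legs 2-3; two seats, three legs:
    seats = {1: set(), 2: set()}
    for lo, hi in [(1, 1), (1, 2), (3, 3), (2, 3)]:
        print(best_fit(seats, lo, hi, 3), end=' ')   # 1 2 2 1: everyone is seated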


What if we just allow reservations for trips of length >= 2? This also works.



Representing reservations:

Attempt 1:

<passenger, seat, leg, date>

create table reservations (
    passno integer primary key, -- one passenger can't reserve two seats
    seat integer,
    leg integer,
    unique (seat, leg)        -- can't reserve same seat twice
);

I left out DATE for simplicity.

Problem: queries for available seats are very difficult in pure SQL. A select can only return values drawn from records that exist; it cannot directly list the seats that are absent from the table, ie the vacancies (but see below).

To ask for seats reserved on leg 3:

    select r.seat from reservations r where r.leg = 3;
   
To ask for seats reserved on legs 2 or 3:

    select r.seat from reservations r where r.leg between 2 and 3;
   
But how to ask for free seats?

   

Attempt 2:

<passenger, seat, leg, date> plus a table VACANCIES: <seat, leg, date>, initially full.

To ask for an open seat on leg 3 (again ignoring date; just assume <SEAT, LEG>):

    select v.seat from VACANCIES v where v.leg = 3;

To ask for seats open on both legs 2 and 3:

    select v.seat from VACANCIES v where v.leg = 2
    and exists (select * from VACANCIES v2 where v.seat = v2.seat and v2.leg = 3);



Attempt 1: table Reservations

Hard to find vacant seats on leg 3

Attempt 2: separate table of Vacancies
Easy to find vacant seats on leg 3, BUT tricky to guarantee consistency

Back to attempt 1:
create table reservations (
    passno integer,
    seat integer,
    leg integer,
    primary key (seat, leg)        -- can't reserve same seat twice
);

create table seats (
    seat integer primary key
);

create table legs (leg integer primary key);


Now, to find seats that are vacant on legs 2 and 3:

    select seat from seats
    minus
    select r.seat from reservations r where r.leg between 2 and 3;

(minus is Oracle syntax; standard SQL and most other databases spell it except.)

To implement the First-Fit algorithm:

    select min(seat) from
    (     select seat from seats
        minus
        select r.seat from reservations r where r.leg between 2 and 3
    );

First-fit:

To reserve a seat for legs A-B:
    select min(seat) from
    (     select seat from seats
        minus
        select r.seat from reservations r where r.leg between A and B
    );

Best-fit: I couldn't do this in a single SQL query.
    For each seat, get the length of the open block containing A-B (zero for some seats).
    This is almost expressible as a SQL query; the pieces are below.
    Then order the result by size, and consider only the blocks of minimum size.

    AVAIL(s, lo, hi): seat s is available on legs lo through hi:

        NOT EXISTS (select * from reservations r where r.seat = s
            and r.leg between lo and hi)
           
for each seat, the biggest block of free legs ending at leg 3:
    select s.seat, min(l.leg) from SEATS s, LEGS l where
        l.leg <= 3
    and not exists (select * from reservations r where r.seat = s.seat
                and r.leg between l.leg and 3)
    group by s.seat;
             
    select s.seat, min(lo.leg), max(hi.leg) from SEATS s, LEGS lo, LEGS hi
    where
        lo.leg <= 3 and hi.leg >= 3
    and not exists (select * from reservations r where r.seat = s.seat
                and r.leg between lo.leg and hi.leg)
    group by s.seat;
             
select s.seat, min(lo.leg), max(hi.leg), max(hi.leg)-min(lo.leg) AS gap
from SEATS s, LEGS lo, LEGS hi
where
    lo.leg <= 3 and hi.leg >= 3
and not exists (select * from reservations r where r.seat = s.seat
            and r.leg between lo.leg and hi.leg)
group by s.seat
order by gap;
   
Range query:

select s.seat, min(lo.leg), max(hi.leg), max(hi.leg)-min(lo.leg) AS gap
from SEATS s, LEGS lo, LEGS hi
where
    lo.leg <= 3 and hi.leg >= 4
and not exists (select * from reservations r where r.seat = s.seat
            and r.leg between lo.leg and hi.leg)
group by s.seat
order by gap;



Best fit:
    1. Put the above into an array, stopping when the value of GAP becomes > the value of GAP in the first row.
    2. Apply tie-breaker rules to the entries in the array.