Database week 13

Comp 353/453: Database Programming, Corboy 523, 7:00 Thursdays

Week 13, Apr 27

Read in Elmasri & Navathe (EN)

Chapter 17, File Structures
Chapter 18, Indexes
Chapter 19, Query Optimization
Chapter 21, Concurrency

Final: open book?

Query processing

First, note it is relatively easy to sort records. Typically we will use merge-sort, where we subdivide into blocks small enough to fit into memory. We sort those with, say, quicksort. We then make passes until each pair of consecutive blocks is merged.

Select

Some examples

select * from employee e where e.ssn = '123456789'
select * from department where dnumber >=5
select * from empoyee where dno = 5
select * from employee where dno = 5 and salary > 30000 and sex = 'F'
select * from works_on where essn='123456789' and pno=10

Methods of implementation:

Linear search: we read each disk block once.
Binary search: this is an option if the selection condition is an equality test on a key attribute, and the file is ordered by that attribute.
Primary index: same requirement as above, plus we need an index
Hash search: equality comparison on any attribute for which we have a hash index (need not be a key attribute)
Primary index and ordered comparison: use the index to find all departments with dnumber > 5
Clustered index (includes hash indexes)
B-tree index

If we are searching for "dno=5 AND salary=4000" we can use an index for either attribute, or a composite index if one exists.

In general, we'd like to choose the condition with the fewest results, and then continue with the remaining conditions.

For OR conditions, we just have to search for each.

Week 13: Joins

Mostly we focus on equijoins:

select * from employee e, department d where e.dno = d.dnumber
select * from department d, employee e where d.mgr_ssn = e.ssn

Methods of implementation:

Book example, EN6 p 690-693: Suppose we have 6,000 employee records in 2,000 blocks, and 50 departments in 10 blocks. We have indexes on both tables. The employee.ssn index takes 4 accesses, and the dept.dnumber index takes 2.

1. Nested-loop join: this is where we think of a for-loop:
    for (e in employee) {
       for (d in department) {
          if (e.dno = d.dnumber) print(e,d);
       }
    }

This is quadratic. If we go through the employee table record by record, that amounts to 2,000 block accesses. For each employee record the loop above would go through all 10 blocks of departments; that's 6,000 × 10 = 60,000 blocks. Doing it the other way, we go through 10 blocks of departments, and, for each department record, we search 2,000 blocks of employees for 50×2,000 = 100,000 block accesses.

However, note that we can also do this join block-by-block on both files:

read a block of employees
read a block of departments
search both for any join instances
go on to the next block of departments

Done this way, the number of block accesses is 2,000 × 10 = 20,000 blocks.

Performance improves rapidly if we can keep the smaller table entirely in memory: we then need only 2,010 block accesses! (The EN6 analysis makes some additional assumptions about having spare buffer blocks to work with.)

2. Index join: if we have an index on one of the attributes, we can use it:
    for (e in employee) {
       d = lookup(department, e.ssn);
       if (d != null) print(e,d);
    }

Note that any index will do, but that this may involve retrieving several disk blocks for each e and will almost certainly involve retrieving at least one disk block (from department) for every e in employee. It may or may not be better than Method 1. Consider the first query. Suppose we have a primary index on department.dno that allows us to retrieve a given department in 2 accesses. Then we go through 6,000 employees and retrieve the department of each; that's 6,000×2 = 12,000 block accesses. Now consider the second query, and suppose we can find a given employee in 4 accesses. Then we go through 50 × 4 = 200 block accesses (for every department d, we look up d.mgr_ssn in table employee).

3. Sort-merge join: we sort both files on the attribute in question, and then do a join-merge. This takes a single linear pass, of size the number of blocks in the two files put together file. This is most efficient if the files are already sorted, but note that it's still faster than 1 (and possibly faster than 2) if we have to sort the files. Assume both tables are sorted by their primary key, and assume we can sort in with N log(N) block accesses, where N is the number of blocks in the file. Then query 1 requires us to sort table employee in time 2,000×11 = 22,000; the actual merge time is much smaller. Query 2 requires us to sort table department in time 10×4 = 40; the merge then takes ~2,000 blocks for the employee table.

4. Partition-hash join: Let the relations (record sets) be R and S. We partition both files into Ri = {r in R | hash(r) = i}. Now we note that the join R ⋈ S is simply the disjoint union of the Ri ⋈ Si. In most cases, either Ri or Si will be small enough to fit in memory.

A good hash function helps; we want the buckets to be relatively uniform. We won't get that, but we should hope for Poisson distribution. With N things in K buckets, we expect an average of a=N/K per bucket. The probability of i things is aⁱe^-a/i! and the expected number of i-sized buckets is about N*aⁱe^-a/i!.

The join selection factor is the fraction of records that will participate in the join. In query 2 above,
    select * from department d, employee e where d.mgr_ssn = e.ssn
all departments will be involved in the join, but almost no employees. So we'd rather go through the departments, looking up managing employees, rather than the other way around.

In the most common case of joins, the join field is a foreign key in one file and a primary key in the other. Suppose we keep all files sorted by their primary key. Then for any join of this type, we can traverse the primary-key file block by block; for each primary-key value we need to do a lookup of the FK attribute in the other file. This would be method 2 above; note that we're making no use of the primary index.

Query optimization

As with compilers, the word "optimization" is used loosely.

A major rule is to apply select/project before joins, and in general do operations first that have greatest potential for reducing the size of the number of records.

Query tree: See fig 19.4.

Select lname from employee e, works_on w, project p
where e.ssn = w.essn and w.pno = p.pnumber and p.pname = 'Aquarius';

See fig 19.5, but note that doing cross-products was never on the agenda. However, it still does help to do the second join first.

Some heuristic rules:

Move the more-restrictive SELECT operations down towards the leaf nodes; that will reduce number of records.
Same with PROJECT; try to do joins last
Do the joins in the right order, using the join selection factor estimate
Do the smaller joins first (or, more generally, do the joins that will result in an absolutely smaller set of records, whether this is because the relations were small to start with or because one condition was highly selective).

Transactions

A transaction is a set of operations, which we can idealize as read(X), write(X), which at the end we either commit (save) or rollback/abort (discard). Generally we hope rollbacks are rare.

Abstract transactions: operations are read(X) and write(X), for X a field/record/block/table.
Fig 21.2: two example transactions.
T1: read(X), write(X), read(Y), write(Y)
T2: read(X), write(X)

Ideally, we execute the transactions one at a time, with no overlap. In practice, because of the considerable I/O-wait for disk retrieval, we cannot afford that. Transactions will overlap. We need to figure out how to arrange for this to happen safely. Here are two problems:

Lost-update (write/write) problem: this occurs when one transaction writes a value that another then overwrites. See Fig 21.3a. T1 is transferring N dollars from X to Y, while T2 is adding M dollars to X. T2.write(X) overwrites T1.write(X).

T1	T2
read(X) X = X-N
	read(X) X = X+M
write(X) read(Y)
	write(X)
Y = Y+N write(Y)

Dirty-Read (read/write) problem: this occurs when one transaction reads an item that another then overwrites. This would end up as a lost-update, but consider Fig 21.3b where T1 actually aborts after write(X).

T1	T2
read(X) X = X-N write(X)
	read(X) X = X+M write(X)
read(Y) abort/rollback

Fig 21.3c shows a corrupted result; an abort would have been an improvement!

The system log, or journal, is a master list of all operations performed, used in the event of rollback. Failures leading to rollback may be system crash, lack of disk space, lookup error (eg SQL error), locking problems, or other physical or logical errors.

Items read here can be individual record attributes, entire records, or entire tables. When joining tables, the entire table is read.

The ACID test

This is commonly described as the following set of features.

Atomicity: either all steps of a transaction are performed, or none are. We either commit the entire thing or rollback completely.
Consistency preservation: the individual transaction shoud preserve database consistency and database invariants
Isolation: transactions should appear to be executing in isolation.
Durability: committed transactions survive DB crashes

Note that Durability is arguably implied by Atomicity, or at least the two together are implied by some greater "recovery" rule. There are other redundancies here as well; the main points for transactions are atomicity and isolation, with the latter understood as a special case of correctness.

For example, note that Consistency is closely related to Isolation: if each transaction preserves consistency, and the transactions appear to execute in Isolation, then a series of transactions will be Consistency preserving. Consistency would be implied by Serializability: that the transactions appearedto have been executed in some serial order. However, this might be a stronger condition, if we really had a good definition for isolation. (Which we in fact do not; think about it. Query 1 may affect the outcome of Query 2 even if they are executed one after the other.)

Moral: beware of choosing your terms to fit a clever acronym.

Transaction schedules

Transaction schedules are lists of read/write operations of each transaction, plus commit/abort. The operations of each individual transaction must be in their original order.

Eventually we will enforce schedules by using locking, but for now we just consider schedules abstractly.

Examples:

Sa: Fig 21.3a: r1(X), r2(X), w1(X), r1(Y), w2(X), w1(Y), c1, c2
Sb: Fig 21.3b: r1(X), w1(X), r2(X), w2(X), r1(Y), a1

Two operations is a schedule conflict (where the word is to be taken as indicating potential conflict) if:

They belong to different transactions
They access the same data item (eg X)
At least one of the operations is a write()

For example, in Sa, r1(X) and w2(X) conflict. So do r2(X) and w1(X). What does not conflict? r1(X) and r2(X), r1(X) and w1(X).

Conflicting operations are those where interchanging the order can result in a different outcome. A conflict is read-write if it involves a read(X) in one transaction and write(X) in the other, and write-write if it involves two transactions each doing write(X).

A complete schedule is a total order of all the operations, incuding abort/commits. (The book allows some partial ordering of irrelevant parts).

Given a schedule S, the committed projection C(S) is those operations of S that are part of transactions that have committed. This is sometimes useful in analysing schedules when transactions are continuously being added.

We can build a transaction precedence graph based on schedule conflict, given a schedule S. Let us say Ti ⟶ Tj if Ti and Tj conflict as above, and one of the following occurs:

Ti executes write(X) and later Tj executes read(X)
Ti executes read(X) and later Tj executes write(X)
Ti executes write(X) and later Tj executes write(X)

These are the three specific cases of conflict that can occur; note that at this level we distinguish between read-write and write-read.

Demo: build graphs for Fig 21.3(a), (b). Note that the figure determines the schedule.

Recoverable and nonrecoverable schedules

Given two transactions T and T', we say T reads from T', or T depends on T', if T' does a write(X) and then later T does a read(X) (this is Case 1 of T'⟶T above) Thus, the outcome of T may depend on the (at least in part) earlier T'.

A schedule is recoverable if no transaction T can commit until all T' where T reads from T' have already committed. Otherwise a schedule is nonrecoverable. The point is that if T reads from T', then if T' aborts, T must abort too. So T waits to commit until T' commits. Given a recoverable schedule, we never have to abort a committed transaction. This is, in practice, an essential rule.

Consider the following modification of Sa above:
    S'a: r1(X), r2(X), w1(X), r1(Y), w2(X), c2, w1(Y), c1.
This is recoverable even though there is a risk of lost update. But now consider
    Sc: r1(X), w1(X), r2(X), r1(Y), w2(X), c2, a1             nonrecoverable: T2 reads X from T1 but T2 commits first
    Sd: r1(X), w1(X), r2(X), r1(Y), w2(X), w1(Y), c1, c2;    recoverable because T2 waited to commit until T1 did

Suppose T1 aborts in Sd. Then we have Se:

    Se: r1(X), w1(X), r2(X), r1(Y), w2(X), w1(Y), a1, a2;

Though recoverability implies no committed transaction needs to be rolled back, we still may have cascading rollback. A schedule is said to avoid cascading rollback if every transaction T only reads items that were earlier written by committed transactions. In this case, an abort of some other transaction will not cause T to abort. How can we achieve this in Sd/Se? By delaying r2(X) until after c1.

There is also a notion of strict schedule, in which transactions do not use X for either read or write until all transactions that wrote X have committed. This is more expensive to require.

strict implies cascadeless implies recoverable, but not the reverse.

Consider the following three transactions:

    T1: X+=10, Y*=1.1;
    T2: Y+=10, Z *= 1.1;
    T3: Z+=10, X *= 1.1;

What happens if we do these in order T1, T2, T3? We get the same end result in any serialized order.

What happens if we do all the additions and then all the multiplications? Note that, if we defer the commits, this can be a recoverable schedule. And there are no lost updates. But it is still not equivalent to any serial schedule.

Serializability

The ideal situation with transactions is that they are executed serially, but that is too inefficient. Instead, we look for schedules that are in some sense equivalent to a serial schedule. In Fig 21.5, (a) and (b) show two serial schedules. Schedule (c) has a lost update (T1's write(X) is clobbered by T2's later write(X)); schedule (d) is ok (in a sense to be defined below) because we can move T2 to the end of T1.

A schedule is serializable if it is equivalent to a serial schedule. For equivalence, we can use result equivalence, but that is intractable. Instead we use conflict equivalence. Two schedules are conflict-equivalent if the order of any two conflicting operations is the same in both schedules. We can reorder the other (nonconflicting) parts to get what we need.

Conflict-serializability algorithm

Build the transaction precedence (directed) graph above. If the graph has no cycles, it is conflict-serializable. Do a topological sort (ie find a total order of the nodes consistent with the Ti⟶Tj ordering); the original schedule is conflict-equivalent to the schedule of executing the transactions in that order.

The actual algorithm is as follows: first, find all Ti that do not have any predecessors, ie Tk with Tk⟶Ti. Do these first, and remove all Ti⟶Tj links. Now repeat, until there are no more nodes. If at some stage there are no Ti without predecessors, then there must be a cycle.

Fig 21.7 shows the graphs for the schedules in Fig 21.5.

Debit-credit transactions

These are transactions built from pieces of the form read(X), X+=N, write(X). We do not have to serialize these! As long as we avoid the lost-update problem, we can interleave increments and decrements from different transactions without harm, because addition is commutative. The schedule below is not conflict-serializable, but it yields the same result, namely X = X+N−M, Y=Y+M−N.

T1	T2
read(X) X+=N write(X)
	read(Y) Y+=M write(Y)
read(Y) Y−=N write(Y)
	read(X) X−=M write(X)

Locking

We will only consider relatively simple locks; complex locks tend to be inefficient. A binary lock has two states: locked and unlocked. We can implement a per-field/record/block/table lock with a locking table; at any one time, we are likely to have relatively few locks.

When we lock something, we may have to wait. See Fig 22.1. Ideally, we wait in a queue on the lock.

The actual inner details of lock() and unlock() must be atomic. On multi-processor systems, it is not enough simply to disable interrupts; we must actually involve some kernel locking.

A straightforward improvement on binary locks is read-write locks: write locks imply exclusive access, but several transactions can read-lock the same data value concurrently. Note that if we intend to write X eventually, then we will need to write-lock X before reading X; otherwise we may have lost updates.

Two-phase locking

Two-phase locking is a simple protocol in which all locking operations (including upgrades of read-lock(X) to write-lock(X)) precede the first unlock operation. The locking phase is called the expanding phase; the unlock is the shrinking phase.

If all transactions use two-phase locking, then any schedule will be conflict-serializable.

In fig 22.3(a), the two-phase locking protocol is not followed. The schedule in 22.3(c) is allowed by the locks, but it is not serializable.

Fig 22.4 shows the same transactions using the two-phase locking protocol. Here, however, we have the potential for deadlock.

Two-phase locking comes in flavors. The version above is called basic. We also have:

conservative: transactions must lock all the items to be accessed at the start of the transaction. This is deadlock-free, but sometimes the transaction does not know every item to be read until partway through.
strict: no write locks are released until commit/rollback. This means that no other transaction that reads from this one can do so until the first one has committed. This leads to a strict schedule, which is then cascadeless. Deadlocks may occur.
rigorous: no locks at all are released until commit/rollback. Rigorous two-phase locking is sort of a mirror image of conservative two-phase locking.

Deadlock

Locking protocols generally introduce the risk of deadlock:

T1	T2
write_lock(X)	write_lock(Y)
write_lock(Y)	write_lock(X)
...	...

Perhaps the simplest strategy for deadlock prevention is to lock everything according to one master order. This is impractical for the programmer to enforce; the database can only do it if we ask for all the locks together: write_lock(X,Y,Z).

In practical terms, we also have to deal with deadlock detection, and then decide which transaction will die and which will wait. Note that it soon becomes obvious in practice that deadlock has occurred: any transaction that waits for more than a few seconds on a lock, and is not moving up in the queue for that lock, is likely deadlocked. More formally, we can construct a wait-for graph, and look for cycles.

Timestamps

An alternative to locks entirely is the use of timestamps. We assign each transaction a timestamp, and then only allow schedules that are equivalent to executing the transactions in timestamp order. To do this, each database item has associated with it a read-timestamp and a write-timestamp; the former is the latest timestamp of all transactions that have read X, and the latter is the latest timestamp of all transactions that have written X. If a transaction T attempts read(X), we first compare T's timestamp with read-timestamp(X). If T's timestamp is not later than read-timestamp(X), we abort T and put it back in the transaction queue with a later timestamp; ditto for write(X). This protocol can involve cascading rollback, so further constraints are needed to get it to generate cascadeless or strict schedules (or even recoverable schedules, for that matter).

Transactions in PHP

See http://php.net/manual/en/pdo.transactions.php. Here's an example from that page:

try {  
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

  $db->beginTransaction();
  $db->exec("insert into staff (id, first, last) values (23, 'Joe', 'Bloggs')");
  $db->exec("insert into salary (id, amount) values (23, 50000)");
  $db->commit();
  
} catch (Exception $e) {
  $db->rollBack();
  echo "Failed: " . $e->getMessage();
}

Bear in mind that you should not interpose any wait for user input between the beginTransaction() and the commit(); otherwise you may be holding a lock "indefinitely".

Crash Recovery

A crucial feature of transactions (the "atomic" property) is that if the database crashes, or is restarted (eg due to deadlocks), or one transaction fails, then no transaction is left "partially" executed: every transaction either completes or is rolled back.

The typical way to implement this is with a "transaction journal file", which is carefully written to disk after every update. This journal might contain:

a transaction ID
original values of the data fields to be updated
a flag indicating that the transaction is pending

The flag is set to "committed" at the end. Then, if the DB is restarted after a crash, we can restore the records involved to their pre-transaction state.

Note that we also will now need to roll back any other transaction that depended on this one; we can use the transaction ID for that.

NoSQL

There is currently a movement that suggests that the days of relational databases are waning, and that emerging needs to manage very large databases means that new approaches must be found. A typical NoSQL database supports tables that can be looked up via one primary key column, but not necessarily joined to other tables, or searched by other columns. Such a structure looks something like a giant version of a Java HashMap, though often with multiple data columns.

Not everyone is in agreement with this: see I Can't Wait for NoSQL to Die, by Ted Dziuba. Also note that, more and more, NoSQListas claim that their acronym stands for "Not only SQL", as if to recognize that SQL and Relationalism will always have a role to play.

But for the Facebooks and Amazons and Googles out there, a traditional RDBMS has limitations. The issue, however, isn't simply a matter of size: a big part of it is the need to partition the database to support huge numbers of users.

Also, there are several different ways to think about size:

Raw number of records
Number of records being written; large distributed read-only databases are relatively easy
Number of simultaneous locks, related to the number of simultaneous writes. Large distributed email databases, where everyone is constantly having new email written to their inbox, are relatively easy too: any inbox is accessed by only two entities: the mail deliverer (adding mail) and the user (deleting it).

As another example, consider the user "walls" on Facebook. These are viewed from numerous points, and updated from numerous sources. However, the updates are timestamped, and we can rely on the updates arriving eventually. Thus, if we replicate a given user's wall over several databases, there may be moments when different versions of the wall show different comments. However, we are ensured of eventual consistency: a given comment will eventually appear in every wall instance, in the correct order.

As a third example, consider Amazon's online-ordering database. A single user placing an order can be assigned to one particular, smaller database. The items are entered, the payment is made, and then Amazon needs to process the order. But the latter can be done in a relatively leisurely fashion. The important part, however, is making sure the inventory is updated; in particular, we don't want to sell the last copy of something to two different people.

At first glance this suggests that the inventory database needs to be consistent at all times. But there's a problem: the CAP theorem.

CAP theorem

Eric Brewer's CAP theorem: Consider the following three attributes for large distributed databases:

Consistency of view across all locations
Availability at all times
Partition tolerance: if a link breaks, the database pieces can keep going

Theorem: you can only have two of them.

Argument: Suppose you have the last two. A link does break, and the two halves continue in isolation. Then we cannot possibly hope for consistency.

But why do we even care? Why ever build a distributed database?

Because you're tired of being poor, that's why. You want to build an Amazon, an E-bay, a Google, not a single-server DB that can process 1000 queries a second when you have 100,000 arriving.

Brewer's talk is here. Gilbert & Lynch's proof is here.

If we insist on a single master DB, we forfeit partition tolerance. Brewer's theorem does not mention this explicitly, but for databases to be really large we want to be able to distribute them.

A classic distributed database forfeits availability: if the link between nodes goes down, the database is unavailable to anyone, even those able to access one of the nodes.

Consider DNS as a big database: we do not have consistency.

One solution is eventual consistency: when connectivity returns, we resolve conflicting transactions. DNS does have eventual consistency. The problem case, of course, is when two transactions involve updates to the same item. Note that for the Facebook wall example above, there aren't really any conflicting transactions (unless you count this: A posts, B comments on A's post, A deletes his/her post without seeing B's. But this happens all the time even when the data is consistent: A posts, B comments, A deletes without refreshing the page).

The worst situation is when Amazon has one book left, and two people have just bought it. We can guarantee this won't happen with a central database (or even a separate central database for each of several disjoint sets of products, eg DB[N] is for all items where SKU mod 137 = N). But eventual consistency is not good here: it leads to scenarios where the last copy is sold twice.

Note, however, that while eventual consistency may not be good here, it may in fact be good enough. What should Amazon do if the last copy is sold twice? They can just tell one of the buyers that, too bad, the item they ordered turned out to be out of stock. This happens often enough anyway, due to discrepancies between the online inventory and the physical inventory.

In other words, eventual consistency may not be ideal, but it may be a workable arrangement.

Formally, eventual consistency means that if no new updates occur, then eventually all accesses will return the last-updated value, or the winner of the conflicting-update resolution. DNS does this.

Denormalization

Aside from the CAP theorem, there are other problems with RDBMSs. Joins, for one. Joining two big tables can be expensive. The advantage of joins is traditionally that the underlying database is protected from anomalies. But suppose we eliminate those through careful front-end programming? (As an example, suppose we move the constraint that SSN is a key of the EMPLOYEE table out of MySQL and into our PHP front end. It would be straightforward to implement.)

For as long as RDBMSs have been around, DB managers have engaged in occasional "denormalization" to avoid a particular join for performance reasons. One example might be a TOTAL field in the INVOICE table: the total for a specific invoice_num is calculated by finding all items ordered, as listed in INVITEMS:
select sum(quantity*price) from INVITEM ii where ii.invnum = invoice_num;
This is a relatively expensive join and search! Storing a TOTAL field in the INVOICE table introduces a potential inconsistency, but it is also rather common practice for performance reasons.

EN6 gives a denormalization example on pp 731-732. Consider a view ASSIGN:
⟨emp_id, project_id, emp_lname, percent_assigned, proj_name, proj_mgr_id, proj_mgr_name⟩

This is a complex join, even after replacing percentages with project_hours and total_hours:

select e.ssn, p.pnumber, e.lname, w.hours, sum(w2.hours), p.pname, d.mgr_ssn, m.lname
from employee e, employee m, project p, works_on w, works_on w2, department d
where e.ssn = w.essn and p.pnumber = w.pno and e.ssn = w2.essn and p.dnum = d.dnumber and d.mgr_ssn = m.ssn
group by e.ssn;

Sometimes materialized views are used for this. Note that this amounts to a denormalized table, though it is now one that the DBMS is responsible for keeping synchronized.

select e.ssn, p.pnumber, e.lname, w.hours, (select sum(w2.hours) from works_on w2 where w2.essn = e.ssn) as total, p.pname, d.mgr_ssn, m.lname
from employee e, employee m, project p, works_on w, department d
where e.ssn = w.essn and p.pnumber = w.pno and p.dnum = d.dnumber and d.mgr_ssn = m.ssn
group by e.ssn;

Big Data Models

Often these models are not only NoSQL but are "schema free", usually meaning that they store ⟨key,value⟩ pairs where the value can be some structured data, sometimes in JavaScript Object Notation, below (note the hierarchical subtyping and the support for list structures).

{ "ssn"     : "123456789",
"name"    : { "fname" : "John", "lname" : "Smith" },
"bdate"   : "1965-01-09",
"address" : {"street": "731 Fondren", "city" : "Houston", "state" : "TX" },
"salary" : 30000,
"dno"     : 5,
"projects": [ {"pno" : "3", "hours" : 30 }, {"pno" : "20", "hours": 10 } ]
}

The main point is that for a given key the rest of the data can be any type; that is, we in effect have a map of key strings to Object.

Examples:

BigTable: a Google internal product. Tables can be subdivided (and then distributed).

Hadoop: a public Apache project. The framework is typically used to implement distributed filesystems, or other large projects with a high degree of distributedness.

Cassandra: also an Apache project. Cassandra is specifically a database intended to handle very large amounts of data spread over a large number of servers. The data model was taken from BigTable. A given key can map to multiple columns, often grouped into column families. Column families are in effect declared through a database schema, but the actual columns within families are not. Until 2010, the Facebook messaging system used Cassandra; it then moved to HBase.

Hbase: another Apache database with close ties to the Hadoop framework.

MongoDB: A "document-centric" database using JSON-like syntax (above). But field names do exist (even if they are now dynamic). Searches can be done on any field, and indexes can be created. Data distribution is again a major goal; Mongo also includes load-balancing features.

CouchDB: Couch ("Cluster Of Unreliable Commodity Hardware") was an earlier NoSQL effort. Data consists of free-form "documents", with JSON structuring. Keys are internal row identifiers. Couch has a form of support for ACID, but substituting eventual consistency for strict consistency. Couch has specific tools to handle resynchronization of separated databases (eg an mp3 player that spends significant amounts of time "offline" but that is merged at intervals with the master database). Documents in the database contain revision information, intended to facilitate the merger of two documents from two separated Couch instances. It is left up to the application to figure out how to merge two documents that have been independently modified; sometimes this is straightforward and other times it is impossible to do in a meaningful way.

Mongo and Couch are considered to be "document" databases; Cassandra and Hbase are considered to be "wide-column" databases.

Reservations problem

Not on the final!

Suppose we want to create a reservation system for trains (the traditional mode of transportation for the Seat Reservation Problem, which has an extensive literature). Here is some information:

The train stops in Argenta, Brocton, Colp, Dongola, Elsah (4 links). (These are Illinois towns, almost certainly NOT on any kind of line.) Suppose the following reservations are made:

    seat 1 is reserved A-B
    seat 2 is reserved B-C
    seat 3 is reserved C-D
    seat 4 is reserved D-E

Problem: no seats in 1-4 are now available for end-to-end!

Note that there is always a solution that involves no seat-changing, as long as no link is ever overbooked: have passengers get on the train and find an empty seat. If there is no seat, that link is overbooked. If they find a seat, they keep it until they get off. The problem with this is that we would like to grant reservations in the order made, rather than in the order of embarkation.

Some algorithms:
OPT (optimum): Essentially what was outlined above. We get all reservation requests and find the arrangement that seats everyone. (A variation may involve reservations made by groups, the "find a seat" method may split up groups, which we might not want to allow.)

Worst:
Assign the first seat that has no reservations attached. Above.

First-Fit:
Assign the min seatnumber that is free for the desired interval.

Best-Fit:
Minimize the size of the empty space containing the seat. We need lots of tie-breaking rules to make this precise.
With five seats, do we prefer _ _ x x _ or _ x x _ _? (X = reservation, _ = vacant empty space)

Worst-Fit:
maximize the empty space left open around a seat, on the theory that we're leaving large blocks for later.

The problem case above is due to worst-fit.

Other algorithms used in practice:

Long-haul-only: Accept reservations for travelers who are traveling > 50% of the route, say, first. A few days before travel, open up the reservation system to other travelers.

Pooling:
have separate seat pools for short-haul and long-haul reservations. In general, K pools where K <= #legs

Usual performance measure: worst-case ratio of # of passengers accommodated using the algorithm to # of passengers that could be accommodated using OPT.

These algorithms are related to the memory-allocation problem, except that we assume
    NO CANCELLATIONS (or few, at any rate)
    RESERVATION MUST BE CONTIGUOUS

Suppose there are only two legs on the schedule.

We can prove that first-fit is optimal: Suppose some reservation is blocked. Claim: it's because on one leg or the other, the train is full.

Take away the 2-leg (through) reservations. Remaining one-leg reservations for seat 1 form an initial segment; similar for seat 2. Whichever had more reservations must have used ALL the remaining ones.

Three legs, TWO SEATS

p1: leg 1
p2: leg 1,2
p3: leg 3
p4: leg 2,3

First-fit assigns:
p1: seat 1, leg 1
p2: seat 2, leg 1,2
p3: seat 1, leg 3
p4: no available seats

but best-fit assigns as:

p1: seat 1, leg 1
p2: seat 2, leg 1,2
p3: seat 2, leg 3 tighter fit than seat 1
p4: seat 1, leg 2,3

What if we just allow reservations for trips of length >= 2? This also works.

Representing reservations:

Attempt 1:

⟨passenger, seat, leg, date⟩

create table reservations (
    passno integer primary key, -- one passenger can't reserve two seats
    seat integer,
    leg integer,
    unique (seat, leg)        -- can't reserve same seat twice
);

I left out DATE for simplicity.

Problem: queries for available seats are very difficult in pure SQL. You can only select fields from a SUBSET of records. With only the table above you cannot ask about vacancies (but see below)

To ask for seats reserved on leg 3:

    select r.seat from reservations r where r.leg = 3;

To ask for seats reserved on legs 2 or 3:

    select r.seat from reservations r where r.leg between 2 and 3;

But how to ask for free seats?



Attempt 2:

⟨passenger, seat, leg, date⟩ plus VACANCIES: ⟨seat, leg, date⟩, initially full.

To ask for an open seat on leg 3: (again ignore date; just assume ⟨SEAT, LEG⟩

    select v.seat from VACANCIES v where v.leg = 3;

        (select * from VACANCIES v2 where v.seat = v2.seat and v2.leg = 3)

Attempt 1: table Reservations

Hard to find vacant seats on leg 3

Attempt 2: separate table of Vacancies
Easy to find vacant seats on leg 3, BUT tricky to guarantee consistency

Back to attempt 1:
create table reservations (
    passno integer,
    seat integer,
    leg integer,
    primary key (seat, leg)        -- can't reserve same seat twice
);

create table seats (
    seat integer primary key;
);

create table legs (leg integer primary key);

Now, to find seats that are vacant on legs 2 and 3:

    select seat from seats
    minus
    select r.seat from reservations r where r.leg between 2 and 3;

To implement the First-Fit algorithm:

    select min(seat) from
    (     select seat from seats
        minus
        select r.seat from reservations r where r.leg between 2 and 3
    );

First-fit:

To reserve a seat for legs A-B:
    select min(seat) from
    (     select seat from seats
        minus
        select r.seat from reservations r where r.leg between A and B
    );

Best-fit: I couldn't do this in SQL
    for each seat, get length of open block containing A-B (zero for some)
    almost a sql query
    select n from seats where

    now, order this by size, and consider only blocks of min size.

    AVAIL(s, lo, hi): seat s is available on legs lo through hi:

        NOT EXISTS (select * from reservation r where r.seat = s and r.leg between lo and hi)

for each seat, the biggest block of free legs below 3:
    select s. seat, min(l.leg) from SEATS s, LEGS l where
        l.leg <= 3
    and    not exists (select * from reservations r where r.seat = s.seat and r.leg between l.leg and 3)
    group by s.seat;

    select s. seat, min(lo.leg), max(hi.leg) from SEATS s, LEGS lo, LEGS hi
    where
        lo.leg <= 3 and hi.leg >= 3
    and    not exists (select * from reservations r where r.seat = s.seat
                and r.leg between lo.leg and hi.leg)
    group by s.seat;

select s. seat, min(lo.leg), max(hi.leg), max(hi.leg)-min(lo.leg) AS gap
from SEATS s, LEGS lo, LEGS hi
where
    lo.leg <= 3 and hi.leg >= 3
and    not exists (select * from reservations r where r.seat = s.seat
            and r.leg between lo.leg and hi.leg)
group by s.seat
order by gap;

Range query:

select s. seat, min(lo.leg), max(hi.leg), max(hi.leg)-min(lo.leg) AS gap
from SEATS s, LEGS lo, LEGS hi
where
    lo.leg <= 3 and hi.leg >= 4
and    not exists (select * from reservations r where r.seat = s.seat
            and r.leg between lo.leg and hi.leg)
group by s.seat
order by gap;

Best fit:
    1. put the above into an array, stopping when the value of GAP becomes > value of GAP in first row.
    3. Apply tie-breaker rules to the entries in the array.

Basics of forms

A form is enclosed between <form method="post" action="program"> and </form> tags. The usual value for action is "", meaning to re-invoke the same program (in our case, a php program), but now with posted data values.

Inside the form you can have input items, of the form <input type="input-type" ... >. Input items can be

text (one-line text boxes)
button (clickable buttons)
radio (radio buttons)
checkbox (checkboxes)
select: menus with options
select ... multiple: allows selecting several items simultaneously
textarea (scrolling text boxes)
submit (submit buttons, semantically different from ordinary buttons)
more

Here's a big form. You can see it in action at form1.html.

<form action="form1.cgi" method="get" enctype="multipart/form-data">
<input type = "button" value="click me!" name="b1">
<input type = "button" value="me too!" name="b2">
<p> Are we having fun yet?
<center>
<input type="radio" name="fun" value = "yes">Yep!
<input type="radio" name="fun" value = "no">Nope!
<input type="radio" name="fun" value = "maybe" CHECKED>I dunno!
</center>
<p><b>Why are you doing this?</b>
<input type = "checkbox" name="cbox" value="credit" CHECKED>For course credit
<input type = "checkbox" name="cbox" value="money" CHECKED>For the money!
<br>
<p><b>Who are you?</b>
<input type = "text" name="name1" value="I dunno" size=20>
<p>Now for a menu:
<p>
<select size = 1 name="menu1">
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p> now with size=4</p>
<select size = 4 name="menu2" multiple>
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p>Here's a text box</p>
<textarea name="mybox" rows=5 cols = 30>
This message will be in the box. It's plain text, <b>not</b> html!
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
</textarea>
<p>
<input type="submit">
</form>

Understand the form action, and the idea tha tone page can have multiple <form ...> ... </form> objects.