Database week 7

Comp 353/453: Database Programming, Corboy L08, 4:15 Mondays

Week 7, Mar 11

Read in Elmasri & Navathe (EN)

Chapter 4: Basic SQL
Chapter 5: More-complex SQL (section 5.1 especially)
Chapter 7: ER models and diagrams
Chapter 9: ER-to-relation mapping

The midterm will be the second half of the March 18 class.

The Enhanced ER (EER) model

In this model, we allow for some forms of inheritance.

Figure 8.1 is a starting point. Note that there are several kinds of employee; some kinds participate in relationships and some do not. Note the symbol denoting inheritance; arrows from the parent class to the child class are more common in OOP design.

Ultimately, we may implement the diagram of Fig 8.1 with an EMPLOYEE table, and also tables for SECRETARY, TECHNICIAN, ENGINEER, MANAGER, and HOURLY_EMPLOYEE, each indexed by the ssn and having additional columns for the subclass-specific attributes.

To be in a subclass, you must also be in the superclass.

Note that some subclasses have subclass-specific attributes, and other subclasses have subclass-specific participation in relationships (eg Manager and Hourly_Employee).

The circled (d) in Fig 8.1 stands for "disjoint"; one cannot be a SECRETARY and a TECHNICIAN. However, one can simultaneously be a SECRETARY, a MANAGER, and an HOURLY_EMPLOYEE (at least as the relationship is drawn). In practice, it is likely that each of Secretary, Technician, Manager and Engineer would also belong either to Hourly_Employee or Salaried_Employee. In general, membership in multiple subclasses is to be allowed unless explicitly forbidden with the (d) notation.

The alternative to (d) is (o), for overlapping. EN's example for overlapping subclasses is in Fig 8.5: the parent class is PART and the subclasses are MANUFACTURED_PART and PURCHASED_PART. Some parts can be both here.

Fig 8.2 example (showing disjointness). Secretaries, Engineers and Technicians are all Employees, but everyone belongs to at most one category.

Generalization is the process of realizing that two existing entities, CAR and TRUCK, are really both instances of VEHICLE. See fig 8.3.

Sometimes subclass membership is determined by a field value or Boolean expression involving the parent class (eg jobtype = engineer). Note that such "tag" fields are frowned upon in classic OOP in, say, Java. This arrangement is also called attribute-defined subclassing (or specialization). If the value of a single attribute determines the subclass, this necessarily leads to disjoint subclasses. This is illustrated in Fig 8.4. In other examples, subclass membership represents a form of new data; these are user-defined subclasses. As new subclass records are inserted into the database, the appropriate subclass must also be indicated.

Besides disjoint/overlapping, subclasses may be described as total or partial. Total means that every member of the base class must be in some subclass (ie that the base class is abstract in java notation). Partial means that base-class-only objects may exist. In the Fig 8.1 example, every employee is either salaried or hourly, so the right-hand subclass is total. The double line is used to denote this. Note that this has nothing to do with any of the other subclass relationships.

Multiple inheritance means that we may end up with a lattice of relationships: see Fig 8.6 and Fig 8.7. If multiple inheritance is involved, the classes will not be disjoint. A common OOP issue with multiple inheritance -- resolving method or attribute names when the same name is used in more than one parent class -- is usually handled by requiring attribute names to be unique.

Union types

Sometimes the best way to model a Vehicle type is simply as a union of existing types Car and Truck. See Fig 8.8 for two examples.

Union types generally mean that the designer has not taken advantage of any common attributes. It is particularly helpful to identify a primary key that can be moved to the base class.

EER-to-Relations mapping

Given a parent class C with subclasses S1, S2, ..., Sm, here are some options for defining relations:

A. Create a table for representing C, and separate tables for each Si. Each Si will include a column representing the corresponding C data.
For example, C might be the Employees table, with key ssn; we might have tables for Secretary, Technician and Engineer also with keys ssn.

Multiple inheritance can be handled by having someone in the Employees, Technician and Engineer tables.

B. Create a separate table for each Si, including in each table all the common attributes. This only works if subclassing is total; that is, if every member of the parent class is in some subclass (why?). It becomes inefficient of the Si are not disjoint.

C. Create a single table including all attributes of C and all the Si, and an additional type attribute indicating to which Si the record belongs. For example, we might have fields

fname, lname, ssn, address, type (secy, tech, eng), typing_speed, Tgrade, Eng_type, Eng_degree, year

The value of the type attribute determines which of the remaining attributes are actually used. Disjoint subclasses are necessary here, and space may not be used efficiently. Multiple inheritance is not supported.

D. Like C, but instead use m Boolean attributes to indicate membership in each Si:

fname, lname, ssn, address, is_secy, typing_speed, is_tech, Tgrade, is_eng, Eng_type, Eng_degree, year

This mechanism can handle multiple inheritance reasonably well.

A brief word on auto-commit

To commit your SQL updates is to write the changes to the permanent database; in this sense, it is like save. So far we've assumed that there is a commit operation performed after every insert or update; this situation is called auto-commit. Generally, auto-commit mode is an attribute of your database connection; if auto-commit is true then a commit is performed after every SQL statement that potentially alters the database.

The alternative to auto-commit is to execute a group of updates, and then explicitly invoke the commit operation at the end, to commit all the updates together. A group of SQL statements between consecutive commits is then called a transaction; all the statements of the transaction are committed together. Usually, though, we want a stronger assurance: that all the statements of the transaction either succeed, or none of them do (this is implicit if we know that commits always succeed, but this is not the case in the real world). This is known as the atomicity requirement, the first part of the ACID test (atomicity, consistency, isolation, durability). The idea is that a transaction should be atomic, that is, indivisible: the individual queries that make it up should be executed as a unit.

Instead of a commit, a user may also issue a rollback, which means to throw away all the actions back to the previous commit, thus discarding the transaction.

We'll stick with auto-commit for a while longer, but be aware of two things:

This is why there is no "save" operation
auto-commit is not universal; sometimes you need manual control

Continuation of PHP PDO and LAMP (or WAMP)

See also EN16 chapter 14, but notice that there they use the PEAR library. This can be tricky to install under windows, so I've converted to the simpler PDO library. The differences are minor, and Chapter 14 is still a very useful reference, but be aware.

What I had to do:
    install php5
    install php5-mysql
    install the php.ini-development file (to enable error messages)

If error messages are not enabled, you are doomed.

phpinfo.php
pdo_demos.php
lib353pdo.php
employee.php

For the employee.php file, note the following:

connect_pdo()
examining $_POST for keys submit, update, deralph, none-of-the-above
printform()
department menus
look at html form code

Try removing the ";" from the include statement in pdo_demos.php to admire the elegance and precision of the resultant error message:

Parse error: syntax error, unexpected '$hostname' (T_VARIABLE) in /var/www/company/pdo_demos.php on line 7

(Can you figure out why it is complaining about line 7 for an error that is actually on line 3?)

Basics of html forms

A form is enclosed between <form method="post" action="program"> and </form> tags. The usual value for action is "", meaning to re-invoke the same program (in our case, a php program), but now with posted data values. In PHP, actual form creation will usually be in heredoc output blocks.

Inside the form you can have input items, of the form <input type="input-type" ... >. Input items can be

text (one-line text boxes)
button (clickable buttons, of little use to us)
radio (radio buttons)
checkbox (checkboxes)
select: menus with options
select ... multiple: allows selecting several items simultaneously
textarea (scrolling text boxes)
submit (submit buttons, semantically different from ordinary buttons)
more

The most common form items we will use are text boxes, select menus, and submit buttons.

Here's a big form. You can see it in action at form1.html.

<form action="form1.cgi" method="get" enctype="multipart/form-data">
<input type = "button" value="click me!" name="b1">
<input type = "button" value="me too!" name="b2">
<p> Are we having fun yet?
<center>
<input type="radio" name="fun" value = "yes">Yep!
<input type="radio" name="fun" value = "no">Nope!
<input type="radio" name="fun" value = "maybe" CHECKED>I dunno!
</center>
<p><b>Why are you doing this?</b>
<input type = "checkbox" name="cbox" value="credit" CHECKED>For course credit
<input type = "checkbox" name="cbox" value="money" CHECKED>For the money!
<br>
<p><b>Who are you?</b>
<input type = "text" name="name1" value="I dunno" size=20>
<p>Now for a menu:
<p>
<select size = 1 name="menu1">
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p> now with size=4</p>
<select size = 4 name="menu2" multiple>
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p>Here's a text box</p>
<textarea name="mybox" rows=5 cols = 30>
This message will be in the box. It's plain text, <b>not</b> html!
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
</textarea>
<p>
<input type="submit">
</form>

Understand the form action. When a form is submitted (via a submit button), the action script is executed, with a long parameter string containing all the form content. The action script is able to look at the parameters to figure out, for example, which button was pressed; see the function main() in the file employee.php for a good example. If we clicked the 'submit' button, then employee.php calls submit_employee(), which in turn retrieves all the textbox fields that had been filled in.

Here is the printform method from employee.php. Note how most of the heredoc block is straight html; the only exception is $deptmenu, which is a string "calculated" by the initial code.

The makePageButtons() method, in lib353pdo.php, actually creates four forms! One per button. One page can have multiple <form ...> ... </form> objects.

function printform($db) {
        $query = "select dnumber from department order by dnumber";
        $stmt = simple_query($db, $query, array());
        if (is_string($stmt)) {
                print ("failed query: \"$query\"");
                die();
        }
        // query was successful
        $theDepts = $stmt->fetchAll(PDO::FETCH_COLUMN);
        $deptmenu = makeSelectMenuString("dno", $theDepts);
print <<<FORMEND
        <form method="post" action="">
        Use this page to enter new employees<p>
        <input type="text" name="fname" value="ralph">
        <input type="text" name="minit" size="1" value="j">
        <input type="text" name="lname" value="wiggums"> required
        <p>
        <input type="text" name="ssn" value="abcdefghi"> required
        <br><input type="text" name="sex" value="M"> M/F
        <br><input type="text" name="bdate" value="1980-07-04"> YYYY-MM-DD
        <br><input type="text" name="salary" value="9999"> (annual salary)
        <br><input type="text" name="super_ssn" value="999887777"> supervisor (by SSN)
        <br><input type="text" name="address" value="no fixed abode"> address
        <br>$deptmenu department number
        <p><input type="submit" name="submit" value="submit data">
        <p><input type="submit" name="update" value="update">
              <input type="submit" name="deralph" value="de-Ralph">
        </form>
FORMEND;
        makePageButtons();
}

PHP basics

The main purpose of PHP is actually to create and process forms, using server-side scripting. Databases are just common examples of this.

Today's demo is actually running on my laptop: I have apache and php installed. I have php files in /var/www; when I access these through the browser they execute.

The first example is info.php:

<?
phpinfo()
?>

This produces quite a lot of output; note in particular the magic_quotes_gpc value.

Here is the greeting demo from E&N (greeting.php):

<?php
// greeting demo from Elmasri & Navathe 14.1
// this prints a welcome message if the user enters their name

if ($_POST['user_name']) {
    print("Welcome, ");
    print($_POST['user_name']);
} else {
    // create the form
    $action=$_SERVER['PHP_SELF'];
    print <<<_HTML_
    <FORM method="post" action="$action"> 
    Enter your name: <input type="text" name="user_name">
    <BR>
    <INPUT type="submit" value="Submit name">
    </FORM>
_HTML_;
}
?>

The associative-array variable $_POST contains data sent as the result of an HTML POST method. The value for key "submit" (the name of the submit button) would definitely be set. In this case, we print the username, $_POST['user_name'].

Before the submit button is clicked the $_POST structure is empty, and so we initially execute the else part above. $action is set to the name of the current script, which is also the name of the script to process our input. Everything between the print <_HTML_ and the following _HTML_ (which must begin in column 1!!) is treated as static text (a so-called "heredoc"), and is printed "as is", except that anything beginning with $ is treated as a PHP variable and its value is printed instead. The only variable is $action (which in many cases will be "", meaning to use the same source file). We could also have used ordinary print statements here, but those are tedious.

My version is slightly different from the book's version, which had

<FORM method="post" action="$_SERVER['PHP_SELF']">

The part in bold here is a "superglobal" variable, and my installation of php does not allow superglobals in "here docs". I can only assume this is a security issue; note the workaround above.

It can be confusing to figure out at what point we "start over", and forget the existing $_POST. Doing a browser reload resends the POST data and so we stay on the second page; going to the location bar and hitting return reloads without reposting and so we go back to the first page.

Note we simply ignore <html><body>, etc; these really should be put in. Note also that printing output works as desired, except your output is formatted according to what the browser wants to do with it. In particular, browsers ignore line breaks; they start new lines with <p> or <br> tags.

Next is selection.php, which replaces the input box for a name with a selection for department:

    <FORM method="post" action="$action"> 
    Select your department: 
    <select name="department">
        <option value="administration">Administration</option>
        <option value="headquarters">Headquarters</option>
        <option value="research" selected="selected">Research</option>
    </select>
    <BR>
    <INPUT type="submit" value="Submit">
    </FORM>

http://localhost/selection.phpBefore going further, let's introduce some syntax errors into the php file, and see what happens.

As our final forms-only example, here is emp.php, which creates a form with multiple fields, and two buttons. Here is the form-creation part (as a here doc):

    print <<<FORMEND
	Enter new employee:<p>
	<form method="post" action="">
	<input type="text" name="fname" value="ralph">
	<input type="text" name="minit" size="1" value="j">
	<input type="text" name="lname" value="wiggums"> required
	<p>
	<input type="text" name="ssn" value="abcdefghi"> required
	<br><input type="text" name="sex" value="M"> M/F
	<br><input type="text" name="bdate" value="1980-07-04"> YYYY-MM-DD
	<br><input type="text" name="salary" value="9999"> (annual salary)
	<br><input type="text" name="super_ssn" value="999887777"> supervisor (by SSN)
	<br><input type="text" name="address" value="no fixed abode"> address
	<br><input type="text" name="dno" value="4"> department number
        <p><input type="submit" name="submit" value="submit data">
	<p><input type="submit" name="update" value="update">
	</form>
FORMEND;

Default values have been provided. When the user clicks "submit", we simply print this. Note the button "update", not used here. Both buttons are of type "submit", but only the first is named "submit", meaning that only the first will trigger the if ($_POST['submit']).

Below (in the Java section) there is an example of the problem of SQL injection: allowing unchecked user-supplied strings to be used as SQL fields in dynamically executed strings. The use of prepared statements pretty much makes this impossible. There is, however, also the concern about HTML injection, more commonly known as cross-site scripting or XSS. We should be more careful about echoing back our HTML. When we're the only viewer, all we can do is feed ourselves our own script, but if we post something on a social networking site like this:

I agree! <script>do_bad_things()</script>

then we've, well, done bad things. In this class we are not going to obsess about sanitizing our HTML (though we really should). We will, however, try to use htmlspecialchars().

pdo_demos.php

The file pdo_demos.php makes a series of simple queries. There is no form involved; there is no provision for changing the arguments of the query except to edit the file.

Now let's look at employee.php again.

PHP notes

0. PHP uses what is sometimes called duck typing: if it meets the interface of a duck (that is, if it has quack() and waddle() methods), then it is a duck. Even if it is not declared as such.

PHP also uses dynamic typing, meaning that a variable can hold an object of any type. In some of the examples, a function returns something which is a query_result object if the query succeeded, and a string or Boolean if it did not. You need to use runtime type checking to determine which it is.

1. I've tried to make employee.php do reasonable error checking. After each prepare(), execute() or query() I called either $db->errorInfo() or $stmt->errorInfo(), and selected item [2] of the returned array for printing.

Note that $db->prepare() returns either a PDOStatement object or FALSE; if the latter you must call $db->errorInfo(). $db->query() always returns a Boolean; if it is FALSE then the explanation is at ($db->errorInfo())[2] (which expression is actually illegal in PHP; you need to break it up as I did in employee.php). Finally, $stmt->execute() also always returns a Boolean; if FALSE then the explanation is at $stmt->errorInfo(), item [2]. While they work the same way, $db->errorInfo() and $stmt->errorInfo() are methods of different classes.

Checking prepare(),etc results for equality with FALSE does not involve any kind of type shenanigans; these methods either return a PDOStatement object or a Boolean (you can't do that in Java, now, can you). (Checking $_POST['submit'] as if it were a Boolean does involve some built-in type casts; if the submit button was not defined then $_POST['submit'] is technically undefined, or maybe NULL.)

2. Note that your php file is executed from scratch on each web click. There's no easy way to store any state between clicks. In particular, you have to connect to the database each time.

3. While you can put PHP variables into print statements (including heredoc print statements)
    print ("<b>Error: $errormsg</b><p>\n");
you can not put expressions into print statements; that is, you can't do print("$db->errorInfo()<p>/n").

You also apparently cannot even do this:
    $errmsg = ($db->$errorInfo())[2];
You have to break up the method call and the array access to separate statements:
    $errArray = $db->$errorInfo();
    $errmsg = $errArray[2];

4. I updated de_ralph($db). The new version restores table employee from table employeebackup, which, of course, must exist for this to work.

I also had to turn off FK checking ("set foreign_key_checks=0"). You only need to do this if you create the tables using innodb, which I recommend. I had four steps:

set foreign_key_checks=0
delete from employee
insert into employee select * from employeebackup
set foreign_key_checks=1

I strung all these together into one monster query and sent that off to MySQL. The new problem is that my error-checking only reports errors from the last query in the chain, but if anything is going to go wrong it is #2, "delete from employee", which will break some FK constraints. I should have done each one separately, or at least #2.

In light of all this, the default de_ralph() on the web page is still the old one.

By the way, the above can still break FK constraints even if it executes properly. How?

5. I have a simple makeSelectMenu. In the form makeSelectMenuString, it returns a string that you can put into a heredoc block. The parameters for makeSelectMenuString are the form-object name (used for retrieving POSTed data) and an array of values. Here's an example creation of an array of department numbers:

$query = "select dnumber from department";
$retstmt = $db->query($query);
if ($retstmt == FALSE) { /* usual error-checking */ }
$values = $retstmt->fetchAll(PDO::FETCH_COLUMN);
$menustring = makeSelectMenuString("dno", $values);

You may be asked to modify this for the project.