Comp 353/453: Database Programming, Corboy
L08, 4:15 Mondays
Week 7, Mar 11
Read in Elmasri & Navathe (EN)
- Chapter 4: Basic SQL
- Chapter 5: More-complex SQL (section 5.1 especially)
- Chapter 7: ER models and diagrams
- Chapter 9: ER-to-relation mapping
The midterm will be the second half of the March 18 class.
The Enhanced ER (EER) model
In this model, we allow for some forms of inheritance.
Figure 8.1 is a starting point. Note that
there are several kinds of
employee; some kinds participate in relationships and some do not. Note the
symbol denoting inheritance; arrows from the parent class to the child class
are more common in OOP design.
Ultimately, we may implement the diagram of Fig 8.1 with an EMPLOYEE table,
and also tables for SECRETARY, TECHNICIAN, ENGINEER, MANAGER, and
HOURLY_EMPLOYEE, each indexed by the ssn and having additional columns for
the subclass-specific attributes.
To be in a subclass, you must also be in the superclass.
Note that some subclasses have subclass-specific attributes, and other
subclasses have subclass-specific participation in relationships (eg Manager
and Hourly_Employee).
The circled (d) in Fig 8.1 stands
for "disjoint"; one cannot be a SECRETARY and
a TECHNICIAN. However, one can simultaneously
be
a SECRETARY, a MANAGER, and an HOURLY_EMPLOYEE (at least as the relationship
is drawn). In practice, it is likely that each of Secretary, Technician,
Manager and Engineer would also belong either to Hourly_Employee or
Salaried_Employee. In general, membership in multiple subclasses is to be
allowed unless explicitly forbidden with the (d) notation.
The alternative to (d) is (o), for overlapping. EN's example for
overlapping subclasses is in Fig 8.5: the
parent class is PART and the subclasses are MANUFACTURED_PART and
PURCHASED_PART. Some parts can be both
here.
Fig 8.2 example (showing disjointness).
Secretaries, Engineers and Technicians are all Employees, but everyone
belongs to at most one category.
Generalization is the process of
realizing that two existing entities, CAR and TRUCK, are really both
instances of VEHICLE. See fig 8.3.
Sometimes subclass membership is determined by a field value or Boolean
expression involving the parent class (eg jobtype = engineer). Note that
such "tag" fields are frowned upon in classic OOP in, say, Java. This
arrangement is also called attribute-defined
subclassing (or specialization). If the value of a single attribute
determines the subclass, this necessarily leads to disjoint subclasses. This
is illustrated in Fig 8.4. In other
examples, subclass membership represents a form of new data; these are user-defined subclasses. As new subclass
records are inserted into the database, the appropriate subclass must also
be indicated.
Besides disjoint/overlapping, subclasses may be described as total
or partial. Total means that every
member of the base class must be in some subclass (ie that the base class is
abstract in java notation). Partial
means that base-class-only objects may exist. In the Fig 8.1 example, every
employee is either salaried or hourly, so the right-hand subclass is total.
The double line is used to denote this. Note that this has nothing to do
with any of the other subclass
relationships.
Multiple inheritance means that we may end up with a lattice
of relationships: see Fig 8.6 and Fig
8.7. If multiple inheritance is involved, the classes will not be
disjoint. A common OOP issue with multiple inheritance -- resolving method
or attribute names when the same name is used in more than one parent class
-- is usually handled by requiring attribute names to be unique.
Union types
Sometimes the best way to model a Vehicle type is simply as a union
of existing types Car and Truck. See Fig 8.8
for two examples.
Union types generally mean that the designer has not taken advantage of any
common attributes. It is particularly helpful to identify a primary key that
can be moved to the base class.
EER-to-Relations mapping
Given a parent class C with subclasses S1, S2, ..., Sm, here are some
options for defining relations:
A. Create a table for representing C, and separate tables for each Si. Each
Si will include a column representing the corresponding C data.
For example, C might be the Employees table, with key ssn; we might have
tables for Secretary, Technician and Engineer also with keys ssn.
Multiple inheritance can be handled by having someone in the Employees,
Technician and Engineer tables.
B. Create a separate table for each Si, including in each table all the
common attributes. This only works if subclassing is total; that is, if
every member of the parent class is in some subclass (why?). It becomes
inefficient of the Si are not disjoint.
C. Create a single table including all attributes of C and all the Si, and
an additional type attribute
indicating to which Si the record belongs. For example, we might have fields
fname, lname, ssn,
address, type (secy, tech, eng),
typing_speed, Tgrade, Eng_type, Eng_degree, year
The value of the type attribute
determines which of the remaining attributes are actually used. Disjoint
subclasses are necessary here, and space may not be used efficiently.
Multiple inheritance is not supported.
D. Like C, but instead use m Boolean attributes to indicate membership in
each Si:
fname, lname, ssn,
address, is_secy, typing_speed, is_tech, Tgrade, is_eng,
Eng_type, Eng_degree, year
This mechanism can handle multiple inheritance reasonably well.
A brief word on auto-commit
To commit your SQL updates is to
write the changes to the permanent database; in this sense, it is like save. So far we've assumed that there
is a commit operation performed after every insert or update; this situation
is called auto-commit. Generally,
auto-commit mode is an attribute of your database connection; if auto-commit
is true then a commit is performed
after every SQL statement that potentially alters the database.
The alternative to auto-commit is to execute a group of updates, and then
explicitly invoke the commit
operation at the end, to commit all the updates together. A group of SQL
statements between consecutive commits is then called a transaction;
all the statements of the transaction are committed
together.
Usually, though, we want a stronger assurance: that all the statements of
the transaction either succeed, or none of them do (this is implicit if we
know that commits always succeed,
but this is not the case in the real world). This is known as the atomicity
requirement, the first part of the ACID test (atomicity, consistency,
isolation, durability). The idea is that a transaction should be atomic,
that is, indivisible: the individual queries that make it up should be
executed as a unit.
Instead of a commit, a user may
also issue a rollback, which means
to throw away all the actions back to the previous commit, thus discarding
the transaction.
We'll stick with auto-commit for a while longer, but be aware of two things:
- This is why there is no "save" operation
- auto-commit is not universal; sometimes you need manual control
Continuation of PHP PDO and LAMP (or WAMP)
See also EN16 chapter 14, but notice that there they use the PEAR library.
This can be tricky to install under windows, so I've converted to the
simpler PDO library. The differences are minor, and Chapter 14 is still a
very useful reference, but be aware.
What I had to do:
install php5
install php5-mysql
install the php.ini-development file (to enable error
messages)
If error messages are not enabled, you are doomed.
phpinfo.php
pdo_demos.php
lib353pdo.php
employee.php
For the employee.php file, note the following:
- connect_pdo()
- examining $_POST for keys submit, update, deralph, none-of-the-above
- printform()
- department menus
- look at html form code
Try removing the ";" from the include
statement in pdo_demos.php to admire the elegance and precision of the
resultant error message:
Parse error:
syntax error, unexpected '$hostname' (T_VARIABLE) in /var/www/company/pdo_demos.php
on line 7
(Can you figure out why it is complaining about line 7 for an error that is
actually on line 3?)
Basics of html forms
A form is enclosed between <form method="post" action="program">
and
</form> tags. The usual value for action is "", meaning to re-invoke
the same program (in our case, a php program), but now with posted
data values. In PHP, actual form creation will usually be in heredoc
output blocks.
Inside the form you can have input
items, of the form <input type="input-type" ... >.
Input items can be
- text (one-line text boxes)
- button (clickable buttons, of little use to us)
- radio (radio buttons)
- checkbox (checkboxes)
- select: menus with options
- select ... multiple: allows selecting several items simultaneously
- textarea (scrolling text boxes)
- submit (submit buttons, semantically different from ordinary buttons)
- more
The most common form items we will use are text boxes, select menus, and
submit buttons.
Here's a big form. You can see it in action at form1.html.
<form action="form1.cgi" method="get"
enctype="multipart/form-data">
<input type = "button" value="click me!" name="b1">
<input type = "button" value="me too!" name="b2">
<p> Are we having fun yet?
<center>
<input type="radio" name="fun" value = "yes">Yep!
<input type="radio" name="fun" value = "no">Nope!
<input type="radio" name="fun" value = "maybe" CHECKED>I
dunno!
</center>
<p><b>Why are you doing this?</b>
<input type = "checkbox" name="cbox" value="credit"
CHECKED>For course credit
<input type = "checkbox" name="cbox" value="money"
CHECKED>For the money!
<br>
<p><b>Who are you?</b>
<input type = "text" name="name1" value="I dunno" size=20>
<p>Now for a menu:
<p>
<select size = 1 name="menu1">
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p> now with size=4</p>
<select size = 4 name="menu2" multiple>
<option> first item
<option> second item
<option> number 3!
<option> fourth
<option> last
</select>
<p>Here's a text box</p>
<textarea name="mybox" rows=5 cols = 30>
This message will be in the box. It's plain text, <b>not</b>
html!
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
I'm going to make it long enough that a scrollbar appears,
through the miracle of cut-and-paste.
</textarea>
<p>
<input type="submit">
</form>
Understand the form action. When a
form is submitted (via a submit button), the action script is executed, with
a long parameter string containing all the form content. The action script
is able to look at the parameters to figure out, for example, which button
was pressed; see the function main() in the file employee.php for a good
example. If we clicked the 'submit' button, then employee.php calls
submit_employee(), which in turn retrieves all the textbox fields that had
been filled in.
Here is the printform method from employee.php. Note how most of the heredoc
block is straight html; the only exception is $deptmenu,
which is a string "calculated" by the initial code.
The makePageButtons() method, in lib353pdo.php, actually creates four forms!
One per button. One page can have multiple <form ...> ...
</form> objects.
function printform($db) {
$query = "select dnumber from
department order by dnumber";
$stmt = simple_query($db,
$query, array());
if (is_string($stmt)) {
print ("failed query: \"$query\"");
die();
}
// query was successful
$theDepts =
$stmt->fetchAll(PDO::FETCH_COLUMN);
$deptmenu =
makeSelectMenuString("dno", $theDepts);
print <<<FORMEND
<form method="post"
action="">
Use this page to enter new
employees<p>
<input type="text"
name="fname" value="ralph">
<input type="text"
name="minit" size="1" value="j">
<input type="text"
name="lname" value="wiggums"> required
<p>
<input type="text"
name="ssn" value="abcdefghi"> required
<br><input type="text"
name="sex" value="M"> M/F
<br><input type="text"
name="bdate" value="1980-07-04"> YYYY-MM-DD
<br><input type="text"
name="salary" value="9999"> (annual salary)
<br><input type="text"
name="super_ssn" value="999887777"> supervisor (by SSN)
<br><input type="text"
name="address" value="no fixed abode"> address
<br>$deptmenu
department number
<p><input
type="submit" name="submit" value="submit data">
<p><input
type="submit" name="update" value="update">
<input type="submit" name="deralph" value="de-Ralph">
</form>
FORMEND;
makePageButtons();
}
PHP basics
The main purpose of PHP is actually to create and process forms, using
server-side scripting. Databases are just common examples of this.
Today's demo is actually running on my laptop: I have apache and php
installed. I have php files in /var/www; when I access these through
the browser they execute.
The first example is info.php:
This produces quite a lot of output; note in particular the magic_quotes_gpc
value.
Here is the greeting demo from E&N (greeting.php):
<?php
// greeting demo from Elmasri & Navathe 14.1
// this prints a welcome message if the user enters their name
if ($_POST['user_name']) {
print("Welcome, ");
print($_POST['user_name']);
} else {
// create the form
$action=$_SERVER['PHP_SELF'];
print <<<_HTML_
<FORM method="post" action="$action">
Enter your name: <input type="text" name="user_name">
<BR>
<INPUT type="submit" value="Submit name">
</FORM>
_HTML_;
}
?>
The associative-array variable $_POST contains data sent as the result of an
HTML POST method. The value for key "submit" (the name of the submit button)
would definitely be set. In this case, we print the username,
$_POST['user_name'].
Before the submit button is clicked the $_POST structure is empty, and so we
initially execute the else part
above. $action is set to the name of the current script, which is also the
name of the script to process our input. Everything between the print
<_HTML_ and the following _HTML_
(which must begin in column 1!!) is
treated as static text (a so-called "heredoc"), and is printed "as is",
except that anything beginning with $ is treated as a PHP variable and its
value is printed instead. The only variable is $action (which in many cases
will be "", meaning to use the same source file). We could also have used
ordinary print statements here, but those are tedious.
My version is slightly different from the book's version, which had
<FORM method="post" action="$_SERVER['PHP_SELF']">
The part in bold here is a "superglobal" variable, and my installation of
php does not allow superglobals in "here docs". I can only assume this is a
security issue; note the workaround above.
It can be confusing to figure out at what point we "start over", and forget
the existing $_POST. Doing a browser reload resends the POST data and so we
stay on the second page; going to the location bar and hitting return
reloads without reposting and so we go back to the first page.
Note we simply ignore <html><body>, etc; these really
should be put in. Note also that printing output works as desired,
except your output is formatted according to what the browser wants to do
with it. In particular, browsers ignore line breaks; they start new lines
with <p> or <br> tags.
Next is selection.php,
which replaces the input box for a name with a selection
for department:
<FORM method="post" action="$action">
Select your department:
<select name="department">
<option value="administration">Administration</option>
<option value="headquarters">Headquarters</option>
<option value="research" selected="selected">Research</option>
</select>
<BR>
<INPUT type="submit" value="Submit">
</FORM>
http://localhost/selection.phpBefore going further, let's introduce some
syntax errors into the php file, and see what happens.
As our final forms-only example, here is emp.php,
which creates a form with multiple fields, and two buttons. Here is the
form-creation part (as a here doc):
print <<<FORMEND
Enter new employee:<p>
<form method="post" action="">
<input type="text" name="fname" value="ralph">
<input type="text" name="minit" size="1" value="j">
<input type="text" name="lname" value="wiggums"> required
<p>
<input type="text" name="ssn" value="abcdefghi"> required
<br><input type="text" name="sex" value="M"> M/F
<br><input type="text" name="bdate" value="1980-07-04"> YYYY-MM-DD
<br><input type="text" name="salary" value="9999"> (annual salary)
<br><input type="text" name="super_ssn" value="999887777"> supervisor (by SSN)
<br><input type="text" name="address" value="no fixed abode"> address
<br><input type="text" name="dno" value="4"> department number
<p><input type="submit" name="submit" value="submit data">
<p><input type="submit" name="update" value="update">
</form>
FORMEND;
Default values have been provided. When the user clicks "submit", we simply
print this. Note the button "update", not used here. Both buttons are of
type "submit", but only the first is named
"submit", meaning that only the first will trigger the if
($_POST['submit']).
Below (in the Java section) there is an example of the problem of SQL
injection: allowing unchecked user-supplied strings to be used as
SQL fields in dynamically executed strings. The use of prepared
statements pretty much makes this impossible. There is, however,
also the concern about HTML injection,
more commonly known as cross-site
scripting or XSS. We
should be more careful about echoing back our HTML. When we're the only
viewer, all we can do is feed ourselves our own script, but if we post
something on a social networking site like this:
I agree!
<script>do_bad_things()</script>
then we've, well, done bad things. In this class we are not
going to obsess about sanitizing our HTML (though we really should). We
will, however, try to use htmlspecialchars().
pdo_demos.php
The file pdo_demos.php
makes a series of simple queries. There is no form involved; there is no
provision for changing the arguments of the query except to edit the file.
Now let's look at employee.php
again.
PHP notes
0. PHP uses what is sometimes
called duck typing: if it meets the interface of a duck
(that is, if it has quack() and waddle() methods), then it is a
duck. Even if it is not declared as such.
PHP also uses dynamic typing, meaning that a variable
can hold an object of any type. In some of the examples, a function
returns something which is a query_result object if the query succeeded,
and a string or Boolean if it did not. You need to use runtime type
checking to determine which it is.
1. I've tried to make
employee.php do reasonable error checking. After each prepare(), execute()
or query() I called either $db->errorInfo() or $stmt->errorInfo(), and
selected item [2] of the returned array for printing.
Note that $db->prepare() returns either a PDOStatement object or FALSE;
if the latter you must call $db->errorInfo().
$db->query()
always returns a Boolean; if it is FALSE then the explanation is at ($db->errorInfo())[2] (which expression
is actually illegal in PHP; you need to break it up as I did in
employee.php). Finally, $stmt->execute() also always returns a Boolean;
if FALSE then the explanation is at $stmt->errorInfo(),
item
[2]. While they work the same way, $db->errorInfo() and
$stmt->errorInfo() are methods of different classes.
Checking prepare(),etc results for equality with FALSE does not
involve any kind of type shenanigans; these methods either return a
PDOStatement object or a Boolean (you can't do that
in Java, now, can you). (Checking $_POST['submit'] as if it were a Boolean does involve some built-in type casts;
if the submit button was not
defined then $_POST['submit'] is technically undefined, or maybe NULL.)
2. Note that your php
file is executed from scratch on each web click. There's no easy way to
store any state between clicks. In particular, you have to connect to the
database each time.
3. While you can put PHP
variables into print statements (including heredoc print statements)
print ("<b>Error: $errormsg</b><p>\n");
you can not put expressions into
print statements; that is, you can't do
print("$db->errorInfo()<p>/n").
You also apparently cannot even do this:
$errmsg = ($db->$errorInfo())[2];
You have to break up the method call and the array access to separate
statements:
$errArray = $db->$errorInfo();
$errmsg = $errArray[2];
4. I updated
de_ralph($db). The new version restores table employee from table
employeebackup, which, of course, must exist for this to work.
I also had to turn off FK checking ("set foreign_key_checks=0"). You only need to do this if you create the
tables using innodb, which I recommend. I had four steps:
- set foreign_key_checks=0
- delete from employee
- insert into employee select * from employeebackup
- set foreign_key_checks=1
I strung all these together into one monster query and sent that off to
MySQL. The new problem is that my error-checking only reports errors from
the last query in the chain, but if anything is going to go wrong it is #2,
"delete from employee", which will break some FK constraints. I should have
done each one separately, or at least #2.
In light of all this, the default
de_ralph() on the web page is still the old one.
By the way, the above can still break FK constraints even if it executes
properly. How?
5. I have a simple
makeSelectMenu. In the form makeSelectMenuString, it returns a string that
you can put into a heredoc block. The parameters for makeSelectMenuString
are the form-object name (used for retrieving POSTed data) and an array of
values. Here's an example creation of an array of department numbers:
$query = "select dnumber from department";
$retstmt = $db->query($query);
if ($retstmt == FALSE) { /* usual error-checking */ }
$values = $retstmt->fetchAll(PDO::FETCH_COLUMN);
$menustring = makeSelectMenuString("dno", $values);
You may be asked to modify this for the project.