Comp 353/453: Database Programming, Corboy L08, 4:15 Mondays

Week 9, Mar 25




Java Programming


JDBC and SQLJ are two popular Java interfaces to SQL. Note that SQLJ requires that your sql statements be compiled into your program; this is a major problem. JDBC allows dynamically created strings to be used for queries. Still, the strings are interpreted, leaving room for problems. The worst such problem is "SQL Injection".

Documentation is found in under "java.sql".

Note that JDBC would only be applicable if you were building your application with java. Perhaps most web interfaces are browser-based.

You need to make sure that your CLASSPATH contains mysql_jdbc.jar, or whatever is the appropriate wrapper. If you're using eclipse or blueJ, you add this .jar file to your project.

We'll look at two examples. The first is employees1.java, which shows the process of creating a Connection object. This object knows that we are creating a connection to MySQL (some of the commented-out parts relate to Oracle). The book's way of creating a connection is slightly different, using
    Class.forName("oracle.jdbc.driver.OracleDriver");
The two amount to the same thing.

The next thing done in employees1 is to create a Statement from the connection, and then execute the query. The query returns a ResultSet. Note the getString() method to extract data, and the next() method.

The second program is employees2.java. Here, I use a parameterized query (prepared statement).

SQL Injection

Why not build strings manually? See employees3.java. A problem occurs if there are quotes in the data, particularly if that data is user-supplied. Suppose the user provides a USERNAME and a PASSWORD string, and the query does

    select * from AUTH_USERS where user=$USERNAME

Suppose the user sends USERNAME= x' or '1'='1
Then the above may be naively formatted into
    select * from AUTH_USERS where user='x' or '1'='1' (note two extra quotes!)
This is now everyone.

bobby tables

Prepared Statements

Above we looked at Java's PreparedStatement objects:

    PreparedStatement p = connection.prepareStatement(
            "select e.fname, e.lname, e.ssn, e.bdate, e.salary, d.dname from employee e, department d "
         + "where e.dno=d.dnumber and e.dno=? and e.lname=?"
    );

Prepared statements, or parameterized statements/queries, can be either client-side or server-side. Client-side means that the client replaced the prepared-statement '?' placeholders with appropriately escaped strings, and then passed the entire query to the database. The client must be sure to handle the escaping of embedded quotes correctly or risk a SQL injection attack. Server-side prepared statements mean that the query with embedded '?'s is sent to the database server, and then later the values to be plugged in are sent. Ideally, the value strings are sent as an array of raw string objects, so there is no ambiguity as to how escaping of quotation marks is to work.

To see some of this in action, turn on the general_log_file in the MySQL configuration file (/etc/mysql/my.cnf for me), and enable query logging
    general_log_file   = /var/log/mysql/mysql.log
    general_log          = 1

After doing this, let's run employee2.java again, with e.lname of "O'Hara". We get (or got; this is actually from a while back) something like

52 Query    SHOW COLLATION
52 Query    SET NAMES latin1
52 Query    SET character_set_results = NULL
52 Query    SET autocommit=1
52 Query    SET sql_mode='STRICT_TRANS_TABLES'
52 Query    select e.fname, e.lname, e.ssn, e.bdate, e.salary, d.dname from employee e, department d where e.dno=d.dnumber and e.dno=5 and e.lname='O\'Hara'

Note that the escaped value for e.lname does show up here, 'O\'Hara', but that was done by the client, that is, by the JDBC connection. The last line above is also what appears when we print the preparedStatement object p after the final p.setString(), suggesting that at this point (before query execution) java has already replaced the '?' with its corresponding argument.

By comparison, here's the same sort of thing after using MDB2 PHP preparedStatements, which use server-side prepares (I've omitted a few unnecessary items)

56 Query    PREPARE MDB2_STATEMENT 'select e.fname, e.lname, e.ssn, e.bdate, e.salary, d.dname from employee e, department d where e.dno=d.dnumber and e.dno=? and e.lname=?'
56 Query    SET @0 = 5
56 Query    SET @1 = 'O\'Hara'
56 Query    EXECUTE MDB2_STATEMENT USING @0, @1

This time it is the server doing the escaping where 'O\'Hara' is displayed.

This isn't what PDO uses, however. The PDO library has an option, PDO::ATTR_EMULATE_PREPARES, normally true (meaning client-side emulation of prepared statements is done). When this is true, $db->prepare($insertion) does nothing; the following is the result of the prepare($insertion) and the execute($queryargs). There is no mention of a prepare() being invoked by MySQL (because it is not!).

47 Query    insert into employee values ('ralph','j','wiggums','abcdefghi','1980-07-04','no fixed abode','M','9999','999887777','1')

Here is the MySQL log result of an employee.php insert with PDO::ATTR_EMULATE_PREPARES set to false.

48 Prepare    insert into employee values (?,?,?,?,?,?,?,?,?,?)
48 Execute    insert into employee values ('ralph','j','wiggums','abcdefghi','1980-07-04','no fixed abode','M','9999','999887777','1')
48 Close stmt  

I am not sure why the "48 Execute" above shows the quoted/escaped values, except that the log is from MySQL itself and may be showing the completed query.

From http://php.net/manual/en/pdo.prepare.php:

public at grik dot net 07-Mar-2012 12:23
With PDO_MYSQL you need to remember about the PDO::ATTR_EMULATE_PREPARES option.

The default value is TRUE, like
$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES,true);

This means that no prepared statement is created with $dbh->prepare() call. With exec() call PDO replaces the placeholders with values itself and sends MySQL a generic query string.

The first consequence is that the call  $dbh->prepare('garbage');
reports no error. You will get an SQL error during the $dbh->exec() call.
The second one is the SQL injection risk in special cases, like using a placeholder for the table name.

The reason for emulation is a poor performance of MySQL with prepared statements. Emulation works significantly faster.

I became suspicious when I added better error checking to employees.php, and discovered that misspelled table names did not cause errors in $db->prepare($query).

Server-side preparation can infer the type of any column value from the appropriate table definition. Client-side preparation must do a little more work.



Design Guidelines and Normalization

Normalization refers to a mathematical process for decomposing tables into component tables; it is part of a broader area of database design. The criteria that guide this are functional dependencies, which should really be seen as user-supplied constraints on the data.

Our first design point will be to make sure that relation and entity attributes have clear semantics. That is, we can easily explain attributes, and they are all related. E&N spells this out as follows:

E&N Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.

As examples, consider
   
EMP_DEPT
    Ename
    Ssn
    Bdate
    Address
    Dnumber
    Dname
    Dmgr_ssn

This mixes employee information with department information.

Another example is

EMP_PROJ
    Ssn
    Pnumber
    Hours
    Ename
    Pname
    Plocation

Both these later records can lead to anomalies of the following types:
This leads to

E&N Guideline 2: design your database so there are no insertion, deletion, or update anomalies. If this is not possible, document any anomalies clearly so that the update software can take them into account.

The third guideline is about reducing NULLs, at least for frequently used data. Consider the example of a disjoint set of inherited entity types, implemented as a single fat entity table. Secretaries, Engineers, and Managers each have their own attributes; every record has NULLs for two out of these three.

E&N Guideline 3: NULLs should be used only for exceptional conditions. If there are many NULLs in a column, consider a separate table.

The fourth guideline is about joins that give back spurious tuples. Consider

EMP_LOCS
    Ename
    Plocation

EMP_PROJ1
    Ssn
    Pnumber
    Hours
    Pname
    Plocation

If we join these two tables on field Plocation, we do not get what we want! (We would if we made EMP_LOCS have Ssn instead of Ename, and then joined the two on the Ssn column.) We can create each of these as a view:

create view emp_locs as select e.lname, p.plocation from employee e, works_on w, project p
where e.ssn=w.essn and w.pno=p.pnumber;

create view emp_proj1 as select e.ssn, p.pnumber, w.hours, p.pname, p.plocation
from employee e, works_on w, project p where e.ssn=w.essn and w.pno = p.pnumber;

Now let us join these on plocation:

select * from emp_locs el,  emp_proj1 ep where el.plocation = ep.plocation;

Oops. What is wrong?

E&N Guideline 4: Design relational schemas so that they can be joined with equality conditions on attributes that are appropriatedly related ⟨primary key, foreign key⟩ pairs in a way that guarantees that no spurious tuples are generated. Avoid relations that contain matching attributes that are not ⟨foreign key, primary key⟩ combinations.