Comp 353/453: Database Programming, Corboy
L08, 4:15 Mondays
Week 9, Mar 25
Java Programming
JDBC and SQLJ are two popular Java interfaces to SQL. Note that SQLJ
requires that your sql statements be compiled into your program; this is a
major problem. JDBC allows dynamically created strings to be used for
queries. Still, the strings are interpreted,
leaving room for problems. The worst such problem is "SQL Injection".
Documentation is found in under "java.sql".
Note that JDBC would only be applicable if you were building your
application with java. Perhaps most web interfaces are browser-based.
You need to make sure that your CLASSPATH contains mysql_jdbc.jar, or
whatever is the appropriate wrapper. If you're using eclipse or blueJ, you
add this .jar file to your project.
We'll look at two examples. The first is employees1.java,
which shows the process of creating a Connection object. This object knows
that we are creating a connection to MySQL (some of the commented-out parts
relate to Oracle). The book's way of creating a connection is slightly
different, using
Class.forName("oracle.jdbc.driver.OracleDriver");
The two amount to the same thing.
The next thing done in employees1 is to create a Statement from the
connection, and then execute the query. The query returns a ResultSet. Note
the getString() method to extract data, and the next() method.
The second program is employees2.java.
Here, I use a parameterized query (prepared statement).
SQL Injection
Why not build strings manually? See employees3.java.
A problem occurs if there are quotes in the data, particularly if that data
is user-supplied. Suppose the user provides a USERNAME and a PASSWORD
string, and the query does
select * from AUTH_USERS where user=$USERNAME
Suppose the user sends USERNAME= x' or '1'='1
Then the above may be naively formatted into
select * from AUTH_USERS where user='x' or '1'='1'
(note two extra quotes!)
This is now everyone.

Prepared Statements
Above we looked at Java's PreparedStatement objects:
PreparedStatement p = connection.prepareStatement(
"select e.fname,
e.lname, e.ssn, e.bdate, e.salary, d.dname from employee e, department d "
+ "where e.dno=d.dnumber and
e.dno=? and e.lname=?"
);
Prepared statements, or parameterized statements/queries, can be either client-side or server-side.
Client-side means that the client replaced the prepared-statement '?'
placeholders with appropriately escaped strings, and then passed the entire
query to the database. The client must be sure to handle the escaping of
embedded quotes correctly or risk a SQL injection attack. Server-side
prepared statements mean that the query with embedded '?'s is sent to the
database server, and then later the values to be plugged in are sent.
Ideally, the value strings are sent as an array of raw string objects, so
there is no ambiguity as to how escaping of quotation marks is to work.
To see some of this in action, turn on the general_log_file in the MySQL
configuration file (/etc/mysql/my.cnf for me), and enable query logging
general_log_file = /var/log/mysql/mysql.log
general_log =
1
After doing this, let's run employee2.java again, with e.lname of "O'Hara".
We get (or got; this is actually
from a while back) something like
52 Query SHOW COLLATION
52 Query SET NAMES latin1
52 Query SET character_set_results = NULL
52 Query SET autocommit=1
52 Query SET sql_mode='STRICT_TRANS_TABLES'
52 Query select e.fname, e.lname, e.ssn, e.bdate,
e.salary, d.dname from employee e, department d where e.dno=d.dnumber and
e.dno=5 and e.lname='O\'Hara'
Note that the escaped value for e.lname does show up here, 'O\'Hara', but
that was done by the client, that
is, by the JDBC connection. The last line above is also what appears when we
print the preparedStatement object p after the final p.setString(),
suggesting that at this point (before query execution) java has already
replaced the '?' with its corresponding argument.
By comparison, here's the same sort of thing after using MDB2 PHP
preparedStatements, which use server-side
prepares (I've omitted a few unnecessary items)
56 Query PREPARE
MDB2_STATEMENT 'select e.fname, e.lname, e.ssn, e.bdate, e.salary, d.dname
from employee e, department d where e.dno=d.dnumber and e.dno=? and
e.lname=?'
56 Query SET @0 = 5
56 Query SET @1 = 'O\'Hara'
56 Query EXECUTE MDB2_STATEMENT USING @0, @1
This time it is the server doing the escaping where 'O\'Hara' is displayed.
This isn't what PDO uses, however. The PDO library has an
option, PDO::ATTR_EMULATE_PREPARES, normally true (meaning client-side
emulation of prepared statements is done). When this is true, $db->prepare($insertion)
does nothing; the following is the result of the prepare($insertion)
and the execute($queryargs)
. There is no mention of a
prepare() being invoked by MySQL (because it is not!).
47 Query insert into
employee values ('ralph','j','wiggums','abcdefghi','1980-07-04','no fixed
abode','M','9999','999887777','1')
Here is the MySQL log result of an employee.php insert with
PDO::ATTR_EMULATE_PREPARES set to false.
48 Prepare insert into
employee values (?,?,?,?,?,?,?,?,?,?)
48 Execute insert into employee values
('ralph','j','wiggums','abcdefghi','1980-07-04','no fixed
abode','M','9999','999887777','1')
48 Close stmt
I am not sure why the "48 Execute" above shows the quoted/escaped values,
except that the log is from MySQL itself and may be showing the completed
query.
From http://php.net/manual/en/pdo.prepare.php:
public at grik dot net 07-Mar-2012
12:23
With PDO_MYSQL you need
to remember about the PDO::ATTR_EMULATE_PREPARES option.
The default value is TRUE, like
$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES,true);
This means that no prepared statement is created with
$dbh->prepare() call. With exec() call PDO replaces the
placeholders with values itself and sends MySQL a generic query
string.
The first consequence is that the call
$dbh->prepare('garbage');
reports no error. You will get an SQL error during the
$dbh->exec() call.
The second one is the SQL injection risk in special cases, like
using a placeholder for the table name.
The reason for emulation is a poor performance of MySQL with
prepared statements. Emulation works significantly faster.
I became suspicious when I added better error checking to employees.php, and
discovered that misspelled table names did not
cause errors in $db->prepare($query).
Server-side preparation can infer the type of any column value from the
appropriate table definition. Client-side preparation must do a little more
work.
Design Guidelines and Normalization
Normalization refers to a mathematical process for
decomposing tables into component tables; it is part of a broader area of
database design. The criteria that guide this are functional
dependencies, which should really be seen as user-supplied constraints
on the data.
Our first design point will be to make sure that relation and entity
attributes have clear semantics. That is, we can easily explain attributes,
and they are all related. E&N spells this out as follows:
E&N
Guideline 1: Design a relation schema so that it is easy to explain its
meaning. Do not combine attributes from multiple entity types and
relationship types into a single relation.
As examples, consider
EMP_DEPT
Ename
Ssn
Bdate
Address
Dnumber
Dname
Dmgr_ssn
This mixes employee information with department information.
Another example is
EMP_PROJ
Ssn
Pnumber
Hours
Ename
Pname
Plocation
Both these later records can lead to anomalies of the
following types:
- insertion anomalies: when
adding an employee, we must assign them to a department or else use
NULLs. When adding a new department with no employees, we have to use
NULLs for the employee Ssn, which is supposed to be the primary key!
- deletion anomalies: if we
delete the last EMP_DEP record from a department, we have lost the
department!
- update anomalies: what if we
update some EMP_DEPs with the new Dmgr_ssn, but not others?
This leads to
E&N
Guideline 2: design your database so there are no insertion, deletion,
or update anomalies. If this is not possible, document any anomalies
clearly so that the update software can take them into account.
The third guideline is about reducing NULLs, at least for frequently used
data. Consider the example of a disjoint set of inherited entity types,
implemented as a single fat entity table. Secretaries, Engineers, and
Managers each have their own attributes; every record has NULLs for two out
of these three.
E&N
Guideline 3: NULLs should be used only for exceptional conditions. If
there are many NULLs in a column, consider a separate table.
The fourth guideline is about joins that give back spurious tuples. Consider
EMP_LOCS
Ename
Plocation
EMP_PROJ1
Ssn
Pnumber
Hours
Pname
Plocation
If we join these two tables on field Plocation, we do not get what we want!
(We would if we made EMP_LOCS have
Ssn instead of Ename, and then joined the two on the Ssn column.) We can
create each of these as a view:
create view emp_locs as select e.lname,
p.plocation from employee e, works_on w, project p
where e.ssn=w.essn and w.pno=p.pnumber;
create view emp_proj1 as select e.ssn, p.pnumber, w.hours, p.pname,
p.plocation
from employee e, works_on w, project p where e.ssn=w.essn and w.pno =
p.pnumber;
Now let us join these on plocation:
select * from emp_locs el, emp_proj1
ep where el.plocation = ep.plocation;
Oops. What is wrong?
E&N
Guideline 4: Design relational schemas so that they can be joined with
equality conditions on attributes that are appropriatedly related
⟨primary key, foreign key⟩ pairs in a way that guarantees that no
spurious tuples are generated. Avoid relations that contain matching
attributes that are not ⟨foreign key, primary key⟩ combinations.