Comp 353/453: Database Programming, Corboy
523, 7:00 Thursdays
Week 8, Mar 15
Read in Elmasri & Navathe (EN)
- Chapter 15, Normalization
I redid the week-7 notes on INVOICE.
Second look at PHP PDO and LAMP (or WAMP)
XML
XML is a markup language more structured than HTML. Its strict
hierarchical structure makes it suitable for textual encoding of
databases; see Fig 12.3 and below.
The primary role of XML in databases is for transporting the data, not
storing it natively. There are attempts at defining native XML
databases, but it is hard to get around the problem that tagged text is
an inefficient storage mechanism. For example, finding records means
searching linearly. If you are worried about going to a NoSQL approach
because your database is getting too big, XML is not for you!
While the encoding of Fig 12.3 (below) looks promising, note that the
<Worker> entries are more complex than they are in our relational
database; they contain more than just a key to the EMPLOYEE table and
the hours attribute. Arguably,
even the employee and hours information really should be in a separate
XML section <works_on>....</works_on>.
This method does illustrate the hierarchical structure supported by XML.
In this version, we are using xml only to list the data; we are not using xml to define its format or the table structure.
XML Structured Data: EN Fig 12.3
<?xml version="1.0" standalone="yes">
<Projects>
<Project>
<Name>ProductX</Name>
<Number>1</Number>
<Location>Bellaire</Location>
<Dept_no>5</Dept_no>
<Worker>
<Ssn>123456789</Ssn>
<Last_name>Smith</Last_name>
<Hours>32.5</Hours>
</Worker>
<Worker>
<Ssn>453453453</Ssn>
<First_name>Joyce</First_name>
<Hours>20.0</Hours>
</Worker>
</Project>
<Project>
<Name>ProductY</Name>
<Number>2</Number>
<Location>Sugarland</Location>
<Dept_no>5</Dept_no>
<Worker>
<Ssn>123456789</Ssn>
<Hours>7.5</Hours>
</Worker>
<Worker>
<Ssn>453453453</Ssn>
<Hours>20.0</Hours>
</Worker>
<Worker>
<Ssn>333445555</Ssn>
<Hours>10.0</Hours>
</Worker>
</Project>
</Projects>
XML DTD (Document Type Definition)
One approach to imposing field structure on XML documents is illustrated by Fig 12.4 (reproduced as text below),
which shows a small XML DTD
(Document Type Definition) document. DTD was the original mechanism for
defining data formats in XML. One of the weaknesses of DTD is that it
has a very limited set of atomic attribute types; in the example here,
#PCDATA (parsed character data) essentially means a string. There is no provision for strings of fixed length, or CHAR versus VARCHAR.
Here is the DTD code:
<!DOCTYPE projects [
<!ELEMENT projects (project +)>
<!ELEMENT project (Name, Number, Location, DeptNo?, Workers)
<!ATTLIST Project ProjId ID #REQUIRED>
>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Number (#PCDATA)>
<!ELEMENT Location (#PCDATA)>
<!ELEMENT DeptNo (#PCDATA)>
<!ELEMENT Workers (Worker *)>
<!ELEMENT Worker (SSN, LastName?, FirstName? , hours)>
<!ELEMENT SSN (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT hours (#PCDATA)>
] >
The syntax here is
!IElement
type_identifier being defined
type-spec in parentheses: list of attributes, or definition as a list
Attributes followed by ? can be null (that is, can appear 0 or 1
times). Attributes or subtypes followed by + can appear 1 or more
times. Attributes or subtypes followed by * can appear 0 or more times.
DTD allows us to specify
that
- Projects is a sequence of one or more Project objects
- A Project has a Name, Number, Location, Dept_no (optional), and Workers. It also has a ProjId atribute.
- Workers is a sequence of zero or more Worker objects.
- Worker objects have SSn, Last_name (opt), First_name (opt), and Hours.
We can now verify that Fig 12.3 actually fits this form. If we wanted to enforce this, we might change the first line to
<?xml version="1.0" standalone="no">
<!DOCTYPE Projects SYSTEM "proj.dtd">
This allows us to enforce which fields can be present, but doesn't
really spell out the full schema involving other tables.
XML Schema Definition (XSD)
For this we
move on to XML schema, or XML Schema Definitions, XSD,
illustrated in Fig 12.5. While this format is more complex, it also
supports unordered data, and shares a more general XML syntax.
Note in the first of these the three entities department, employee (with subtype dependentNameUnique), and
project. The locations table will later be listed as a multivalued
attribute of Department, and dependents will be a multivalued attribute
of Employee. WorksOn will be a multivalued attribute of Employee, and
Worker a multivalued attribute of Project.
This format can serve (though perhaps not terribly efficiently) as a
replacement for the SQL table definitions for the COMPANY database.
Parts of the XML schema:
- defining the main entities of "company"
- key and foreign-key constraints
- constraints are given names here
- xsd:selector is the table name
- xsd:field is the attribute name within the table
- table definitions
Note in the Employee and Project sections the reference to WorksOn, and
in Employee the employeeDependent, and in Department the
departmentLocation.
Here is the Fig 12.5 XML schema.