Java project option

Java programming project

This is intended as an alternative to the PHP project. The PHP project is a better fit for anyone wanting to develop standard webpage-with-database applications; the Java option involves using Java to manipulate the database in ways that are difficult or impossible to do within SQL.

There are two parts to the project here. I really wanted one of the parts to be some form of data mining, but that seems unlikely (unless you count the Part 2 problem as fitting that category).

Here are some demo examples using jdbc:

employees1.java: simple example of stmt.executeQuery()
employees2.java: uses PreparedStatement
employees2s.java: uses server-side PreparedStatement

To compile any of these, you will need mysql_jdbc.jar in your CLASSPATH. I do this from the command line:

CLASSPATH=$CLASSPATH:.:/home/pld/353/jdbc/mysql_jdbc.jar
export CLASSPATH

There is some test data below, but it's too big for early-stage testing; I'll come up with some "easier" test data soon. Note that all that is actually needed for either example is a single table (PREREQUISITE for part 1, and WORKS_ON for part2); this can be processed as a stand-alone table without involving the other university/company tables. I will provide several tables: prereq1, prereq2, etc for Part 1 and workson1, workson2, etc for Part 2. You can process all of them within the same MySQL database if you append the digit in the name to your output tables (eg indirect_prereq1, worker_dist2, etc).

To load a table, eg workson1.text, into MySQL, perhaps the easiest is to start a mysql session in the directory containing the file, select the desired database, and then use the command source workson1.text.

You may represent SSNs, course_numbers, etc as either String or Integer within Java.

Part 1

You will be given a table called prerequisite, listing course numbers and their prerequisites. You will

from within Java, retrieve the entire contents of the prerequisite table using JDBC
find all indirect prerequisites. For example, if 170 is a prereq for 271 and 271 is listed as a prereq for 353, then 170 is an indirect prereq for 353.
in the course of finding indirect prerequisites, you may detect circularities: 160 is a prereq for 180 is a prereq for 190 is a prereq for 160.
Again using JDBC, create a table indirect_prerequisites of all pairs (course1, course2) for which course1 is an indirect prereq for course2. No course involved in a circularity should go in this table, however.
Using JDBC, create a single-column table circular_prerequisites that lists all courses involved in a cycle.

The algorithm here is the following. It is about as straightforward a graph algorithm as these get, but that of course does not mean it is straightforward. After all, the point was to find something not doable in SQL directly. The current version, however, does handle the courses one at a time.

The indirect prerequisites represent the transitive closure of the prerequisite relationship: the complete relationship including all direct prerequisites and which is transitive: if has_prereq(a,b) and has_prereq(b,c) then has_prereq(a,c).

To get started, set up the following:

Read from the database a list of ⟨course, direct_prereq⟩ pairs
Create a Map, prereqMap, of each course and the list of indirect prerequisites (key = coursenum, value = List of coursenums)
Create a list circularList of courses with a circular cycle of prerequisites
Populate prereqMap with each course and its list of direct prerequisites (this can be done as one pass through the database)

For each course in prereqMap.keySet():
    Let pList = prereqMap.get(course).
    We will be adding to pList at the end, as we traverse it, so we can't use an iterator.
    Let i=0.
    while (i < pList.size()):
          prereq = ith element of pList
          // if prereq == course, skip it!
          subList = known prereqs of prereq (prereqMap.get(prereq))
          add elements of subList to the end of pList, so as not to disturb position i of pList
                do not add any elements of subList that are already present in pList!
          increment i

During the course of the above, we may add a course to its own prerequisite list. The next phase is to identify all courses that have themselves as members of their now-updated pList, and add these courses to circularList. It's probably simplest to leave them in the original prereqMap.

Test data:

prereq1.text:
prereq2.text:
prereq3.text: some circularity

Part 2

Let's say two employees in the Company database have worked_together if there is a project on which both have worked:

select distinct e1.essn, e2.essn from works_on e1, works_on e2 where e1.pno = e2.pno;

(Note this will list employees as having worked_together with themselves, provided the employee worked on some project. Occasionally such an employee might not have worked on the same project as anyone else; that is, they may have worked on only one project and been the only employee on that project.)

Step 1: find the transitive closure of the worked_together relationship; that is, identify all pairs of employees eA,eB who worked together indirectly: for some sequence of employees e1...eN,
eA worked_together e1 worked_together e2 worked_together ... worked_together eN worked_together eB

Step 2: This divides the set of employees into groups ("equivalence classes"), where two employees are in the same group if and only if they have indirectly worked together. Give each group a unique integer code (and, eventually, create a table work_group ⟨ssn,group_code⟩ back in the database of employees and these codes.

Step 3: For employees in the same group, find the distance between them in terms of the indirect worked_together relationship. The distance between two employees who have worked together on the same project is 1; in the Step 1 example the distance between eA and eB is N+1 (unless some shorter sequence than e1...eN connects eA and eB. Create a table worker_dist ⟨ssn1,ssn2,dist⟩ back in the database that lists the distance between every pair of employees who are in the same Step-2 group.

Algorithm

There are several approaches. Here's a way to combine all three steps in one: construct a datatype ⟨ssn,distance⟩, which we'll call EmpDist. Construct a HashMap WTmap with key ssn and with value a List<EmpDist>. For each employee ssn, construct the list of other employees that ssn has worked with on a project, and install those with distance 1.

Now, let distance = 2 and repeat the following until there are no further changes:

Go through each ssn in WTmap.keyset(). Call the current ssn theSsn, and call the associated list WTmap.get(theSsn) worksWithSoFar.
Go through every element in worksWithSoFar. For element ww of worksWithSoFar, let ww.ssn() be the ssn. Fetch WTmap.get(ww.ssn()). Let's call this list wwCandidates.
Suppose in wwCandidates we find an ssn, called newSsn, that is not already in worksWithSoFar (and that is not theSsn!). Calculate newDistance = distance(theSsn,e) + distance(e,newSsn). If newDistance == distance, then install newSsn onto worksWithSoFar with distance newDistance. (If newDistance != distance, do nothing.)
increment distance.

If you can find a way to redesign the data structure to be more abstract, let me know.

The algorithm here is a special case of the Shortest Path First algorithm. In order to be sure we're getting the minimum distance between two employees, it is essential in step 3 above to include the check newDistance == distance, and to wait to install newSsn at a later cycle if newDistance > distance. (The < relationship cannot hold, as the entries would have been installed at an earlier step.)

It is straightforward to verify by induction that all indirect-works-with entries with distance d will be added in the cycle when distance = d. As a result, if on stage d we add that eB has worked with eA with distance d, we will also add that eA has worked with eB with distance d.

Note also that you cannot call worksWithSoFar.add() while inside of a for-each loop involving worksWithSoFar: for (EmpDist ww : worksWithSoFar). You can either use a while (i < worksWithSoFar.size()) loop instead, or else keep new entries such as newSsn in a separate temporary list, to be added to worksWithSoFar at the end of the loop.

The next step is to assign each employee a group code, so that employees have the same code if and only if they have indirectly worked with each other. A straightforward way of doing this is as follows; a feature of this approach is that employees who have worked with nobody get a group code all their own. We start with a HashMap groupMap with key ssn and with value Integer. Let GroupCode = 1.

for each employee e, check GroupMap.get(e). If it is not assigned,

set e's group id to GroupCode
for every employee e1 in WTmap.get(e), set e1's group id to GroupCode as well (note that WTmap.get(e) is actually a list of EmpDist objects, not Employees.)
increment GroupCode

The final step, of course, is to use JDBC to create new tables back in the database containing the WTmap information (⟨essn1,essn2,dist⟩) and the GroupMap information (⟨essn,code⟩).

Test data:

workson1.text: a ring of works_on relationships
workson2.text: larger pools joined by a link
workson3.text: long chain