Comp 443 program 2: TCP Stream Reassembler

Due: Friday, April 30

This program is for those registered for graduate credit, that is, Comp 443.

You are to write a program that reassembles the data in TCP streams. I will give you several captured-packet files in tcpdump format; you are to use the jnetpcap library (a Java wrapper library for the native pcap library) to extract each packet, determine whether it is part of a known TCP stream, and if so, save the data appropriately.

Note that the data portions of the TCP packets may be out of order or overlapping; it is not sufficient in general to append the data to a string. However, that does work in simpler cases.

You are to handle each direction of a connection separately. That is, if you see a SYN packet from A to B (whether it is the initial SYN or the subsequent SYN+ACK), then you record in a suitable data structure the ⟨sourceIP,sourceport⟩, ⟨destIP,destport⟩, the ISN, and a suitable data-buffer object. If one of the provided tcpdump packet traces consists only of packets from A to B, and none in the reverse direction, this will still work. If the packet-trace file is bidirectional, your program will create two entries; the second will have source and destination reversed (and a different ISN, of course). For the tcpdump files I anticipate providing, it is likely that each connection will send data in only one direction.
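The per-direction bookkeeping described above might look something like the following. This is only a sketch: the names StreamEntry and StreamTable are made up for illustration, the string-valued key is a shortcut rather than a recommended key type, and the StringBuilder is just a placeholder for whatever data-buffer object you choose. It also assumes that a repeated SYN for the same direction should leave the first entry in place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record for ONE direction of a connection.
class StreamEntry {
    final String srcIp;
    final int srcPort;
    final String dstIp;
    final int dstPort;
    final long isn;                                   // unsigned 32-bit ISN held in a long
    final StringBuilder data = new StringBuilder();   // placeholder data buffer

    StreamEntry(String srcIp, int srcPort, String dstIp, int dstPort, long isn) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
        this.isn = isn;
    }
}

// Hypothetical table of entries, one per direction.
class StreamTable {
    private final Map<String, StreamEntry> entries = new HashMap<>();

    // Direction-sensitive key: A>B and B>A produce different strings.
    static String key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort;
    }

    // Call for any packet with the SYN flag set (plain SYN or SYN+ACK).
    // A retransmitted SYN does not replace the existing entry.
    StreamEntry onSyn(String srcIp, int srcPort, String dstIp, int dstPort, long isn) {
        return entries.computeIfAbsent(
            key(srcIp, srcPort, dstIp, dstPort),
            k -> new StreamEntry(srcIp, srcPort, dstIp, dstPort, isn));
    }

    // Returns null if no SYN was seen for this direction.
    StreamEntry lookup(String srcIp, int srcPort, String dstIp, int dstPort) {
        return entries.get(key(srcIp, srcPort, dstIp, dstPort));
    }

    int size() {
        return entries.size();
    }
}
```

With a bidirectional trace, the SYN from A to B and the SYN+ACK from B to A each call onSyn once, giving two entries with source and destination reversed.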

As later (non-SYN) TCP packets are seen, you will use the ⟨sourceIP,sourceport,destIP,destport⟩ quadruple as the lookup key to find the corresponding ISN and data-buffer object. In theory, when you see the FIN packet you should delete the key; in practice, I will not reuse ports. When you've processed all the packets, you should write out the reassembled data streams (using something like "...." to fill in any gaps), prefixing each stream with the source and dest ⟨IP,port⟩ sockets.
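Once the ISN has been looked up, a packet's sequence number has to be converted to a position in the stream. Here is one possible way to do that; the class and method names are hypothetical. It assumes that the SYN consumes one sequence number, so that the first data byte carries sequence number ISN+1 and lands at position 0, and it assumes that both values are passed as unsigned 32-bit quantities held in a long. The mask takes care of sequence numbers that wrap around past 2^32.

```java
// Hypothetical helper; not part of reassemblerDemo.java.
class SeqMath {
    // Position in the stream of the first data byte of a segment.
    // seq and isn are unsigned 32-bit values stored in longs.
    // Assumption: the SYN occupies sequence number isn, so data begins at isn + 1.
    static long streamOffset(long seq, long isn) {
        return (seq - isn - 1) & 0xFFFFFFFFL;   // arithmetic modulo 2^32
    }
}
```

For example, with an ISN of 1000, a segment with sequence number 1301 would start at position 300 of the stream. If you prefer to treat ISN itself as position 0, drop the "- 1" and remember that position 0 then corresponds to the SYN rather than to data.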

As you process packets, you will build a collection of ASCII strings (data buffers), each with an ISN-corrected starting point and a length. You can either reassemble as you go along, into one large reassembly buffer (one for each connection), or else defer reassembly to the end. Either way, the starting point attached to each data buffer tells you where to put the data. The data buffers may arrive out of order. If you are putting them into a reassembly buffer as you go along, then you use the starting point to decide where each one goes; if you are deferring reassembly to the end, then you can either place each buffer by its starting point in the same way or else first sort the buffers by starting point.

It is also possible that two buffers will overlap; for example, buf1 starts at position 300 and has length 200 (covering positions 300-499), while buf2 starts at position 400 and has length 100 (or 200). In that case, the order in which you process the data buffers may matter, but either order is acceptable.

Also, when you're done, it is possible that there will be gaps in the data, representing packets I deleted from the tcpdump stream (they can't really be lost, because TCP would have retransmitted the data). If there are gaps, the simplest thing is to print those portions of the stream using a filler character such as "." or "_", one filler character for each missing byte. Do not worry about FIN packets; assume that the stream ends at the last byte you have for it.
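As an illustration of the reassemble-as-you-go approach, here is a minimal sketch of a reassembly buffer. The class name ReassemblyBuffer and its methods are hypothetical. It starts every position out as the filler character, copies each data buffer in at its starting point, and lets later data overwrite earlier data where two buffers overlap (which is one of the two acceptable orders). Growing the array on every write is simple but not efficient; that is fine for small traces.

```java
import java.util.Arrays;

// Hypothetical reassembly buffer for one direction of one connection.
class ReassemblyBuffer {
    static final char GAP = '.';          // filler for bytes never seen
    private char[] buf = new char[0];

    // Place data at the given ISN-corrected starting point.
    void put(int offset, String data) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        int end = offset + data.length();
        if (end > buf.length) {
            int oldLen = buf.length;
            buf = Arrays.copyOf(buf, end);          // grow to hold the new data
            Arrays.fill(buf, oldLen, end, GAP);     // new positions start as gaps
        }
        // Overwrites whatever was there, so the last writer wins on overlap.
        data.getChars(0, data.length(), buf, offset);
    }

    // The stream so far, ending at the last byte received.
    String contents() {
        return new String(buf);
    }
}
```

For instance, calling put(5, "world") and then put(0, "he") leaves three filler characters at positions 2 through 4, because no data was ever supplied for them.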

You are welcome to do this project in C, using libpcap natively. However, it is harder to create data structures in C; in particular, the C standard library has no general-purpose map type for handling the lookups.

Note that installing jnetpcap is not as simple as dropping jnetpcap.jar into your classpath; it has to interact with the native pcap installation. See the install instructions at jnetpcap.com. There are installer files for Windows XP/Vista (and I think 7), and for Linux (Debian and Ubuntu for sure).

As a demo file, I am giving you reassemblerDemo.java. This file simply prints out each TCP packet in turn. Note that sequence numbers are not printed; they wouldn't make much sense unless the appropriate ISN were subtracted.

New April 19: I added a .equals() method to the Psocket inner class in the above file. I also gave you a PSocketPair class, also with .equals, that could serve as the "key" type when you need to look up the TCP connection information for each packet. I apologize for the inconsistent capitalization between Psocket and PSocketPair.
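One thing to keep in mind if you use such a class as the key of a java.util.HashMap: HashMap relies on hashCode() as well as equals(), and two keys that are equal must return the same hash code. The sketch below shows the general pattern using a made-up class, ConnKey; it is not the Psocket or PSocketPair code from the demo file, and you would only need something like it if the key class you use does not already override hashCode().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type for one direction of a connection.
final class ConnKey {
    private final String srcIp;
    private final int srcPort;
    private final String dstIp;
    private final int dstPort;

    ConnKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConnKey)) return false;
        ConnKey k = (ConnKey) o;
        return srcPort == k.srcPort && dstPort == k.dstPort
            && srcIp.equals(k.srcIp) && dstIp.equals(k.dstIp);
    }

    // Must agree with equals(): equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(srcIp, srcPort, dstIp, dstPort);
    }
}

class ConnKeyDemo {
    public static void main(String[] args) {
        Map<ConnKey, String> table = new HashMap<>();
        table.put(new ConnKey("10.0.0.1", 40000, "10.0.0.2", 80), "A to B");
        // A freshly built but equal key finds the same entry.
        System.out.println(table.get(new ConnKey("10.0.0.1", 40000, "10.0.0.2", 80)));
        // The reverse direction is a different key.
        System.out.println(table.get(new ConnKey("10.0.0.2", 80, "10.0.0.1", 40000)));
    }
}
```

Without the hashCode() override, the first lookup in main would most likely return null even though the two keys are equal, because the default hash code is based on object identity.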

Here is a (hopefully steadily growing) list of tcpdump packet-trace files.

In each file, each connection involves data flow in only one direction.

If reassemblerDemo.java fails to compile, the most likely cause is that jnetpcap.jar is not in your classpath; if it fails to run, one possibility is that the jnetpcap package was not installed properly.