Comp 443 program 2: TCP Stream Reassembler
Due: Friday, April 30
This program is for those registered for graduate credit, that is, Comp 443.
You are to write a program that reassembles the data in TCP streams. I will give you several captured-packet files in tcpdump format; you are to use the jnetpcap
library (a Java wrapper library for the native pcap library) to extract
each packet, determine whether it is part of a known TCP stream, and if
so, save the data appropriately.
Note that the data portions of the TCP packets may be out of order or
overlapping; it is not sufficient in general to append the data to a
string. However, that does work in simpler cases, where each packet's data arrives exactly once and in order.
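Instead of appending, each packet's data can be copied into a per-connection buffer at its position in the stream. The Java sketch below assumes those positions are already ISN-corrected offsets that fit in an int. The class name ReassemblyBuffer and its methods are hypothetical, not part of jnetpcap or the demo code. Overlapping bytes are simply overwritten, so whichever data is processed last wins.

```java
import java.util.Arrays;

// Hypothetical sketch: a growable per-connection reassembly buffer.
// Assumes offsets are non-negative, ISN-corrected, and fit in an int.
final class ReassemblyBuffer {
    private byte[] data = new byte[1024];
    private boolean[] present = new boolean[1024]; // which bytes have arrived
    private int length = 0; // one past the highest byte written so far

    // Copy 'bytes' into place at 'offset'. Bytes already present in the
    // overlap are overwritten, so the most recently processed data wins.
    void put(int offset, byte[] bytes) {
        int end = offset + bytes.length;
        ensureCapacity(end);
        System.arraycopy(bytes, 0, data, offset, bytes.length);
        Arrays.fill(present, offset, end, true);
        length = Math.max(length, end);
    }

    // Grow both arrays (at least doubling) so that index needed-1 is valid.
    private void ensureCapacity(int needed) {
        if (needed <= data.length) return;
        int newCap = Math.max(needed, data.length * 2);
        data = Arrays.copyOf(data, newCap);
        present = Arrays.copyOf(present, newCap);
    }

    int length() { return length; }

    // True if byte i has been filled in (false for gaps).
    boolean has(int i) { return i >= 0 && i < length && present[i]; }

    byte at(int i) { return data[i]; }
}
```

Using the overlap positions from later in this handout (buf1 at 300 with length 200, buf2 at 400 with length 100), putting buf1 and then buf2 leaves buf2's bytes in positions 400 to 499. Reversing the order leaves buf1's bytes there instead. Either result is acceptable under the rules below.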
You are to handle each direction of a connection separately. That is,
if you see a SYN packet from A to B (whether it is the initial SYN or
the subsequent SYN+ACK), then you record in a suitable data structure
the ⟨sourceIP,sourceport⟩, ⟨destIP,destport⟩, the ISN, and a suitable
data-buffer object. If one of the provided tcpdump packet traces
consists only of packets from A to B, and none in the reverse
direction, this will still work. If the packet-trace file is
bidirectional, your program will create two entries; the second will
have source and destination reversed (and a different ISN, of course).
For the tcpdump files I anticipate providing, it is likely that each
connection will send data in only one direction.
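The per-direction table described above might look like the following Java sketch. It uses a record (Java 16 or later) for the key because a record generates equals() and hashCode() automatically. SocketPair, StreamState, and ConnectionTable are hypothetical names and are not the provided demo classes. The packet parsing that would supply the IPs, ports, and ISN is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key: one direction of a connection. A record (Java 16+)
// generates equals() and hashCode() from all four components.
record SocketPair(String srcIp, int srcPort, String dstIp, int dstPort) {
    // Key for the opposite direction of the same connection.
    SocketPair reversed() {
        return new SocketPair(dstIp, dstPort, srcIp, srcPort);
    }
}

// Hypothetical per-direction state: the ISN plus the data buffers seen so far.
final class StreamState {
    final long isn; // ISN as an unsigned 32-bit value held in a long
    // Data buffers keyed by ISN-corrected starting offset.
    final TreeMap<Long, byte[]> buffers = new TreeMap<>();

    StreamState(long isn) {
        this.isn = isn;
    }
}

final class ConnectionTable {
    private final Map<SocketPair, StreamState> streams = new HashMap<>();

    // Call for every packet with SYN set (plain SYN or SYN+ACK). The key is
    // the packet's own source and destination, so a bidirectional trace
    // yields two entries with source and destination reversed.
    void onSyn(SocketPair key, long isn) {
        // putIfAbsent keeps the first ISN if a SYN is retransmitted.
        streams.putIfAbsent(key, new StreamState(isn));
    }

    // Lookup for later (non-SYN) packets; null means no SYN was recorded.
    StreamState lookup(SocketPair key) {
        return streams.get(key);
    }

    int size() {
        return streams.size();
    }
}
```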
As later (non-SYN) TCP packets are seen, you will use the
⟨sourceIP,sourceport,destIP,destport⟩ quadruple as the lookup key to
find the corresponding ISN and data-buffer object. In theory, when you
see the FIN packet you should delete the key; in practice, I will not
reuse ports. When you've processed all the packets, you should write
out the reassembled data streams (using something like "...." to fill
in any gaps), prefixing each stream with the source and dest ⟨IP,port⟩
sockets.
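Once the ISN has been found, subtracting it gives a position within the stream. Because the SYN itself occupies one sequence number, the first data byte has sequence number ISN + 1, so one more is subtracted to make that byte offset 0. The Java sketch below assumes that the raw sequence number and ISN have already been read from the TCP header. SeqMath is a hypothetical helper. The arithmetic is done modulo 2^32 so that a stream whose sequence numbers wrap past 0xFFFFFFFF still gets correct offsets.

```java
// Hypothetical helper: convert raw 32-bit TCP sequence numbers into
// 0-based offsets within a stream.
final class SeqMath {
    private static final long MASK_32 = 0xFFFFFFFFL;

    private SeqMath() {}

    // seq and isn are unsigned 32-bit values held in longs. The extra -1
    // accounts for the SYN, which consumes one sequence number.
    static long relativeOffset(long seq, long isn) {
        // Masking to 32 bits performs the subtraction modulo 2^32, so
        // wraparound past 0xFFFFFFFF is handled.
        return (seq - isn - 1) & MASK_32;
    }

    // Java ints are signed; this recovers the unsigned value of a 32-bit
    // sequence field that was read into an int.
    static long unsigned(int raw) {
        return Integer.toUnsignedLong(raw);
    }
}
```

For example, with ISN 1000 the byte with sequence number 1001 is at offset 0, and the byte with sequence number 1501 is at offset 500.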
As you process packets, you will build a collection of ASCII strings
(data buffers), each with an ISN-corrected starting point and a length.
You can either reassemble as you go along, into one large reassembly
buffer (one for each connection), or else defer reassembly to the end.
Either way, the starting point attached to each data buffer tells you
where to put the data. The data buffers may arrive out of order. If you
are putting them into a reassembly buffer as you go along, then you use
the starting point to decide where each one goes. If you are deferring
reassembly to the end, then you can either do the same or else first
sort the buffers by starting point. It is also possible that two buffers
will overlap. For example, buf1 starts at position 300 and has length
200, while buf2 starts at position 400 and has length 100 (or 200). In
that case, the order in which you process the data buffers may matter,
but either order is acceptable. Also, when you're done, there may be
gaps in the data representing packets I deleted from the tcpdump stream
(they can't really be lost, because TCP would have retransmitted the
data). If there are gaps, the simplest thing is to print those portions
of the stream using "..." or "___", one '.' or '_' character for each
missing byte. Do not worry about FIN packets; assume that the stream
ends at the last byte you have for it.
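For the deferred approach, one option is to keep every data buffer as a (start, bytes) pair, sort by start at the end, and emit one '.' per missing byte. The Java sketch below assumes ISN-corrected starting points and ASCII data, and it uses record syntax (Java 16 or later). DeferredReassembler is a hypothetical name. Where buffers overlap, the buffer with the earlier starting point keeps the overlapping bytes, which is one of the acceptable orders. Output stops after the last byte available, so no filler is added at the end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of deferred reassembly with '.' gap filling.
final class DeferredReassembler {
    record Segment(long start, byte[] bytes) {}

    private final List<Segment> segments = new ArrayList<>();

    void add(long start, byte[] bytes) {
        segments.add(new Segment(start, bytes));
    }

    String render() {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong(Segment::start)); // stable sort
        StringBuilder out = new StringBuilder();
        long next = 0; // offset of the next byte not yet emitted
        for (Segment s : sorted) {
            long end = s.start() + s.bytes().length;
            if (end <= next) continue; // entirely covered by earlier data
            for (long i = next; i < s.start(); i++) {
                out.append('.'); // one filler character per missing byte
            }
            // Skip the part that overlaps bytes already emitted.
            int skip = (int) Math.max(0, next - s.start());
            // ISO-8859-1 maps each byte to exactly one char.
            out.append(new String(s.bytes(), skip, s.bytes().length - skip,
                    StandardCharsets.ISO_8859_1));
            next = end;
        }
        return out.toString(); // stream ends at the last byte we have
    }
}
```

For example, adding "World" at offset 6 and then "Hello" at offset 0 renders as `Hello.World`, with the single missing byte at offset 5 shown as '.'.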
You are welcome to do this project in C, using libpcap
natively. However, it is harder to create data structures in C; in
particular, the C standard library has no general-purpose map type for
handling the lookups.
Note that installing jnetpcap is not as simple as dropping jnetpcap.jar
into your classpath; it has to interact with the native pcap
installation. See the install instructions at jnetpcap.com. There are
installer files for Windows XP/Vista (and I think 7), and for Linux
(Debian and Ubuntu for sure).
As a demo file, I am giving you reassemblerDemo.java.
This file simply prints out each TCP packet in turn. Note that sequence
numbers are not printed; they wouldn't make much sense unless the
appropriate ISN were subtracted.
New April 19: I added a
.equals() method to the Psocket inner class in the above file. I also
gave you a PSocketPair class, also with .equals(), that could serve as
the "key" type when, for each packet, you need to look up its TCP
connection information. I apologize for the inconsistent capitalization
between Psocket and PSocketPair.
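Whatever class you use as the key, Java's HashMap calls hashCode() to choose a bucket before it calls equals(). The two methods therefore have to agree: equal keys must have equal hash codes. A class that overrides only equals() will generally fail lookups that use a freshly constructed key. The hand-written class below is a hypothetical stand-in, not the provided PSocketPair, and it shows both methods.

```java
import java.util.Objects;

// Hypothetical stand-in for a socket-pair key class (not the provided one).
final class SocketPairKey {
    private final String srcIp;  // assumed non-null
    private final int srcPort;
    private final String dstIp;  // assumed non-null
    private final int dstPort;

    SocketPairKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
    }

    // Two keys are equal when all four fields match (direction matters).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SocketPairKey)) return false;
        SocketPairKey k = (SocketPairKey) o;
        return srcPort == k.srcPort && dstPort == k.dstPort
                && srcIp.equals(k.srcIp) && dstIp.equals(k.dstIp);
    }

    // Built from the same four fields as equals(), so equal keys always
    // produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(srcIp, srcPort, dstIp, dstPort);
    }
}
```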
Here is a (hopefully steadily growing) list of tcpdump packet-trace files.
- tcpdump1.out original file; two connections, 118 packets in all.
- tcpdump2.out more connections & data, 986 packets in all.
- tcpdump2m.out same data as above, but with "mangled"
<IPaddr,port> values (that is, the <IPaddr,port> values
were remapped so the connection socketpairs differ by more than just
the source port). 986 packets. Note that the IP header checksum fields will be wrong, because I modified the packets without updating them; this should not make a difference, since pcap/jnetpcap doesn't check this checksum.
- tcpdump2.oneway.out Same as tcpdump2.out, but all packets in the "ACK direction" have been removed. 503 packets.
- tcpdump2.reorder.out Same as oneway, but the data packets have been put into a random order. 503 packets.
- tcpdump1.overlap.out Like tcpdump1.out, but with some duplicate and overlapping packets. 119 packets in all. IP header length fields are corrected, but not TCP checksum fields.
- tcpdump1.gaps.out Like tcpdump1.out, but with one missing packet and two partial packets. 117 packets in all, corrected IP lengths.
In each file, each connection involves data flow in only one
direction.
If reassemblerDemo.java fails to compile, the most likely
cause is that jnetpcap.jar is not in your classpath; if it fails to
run, one possibility is that the jnetpcap package was not installed
properly.