Comp 346/488: Intro to Telecommunications

Tuesdays 4:15-6:45, Lewis Towers 412

Class 9, Mar 20

Reading (7th -- 9th editions)


Topics:



SDP

The Session Description Protocol has to negotiate the connection. At the SIP level, this could be point-to-point audio, multicast audio, video, or even something else.

For VoIP purposes, the main role of SDP is to negotiate the codec, or voice-data-encoding algorithm, used by the connection, and also the media path. Here are some codec options, from the O'Reilly book Asterisk: The Definitive Guide, Appendix B: Protocols for VoIP:

Codec     Data bitrate                                             License required?
G.711     64 Kbps                                                  No
G.726     16, 24, 32, or 40 Kbps                                   No
G.729A    8 Kbps                                                   Yes (no for pass-through)
GSM       13 Kbps                                                  No
iLBC      13.3 Kbps (30-ms frames) or 15.2 Kbps (20-ms frames)     No
Speex     Variable (between 2.15 and 22.4 Kbps)                    No
G.722     64 Kbps                                                  No

G.711 is, of course, µ-law (or its A-law variant). There is no compression beyond companding.

G.729 is a remarkably efficient form of compression, though it is CPU-intensive. The latter matters only if you're doing the compression/decompression on a shared switch; if your phone can do G.729 itself then this is not an issue.

The compression algorithms used by G.729A are tuned to voice; so much so, in fact, that DTMF (touch-tone) tones are not properly carried! The process starts with a 10 ms block of 16-bit samples (80 samples). The patented algorithm is known as Algebraic Code-Excited Linear Prediction (ACELP). Each block is run through two digital filters, one to create an average pitch for the block and one called the "stochastic" contribution. The latter is encoded as an entry in a large "codebook" that is built into the algorithm. The codebook (and the average-pitch phase) attempts to use a model of how human-speech sounds are produced.
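The frame arithmetic behind the 8 Kbps figure can be checked in a few lines. This is only a back-of-the-envelope sketch; the constant names are mine, and the numbers are the 10 ms block size and bitrates given above.

```python
SAMPLE_RATE_HZ = 8000      # standard telephone sampling rate
FRAME_MS = 10              # G.729 block length
BITS_PER_SAMPLE = 16       # linear PCM input to the encoder
G729_KBPS = 8              # encoded bitrate from the codec table

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples in one 10 ms block
input_bits = samples_per_frame * BITS_PER_SAMPLE        # bits going into the encoder
output_bits = G729_KBPS * FRAME_MS                      # kbps * ms = bits coming out

ratio_vs_linear = input_bits / output_bits              # versus 16-bit linear PCM
ratio_vs_g711 = (samples_per_frame * 8) / output_bits   # versus 8-bit G.711

print(samples_per_frame, input_bits, output_bits, ratio_vs_linear, ratio_vs_g711)
```

So each 10 ms block of 80 samples is reduced to 80 bits, or one bit per sample on average.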

Speex also uses generic CELP.

Demo of G.729


VoIP and Jitter Buffer

VoIP calls have to deal with a much larger round-trip time than PSTN calls, easily 100-200 ms versus the PSTN's delay of 1-2 ms on a DS-0 line.

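However, the playback delay (the "voice RTT") is often chosen to be larger than the initially measured RTT. This is a hedge against later increases in the RTT; the "voice RTT" should be larger than the worst-case "packet RTT". Packets that arrive early wait in the jitter buffer until their playback time.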

Diagram?

Sometimes, if we realize the voice-RTT estimate was chosen too large, we can reduce it, either by paring away a few voice samples at a time, or by reducing times when the line is silent (we need some form of "silence detection" for this).
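Here is a minimal sketch of the tradeoff in choosing the playback delay. It works with one-way delays rather than RTTs to keep things simple, and the function name, packet times and delay values are all made up for illustration.

```python
def playout_schedule(send_times_ms, arrival_times_ms, playout_delay_ms):
    """Return a list of (play_time, on_time) pairs for a fixed playout delay.

    Each packet is scheduled to play at send_time + playout_delay.
    A packet that arrives after its play time is too late to be used.
    """
    schedule = []
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        play_at = sent + playout_delay_ms
        schedule.append((play_at, arrived <= play_at))
    return schedule


# Hypothetical data: packets sent every 20 ms, network delay varying from 50 to 95 ms.
send_times = [0, 20, 40, 60, 80]
delays = [50, 55, 95, 60, 52]
arrivals = [s + d for s, d in zip(send_times, delays)]

tight = playout_schedule(send_times, arrivals, playout_delay_ms=60)
roomy = playout_schedule(send_times, arrivals, playout_delay_ms=100)

late_tight = sum(1 for _, on_time in tight if not on_time)
late_roomy = sum(1 for _, on_time in roomy if not on_time)
print(late_tight, late_roomy)
```

With the 60 ms delay the one slow packet misses its playback time; with the 100 ms delay nothing is late, at the cost of 40 ms more lag in the conversation.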


RTP & RTCP

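RTP is Realtime Transport Protocol, a generic UDP-based way of sending "realtime" (audio and video) data. The RTP header contains:
a version number, padding and extension flags, and a count of CSRCs
a marker bit and a 7-bit payload type, identifying the encoding
a 16-bit sequence number
a 32-bit timestamp
a 32-bit synchronization source (SSRC) identifier, identifying the source of the stream
zero or more 32-bit contributing source (CSRC) identifiers, for streams that have been mixed together
Examples of CSRCs might be the separate video and audio feeds, or synchronized video feeds from multiple locations or cameras, or synchronized feeds from multiple microphones.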
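As a rough illustration of the header layout, here is a short parser for the fixed 12-byte RTP header plus any CSRC entries, following the field layout in RFC 3550. The function name and the sample packet (including its SSRC value) are my own inventions, not taken from any of the captures.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F                  # low 4 bits of the first byte
    header_len = 12 + 4 * csrc_count
    if len(packet) < header_len:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:header_len]))
    return {
        "version": b0 >> 6,                 # top 2 bits; should be 2
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,          # 0 = G.711 PCMU, 18 = G.729
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "header_len": header_len,
    }


# Made-up packet: version 2, payload type 0, sequence 1, timestamp 160, 160 data bytes.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + bytes(160)
hdr = parse_rtp_header(sample)
print(hdr["version"], hdr["payload_type"], hdr["timestamp"], hdr["header_len"])
```

A single voice stream like this one has no CSRCs, so the header is exactly 12 bytes.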

RTP does not use a designated port. (Nor is there an evident "signature" for RTP packets, making them sometimes hard to identify). Instead, each end announces the address and port it will use for RTP in the SIP/SDP packets immediately before the exchange.

In the Asterisk rtp.conf file, I specified the RTP port range as 19000-20000. These are the local ports used by ulam2; the other end of course chooses its local port.

In my wireshark file g729.outbound.frontier.pcap (ie to Frontier Telecommunications, my home landline provider), I placed a call from a cisphone directly connected to ulam2 via a private network tunnel. Ulam2 chooses port 19522 and the remote RTP end, 67.16.104.172 (an IP address reasonably near my home), chooses port 50836. Ulam2 specifies this port 19522 in both SIP/SDP packets it sends (1 and 5). Packet 7 is the one SIP/SDP packet from flowroute (216.115.69.144), and the SDP body identifies its RTP host (67.16.104.172, in "Connection Information => Connection Address") and port (50836, in "Media Description => Media Port"; Media Format of G.729 is also identified here). The very next packet (the first RTP packet) is sent by ulam2 to 67.16.104.172 / 50836.
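To make the Connection Address and Media Port fields concrete, here is a small sketch that pulls them out of an SDP body. The SDP text below is not copied from the capture; it is a reconstruction I wrote using the address and port above and the standard payload-type number for G.729 (18), and the function name is hypothetical.

```python
def media_endpoint(sdp_text: str):
    """Return (address, port, payload_types) from the first c= and m= lines."""
    address = None
    port = None
    payload_types = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("c=") and address is None:
            # c=<nettype> <addrtype> <connection-address>
            address = line[2:].split()[2]
        elif line.startswith("m=") and port is None:
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            port = int(fields[1])
            payload_types = [int(f) for f in fields[3:]]
    return address, port, payload_types


# Illustrative SDP body (reconstructed, not the actual packet 7).
example_sdp = """v=0
o=- 0 0 IN IP4 67.16.104.172
s=-
c=IN IP4 67.16.104.172
t=0 0
m=audio 50836 RTP/AVP 18
"""

endpoint = media_endpoint(example_sdp)
print(endpoint)
```

The c= line is what wireshark displays as Connection Information, and the m= line is the Media Description.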

Note that ulam2 does not specify a different host; that is, there is no "reinvite" at its end. That's because I told it not to ("reinvite=no"), which in turn is because the actual phone is behind a firewall; ulam2 continues to forward RTP packets between the cisphone and 67.16.104.172. IP addresses and ports embedded in the communications stream are particularly difficult for NAT firewalls to handle, as the behind-NAT sender's idea of what its IP address and port are will have almost no relation to the actual IP address and port. (For SIP, we might actually get the same port, 5060, as long as we're the first to ask for it.) It is actually the SDP packets, carried within SIP packets, that hold the media-stream contact information. In Asterisk, the sip.conf option "nat=yes" causes any new IP address or port in the SDP packets to be ignored; the media stream is sent to the same host as the SIP packets.

In my wireshark file g729.outbound.loyola.pcap, I called my Loyola office phone from my office cisphone, which reaches ulam2 through a NAT firewall (10.38.2.42). I traced not only the RTP packets between ulam2 and the other end, but also the RTP packets to the cisphone. We see the following:

packet 1: SIP/SDP INVITE from firewall to ulam2, containing a Connection Address of 10.213.119.31. This is a behind-the-NAT address; ulam2 cannot reach it. There is also a port specified (16472), and multiple media formats (G.729, G.711, G.721, G.722, ....).

packet 4: Pretty much the same, after some authentication

packet 6: SIP INVITE from ulam2 to sip.flowroute.com (216.115.69.144), specifying Connection Address = ulam2 (ie not 10.213.119.31 or 10.38.2.42), and port 19116; G.729 is the only codec offered

packet 10: Again, pretty much the same, but after authentication

packets 13-15: These look like early RTP packets from the far end. At this point ulam2 has no idea who they are from!

packet 16: This is the first SIP/SDP packet from sip.flowroute.com to ulam2, identifying the remote end as 68.68.120.43 / 9998, identified by ip2location.com as being in Morgantown, WV.

packet 17: SIP/SDP from ulam2 to the firewall, specifying the connection as 147.126.65.47 / 19098; that is, ulam2 again.

The rest is the RTP stream, including both the local stream between ulam2 and the cisphone/firewall and the longhaul stream from ulam2 to West Virginia.
Note packets 19, 21, 23 and 25, which ulam2 attempted to send directly to the cisphone at hidden address 10.213.119.31. They never got there. After that, we see each packet twice, once between ulam2 and 68.68.120.43 and once between ulam2 and 10.38.2.42. The packets to 10.213.119.31 stop as soon as the first RTP packet from the cisphone, via the firewall, arrives at ulam2 as packet 26. At this point ulam2 recognizes the packet as part of the RTP flow it is expecting from cisphone, and uses the packet's source address (the firewall) as the address to send future RTP packets to, rather than the IP address of 10.213.119.31 announced in packets 1 and 4.
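The address-learning behavior just described can be sketched as follows. This is only my guess at the logic, not Asterisk's actual code; the class name is invented, and the firewall's outside port (40000) is a made-up value since the trace summary above does not give it.

```python
class RtpPeer:
    """Tracks where to send RTP for one leg of a call."""

    def __init__(self, advertised_addr):
        self.target = advertised_addr    # (ip, port) announced in SDP
        self.learned = False

    def on_rtp_received(self, source_addr):
        """On the first RTP packet received, switch to its actual source address."""
        if not self.learned:
            self.target = source_addr
            self.learned = True

    def send_target(self):
        return self.target


# The cisphone advertises its private address; RTP really arrives from the firewall.
peer = RtpPeer(("10.213.119.31", 16472))
before = peer.send_target()                   # where packets 19, 21, 23, 25 went
peer.on_rtp_received(("10.38.2.42", 40000))   # hypothetical outside port
after = peer.send_target()
print(before, after)
```

Until the first packet arrives from behind the firewall, the sender has nothing better to use than the advertised (unreachable) address.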

To summarize: the SIP protocol as such uses port 5060, and gets through a NAT firewall by having the behind-the-firewall end make the first contact (and send keepalives as needed), so that the outside end can simply reply.

RTP relies either on (a) not trying to get through NAT firewalls at all, or (b) noticing that the real address/port from which RTP packets are arriving from behind the firewall is not the "advertised" address/port, and using the real address/port instead. There is also a manual configuration option for NAT exceptions, and there are also "NAT-traversal" protocols such as SOCKS or NAT-T that allow certain ports behind a NAT firewall to be "opened up".

The switch to this new IP address / port is in Asterisk called a "reinvite"; the SIP protocol uses this term for modifications of an existing negotiated media stream.

With cisphone behind a NAT firewall, ulam2 can reach it by replying back. What if ulam2 itself were behind a firewall? Then it would REGISTER itself to sip.flowroute.com and continue sending keepalives to sip.flowroute.com to stay in touch. Then flowroute could reply to ulam2, at least to port 5060. There is a mechanism for manual configuration (and again keepalives) of a handful of ports for the RTP traffic, but I'm not very familiar with it.



We earlier looked at some of the RTP packets from a call placed by the cisco VoIP phone using G.711. These packets were 214 bytes, consisting of
14-byte Ethernet header
20-byte IP header
8-byte UDP header
12-byte RTP header
160 bytes of data (20 ms of audio, at 8000 one-byte samples per second)
Because the RTP header was only 12 bytes, there were no CSRCs, which is what we would expect for a single voice channel. The RTP payload type was 0x00, which is designated as ITU-T G.711 PCMU, which is µ-law logarithmic companding of PCM audio to 8-bit samples, with a sampling rate of 8000. (The A-law variant, PCMA, has its own payload type, 8.)

Other payload-type codes may be found in RFC 3551, page 33 (RTP A/V Profile).

Compare the RTP packets for a G.729 connection. The packets still have the 14-byte Ethernet header, 20-byte IP header, 8-byte UDP header and 12-byte RTP header. However, the data for 20 ms now takes only 20 bytes, for a total packet size of 74. Note that, for these packets, these headers now take up 73% of the total!
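The packet-size arithmetic for the two codecs can be reproduced directly. This is a small sketch using the header sizes and 20 ms packet interval from the text; the function and constant names are mine.

```python
HEADER_BYTES = 14 + 20 + 8 + 12   # Ethernet + IP + UDP + RTP


def packet_stats(codec_kbps, frame_ms=20):
    """Return (payload_bytes, total_bytes, header_fraction, wire_kbps)."""
    payload = codec_kbps * frame_ms // 8         # kbps * ms = bits; / 8 = bytes
    total = HEADER_BYTES + payload
    wire_kbps = total * 8 / frame_ms             # bits per ms = kbps on the wire
    return payload, total, HEADER_BYTES / total, wire_kbps


g711 = packet_stats(64)
g729 = packet_stats(8)
print(g711)
print(g729)
```

Note the consequence for bandwidth: although G.729 data is one-eighth the size of G.711 data, the on-the-wire rate (counting Ethernet headers) drops only from 85.6 kbps to 29.6 kbps.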

RTP timestamps are provided by the sending application, and are application-specific. Their primary purpose is to synchronize playback. For the cisco phone, the timestamps were in multiples of 160, starting at 160; these would represent the number of "sampling ticks" (at the rate of 8000/sec) up to the end of the current data.

RTP packets are generated only by the sender; the flow is one-way. The RTP protocol does not include any form of acknowledgement. However, associated with RTP is the Realtime Transport Control Protocol (sometimes called RTP Control Protocol), or RTCP. This has several goals, in each direction; one goal is to support tagging of RTP streams, and to support coordination and mixing of RTP streams that have not been mixed at the SSRC/CSRC level. However, for our purposes the primary goal of RTCP is to provide acknowledgment-like feedback from receiver to sender, indicating the packet loss rate. Here is that portion of the data from one of the cisco RTCP Receiver Report (RR) packets:
RTCP packets are not sent often; in the g729.outbound.loyola.pcap file there is a pair at positions 1006 and 1007 and the next pair at 2010/2011.

RTP applications can elect to receive this information (in particular, the fraction lost), and adjust their sending rates (eg by choosing a different encoding) appropriately.
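Here is a sketch of how an application might pull the fraction lost out of a Receiver Report, using the RR layout from RFC 3550 (packet type 201, 24-byte report blocks, fraction lost as an 8-bit value out of 256). The function name is hypothetical, and the sample packet is fabricated for the example; its values are not from the cisco capture.

```python
import struct


def parse_receiver_report(packet: bytes):
    """Return a list of report-block dicts from an RTCP Receiver Report."""
    b0, packet_type, _length, _reporter_ssrc = struct.unpack("!BBHI", packet[:8])
    if packet_type != 201:
        raise ValueError("not an RTCP Receiver Report")
    report_count = b0 & 0x1F                     # low 5 bits of the first byte
    reports = []
    for i in range(report_count):
        block = packet[8 + 24 * i: 8 + 24 * (i + 1)]
        source_ssrc, lost_word, highest_seq, jitter, _lsr, _dlsr = struct.unpack(
            "!IIIIII", block)
        cumulative = lost_word & 0xFFFFFF        # 24-bit signed count
        if cumulative & 0x800000:
            cumulative -= 1 << 24
        reports.append({
            "source_ssrc": source_ssrc,
            "fraction_lost": (lost_word >> 24) / 256,   # loss since the last report
            "cumulative_lost": cumulative,
            "highest_seq": highest_seq,
            "jitter": jitter,
        })
    return reports


# Fabricated RR: one report block, 64/256 = 25% loss, 10 packets lost in total.
fake_rr = struct.pack("!BBHI", 0x81, 201, 7, 1) + struct.pack(
    "!IIIIII", 2, (64 << 24) | 10, 1000, 5, 0, 0)
reports = parse_receiver_report(fake_rr)
print(reports[0]["fraction_lost"], reports[0]["cumulative_lost"])
```

A sender seeing a persistently high fraction lost might respond by switching to a lower-rate encoding.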

There is a Java interface to RTP, java.net.rtp (not part of the standard Oracle/Sun library). In this package, an application that wishes to receive RTCP information can create an RTCP_actionListener, which is invoked asynchronously whenever RTCP packets arrive, much like a mouse Listener. A standard GUI interface, however, may consist of essentially nothing but various asynchronous Listeners. A typical RTP program, on the other hand, will have at least one main thread involved in the transfer of data, and which will receive some form of messages or notification (perhaps by setting shared variables) from the RTCP_actionListener.

If you watch a movie on Netflix or Hulu, the video encoding is highly adaptive: if bandwidth is reduced, the video encoder is changed to a lower rate. Originally, Netflix transmitted at one of 2200 kbps, 1600 kbps, 1000 kbps and 500 kbps (with progressively poorer resolution as the rate decreased); your rate might change mid-session as less (or more) bandwidth was available. I'm not sure if Netflix uses RTP, but, if it does, then it is the returning RTCP packets that provide input to the video transmitter at the Netflix end as to what encoding rate to use. Netflix has now licensed the eyeIO technology for variable-rate video encoding.
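The rate-selection step might look something like the following. This is purely a sketch: I do not know how Netflix actually chooses, the 80% headroom factor is my own assumption, and only the four rates come from the paragraph above.

```python
NETFLIX_RATES_KBPS = [2200, 1600, 1000, 500]   # the original rates listed above


def choose_rate(available_kbps, rates=NETFLIX_RATES_KBPS, headroom=0.8):
    """Pick the highest encoding rate that fits within a fraction of the bandwidth.

    headroom is an assumed safety margin, not a documented Netflix parameter.
    Falls back to the lowest rate when nothing fits.
    """
    budget = available_kbps * headroom
    for rate in sorted(rates, reverse=True):
        if rate <= budget:
            return rate
    return min(rates)


picks = [choose_rate(kbps) for kbps in (3000, 1500, 400)]
print(picks)
```

The available-bandwidth estimate would come from receiver feedback (for RTP, the RTCP loss reports), re-evaluated periodically during the session.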