All VoIP solutions contend with packet-switched
networks that were not designed to transmit real-time media
streams. Packet latency and packet loss are the principal manifestations
of this reality, and the jitter
buffer is their canonical solution.
They then describe an adaptive jitter buffer algorithm, in which the buffer
can change size. That is a bit of a trick, because resizing the buffer
mid-stream means slowing down or speeding up the audio playout. It works at
all only because voice can sometimes be stretched (e.g. by lengthening
pauses) or shortened. For voice, the resulting distortion is generally
acceptable; for music, it likely is not.
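
To make the idea of a resizable buffer concrete, here is a minimal sketch in
Python. It is not the algorithm from the paper under discussion. It is a
generic illustration that assumes the receiver tracks delay and jitter with
exponentially weighted moving averages and only resizes the buffer during
silence, so that the stretching or shortening lands in a pause rather than
in the middle of a word. The class name, the parameters alpha and k, and the
in_silence flag are all hypothetical, and sender and receiver timestamps are
assumed to be in seconds (a constant clock offset just shifts the target).

```python
from dataclasses import dataclass


@dataclass
class AdaptivePlayoutEstimator:
    """Illustrative adaptive playout-delay estimator (hypothetical names)."""

    alpha: float = 0.9       # smoothing factor for the moving averages (assumed value)
    k: float = 4.0           # safety margin, in multiples of the jitter estimate (assumed)
    delay_est: float = 0.0   # smoothed one-way delay, seconds
    jitter_est: float = 0.0  # smoothed absolute deviation of delay, seconds
    target: float = 0.0      # playout delay currently in force, seconds
    _seen: bool = False

    def observe(self, send_ts: float, recv_ts: float, in_silence: bool) -> float:
        """Record one packet and return the playout delay to apply to it."""
        delay = recv_ts - send_ts
        if not self._seen:
            # First packet: seed the estimates with the observed delay.
            self.delay_est = delay
            self.target = delay
            self._seen = True
        else:
            # Jitter is measured against the previous delay estimate,
            # then the delay estimate itself is updated.
            deviation = abs(delay - self.delay_est)
            self.jitter_est = self.alpha * self.jitter_est + (1 - self.alpha) * deviation
            self.delay_est = self.alpha * self.delay_est + (1 - self.alpha) * delay
        if in_silence:
            # Resize only during a pause: the pause gets longer or shorter,
            # and the speech on either side plays at normal speed.
            self.target = self.delay_est + self.k * self.jitter_est
        return self.target
```

A packet stamped send_ts would then be played at send_ts + target. Because
the target is frozen while someone is talking, the estimates keep tracking
the network but the audible change is confined to the gaps between
talkspurts. That is tolerable for speech; it would not be for music, which
has no comparable gaps to absorb the adjustment.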