
Troubleshooting DOCSIS – VoIP Impairments > Delay & Jitter


In this blog I will address delay and jitter as they pertain to VoIP in a DOCSIS network.  Delay, jitter and packet loss are the three primary impairments in a VoIP network; packet loss was addressed in my Troubleshooting DOCSIS - VoIP Impairments > Packet Loss blog.

After packet loss, delay is the second most disruptive impairment in VoIP networks.  To the caller, the effects of delay generally appear as echo and talker overlap.
In PSTN communications, echo can arise as acoustic echo between the mouthpiece and earpiece of the handset, or as hybrid echo, which is an electrical reflection caused by impedance mismatches at the hybrid circuit.  The graph below shows that delay needs to be kept under 150 msec or subscribers will notice it.  At 200 msec and greater, delay starts to make phone conversations very uncomfortable, much like the exchanges you have seen between CNN reporters and correspondents in foreign countries over satellite connections.


In VoIP networks, PSTN echo effects can be present if the VoIP network is terminated to the PSTN.  This is common in today’s VoIP deployments since cable-based VoIP is still in its infancy and usually relies on the PSTN either as the termination to the other caller or as a bridge between two callers.  VoIP calls will also experience IP-related echo due to the delay added to the hybrid echo reflected from the PSTN end.  In strictly PSTN calls, the hybrid delay is often small enough that echo cancellation is not required.  In VoIP networks, however, the hybrid echo is transmitted back to the caller through the IP network's inherent delays (described below).  The sum of these delays will always exceed 32 ms, since this is the base processing time of the Multimedia Terminal Adapter (MTA); therefore echo cancellation is always required in VoIP networks.  (Remember, the MTA, or the embedded MTA (eMTA) that has the cable modem built in, is the device that converts voice to data so VoIP works over DOCSIS - yes, I need to write a blog on this!)

Talker overlap occurs when the end-to-end delay between a packet's transmission and its reception is so great that one caller cuts off the speech of the other.  ITU-T G.114 provides the following guidelines (see table below) for call quality given a certain amount of delay.

[Table: call acceptance versus delay, per ITU-T G.114]

Sources of delay can be broken down into five categories, some of which have constant, known delay and some of which have variable, time-dependent delay.

Algorithmic Delay

Algorithmic delay is the time associated with processing a voice signal from analog to its coded equivalent.  This delay is constant for a given CODEC, but features added to a CODEC implementation, such as packet loss concealment (PLC) or forward error correction, may increase the delay.  The following table lists the delays associated with G.711 and G.729.

[Table: algorithmic delay by CODEC (G.711 and G.729)]

This table makes it clear that using PLC or compression to minimize packet loss carries a potentially significant trade-off in packet delay.  These trade-offs must be considered when balancing packet loss against delay in a VoIP link.

Packetization Delay

Voice packets are accumulated in 20 ms periods in order to optimize the transport of voice traffic on a data network.  The accumulation of 20 ms of voice traffic before transmission translates into a minimum of 20 ms of delay.  If it is desirable to transmit fewer packets to reduce network congestion, say 40 ms packets instead of 20 ms packets, then this network traffic optimization translates directly into an increased delay impairment of at least 40 ms.  Depending upon the total system delay, this may be acceptable, but the trade-off between decreased network traffic and increased packetization delay must be carefully weighed.
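The packetization trade-off can be sketched in a few lines of Python. This is a rough illustration only: the function name is made up for this example, and the 64 kbps codec rate is an assumption (the nominal G.711 bit rate), not a figure taken from this post.

```python
def packetization(ptime_ms: float, codec_kbps: float = 64.0) -> dict:
    """Packet rate and voice payload size for one packetization interval.

    codec_kbps defaults to 64 kbps (assumed G.711 rate, not from the post).
    """
    packets_per_second = 1000.0 / ptime_ms
    # kbit/s multiplied by ms gives bits; divide by 8 for bytes.
    payload_bytes = codec_kbps * ptime_ms / 8.0
    return {
        "min_delay_ms": ptime_ms,  # voice must be collected before sending
        "packets_per_second": packets_per_second,
        "payload_bytes": payload_bytes,
    }


if __name__ == "__main__":
    for ptime in (20, 40):
        print(ptime, packetization(ptime))
```

Doubling the interval from 20 ms to 40 ms halves the packet rate but doubles the minimum packetization delay, which is the trade-off described above.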

Serialization Delay

Serialization delay is simply the time it takes to clock a voice packet onto an IP network at the available transmission rate.  This is a direct function of the capacity of a network.  In a DOCSIS network, individual users are granted a limited upstream transmission bandwidth since the upstream is a shared medium with a finite throughput.  For example, assuming a granted 114 kbps upstream bandwidth for VoIP and a 232-byte VoIP frame, the serialization delay would be: 232 bytes x 8 bits/byte x 1 sec/114 kbits = 16.28 msec.
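The same arithmetic as a small Python helper, in case you want to try other frame sizes or grant rates. The function name is only illustrative.

```python
def serialization_delay_ms(frame_bytes: int, rate_kbps: float) -> float:
    """Time to clock one frame onto the link, in milliseconds."""
    bits = frame_bytes * 8
    # bits divided by kbit/s yields milliseconds directly.
    return bits / rate_kbps


if __name__ == "__main__":
    # The example from the text: 232-byte frame, 114 kbps granted upstream.
    print(round(serialization_delay_ms(232, 114), 2))
```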

Propagation Delay

Propagation delay is the time for an electrical or optical signal to travel along a transmission medium.  The DOCSIS specification allows a maximum distance from the cable modem to the CMTS of 100 miles.  This distance will usually comprise a mixture of optical fiber and coaxial cable.  Since the exact ratio of fiber to coax may not be known, the DOCSIS specification simply defines a maximum acceptable transit time of 0.8 ms for data communications, assuming the 100 mile limit.  Therefore, the maximum HFC propagation delay is 0.8 ms, but the total propagation delay must also account for the network cabling after the CMTS as well as the propagation time in the PSTN network.
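As a sanity check on the 0.8 ms figure, propagation delay can be estimated as distance divided by signal velocity. The sketch below assumes a velocity factor of 0.67 (roughly two thirds of the speed of light, a commonly quoted value for optical fiber); that value is an assumption for illustration and not taken from the DOCSIS specification. Coaxial cable typically has a higher velocity factor, so a mixed plant would come in somewhat lower.

```python
METERS_PER_MILE = 1609.344
SPEED_OF_LIGHT_M_S = 299_792_458.0


def propagation_delay_ms(distance_miles: float,
                         velocity_factor: float = 0.67) -> float:
    """One-way propagation delay in milliseconds.

    velocity_factor is an assumed value (about 0.67 for fiber).
    """
    distance_m = distance_miles * METERS_PER_MILE
    return distance_m / (SPEED_OF_LIGHT_M_S * velocity_factor) * 1000.0


if __name__ == "__main__":
    # 100 miles is the maximum CM-to-CMTS distance cited in the text.
    print(round(propagation_delay_ms(100), 2))
```

With these assumptions, 100 miles works out to roughly 0.8 ms one way, in line with the transit time quoted above.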

Component Delay

Component delay is the final delay type in a data network.  This delay is highly variable, depending upon the type and vendor of each network device, such as routers, switches, hubs, and CMTS devices.  Conventional data networks have had many years to optimize and minimize the delay associated with network devices.  DOCSIS, on the other hand, is in its infancy with respect to delay optimization.  Initially, DOCSIS 1.0 only had provisions for best effort service.  This implies that when many cable modems are competing for bandwidth, the CMTS will provide bandwidth to the CMs on a first come, first served basis.  DOCSIS 1.1 and later specification revisions (up through DOCSIS 3.0) enabled a number of features that give the operator the capability to minimize delay through priority service offerings.  While this process is not altogether automated, knowledge of the underlying principles of DOCSIS communication will aid in understanding the optimization techniques.

When a cable modem has data to send, it must make a grant request (REQ) to the CMTS.  The REQ contains the CM’s Service ID (SID) and the amount of time needed to transmit the data.  The CMTS will allocate time to the CM via a MAP message, which contains the SID of the CM and the amount of time the CM is allowed to transmit.  During times of low traffic, all REQs are filled by the CMTS as soon as they are received.  Once the MAP is received by the CM, the CM may transmit its data as defined by the MAP.  Because a CM must send a REQ and receive a MAP before it can actually transmit data, this adds significant delay to each voice packet sent.

PacketCable™, a set of protocols to deliver quality of service within the DOCSIS network, allows for Unsolicited Grant Service (UGS) for the Constant Bit Rate (CBR) type traffic associated with VoIP calls.  During a VoIP call setup in a DOCSIS network, the CMTS will provide a secondary SID (as defined by PacketCable) to the eMTA and a CBR packet length, say 232 bytes.  Once the call has been established, the eMTA can transmit its 232-byte packets in the pre-allocated time frame associated with the secondary SID without making a REQ for data.  This implies that the eMTA has a data grant period (MAP) pre-allocated for the voice traffic.  With UGS in place, voice packets experience only the transit delay of the packet itself and not the additional delay from the REQ-MAP process.

Now that the REQ-MAP dilemma has been mitigated through UGS, the only other component delays are associated with the eMTA processing time (32 ms) and CMTS routing time.
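To see how the individual delay sources stack up against the 150 ms threshold mentioned earlier, a simple budget calculation helps. This is a minimal sketch: the function name is made up, and the jitter buffer and core/PSTN values in the example are hypothetical placeholders, not measurements. Packetization is left out of the example because the text does not say whether the 32 ms eMTA figure already includes it.

```python
def delay_budget(components_ms: dict, limit_ms: float = 150.0):
    """Return (total one-way delay, remaining headroom) in milliseconds."""
    total = sum(components_ms.values())
    return total, limit_ms - total


if __name__ == "__main__":
    example = {
        "emta_processing": 32.0,   # figure quoted in the text
        "serialization": 16.28,    # 232 bytes at 114 kbps, from the text
        "hfc_propagation": 0.8,    # DOCSIS maximum, from the text
        "jitter_buffer": 40.0,     # hypothetical value
        "core_and_pstn": 30.0,     # hypothetical value
    }
    total, headroom = delay_budget(example)
    print(f"total {total:.2f} ms, headroom {headroom:.2f} ms")
```

A negative headroom means the path exceeds the limit, and some component (often the jitter buffer or the packetization interval) has to give.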

Jitter (Delay Variation)

Jitter is the final VoIP impairment to examine, although it is often considered a subset of delay.  Jitter is the variation in delay between the arrivals of consecutive packets.  Although the eMTA will generate packets at a constant rate, one every 20 ms, the CMTS and subsequent routers and gateways may be unable to process the packets in real time due to network loading.  This means that buffers must be employed by the CMTS and routing devices to temporarily store a packet while other traffic is being processed.  The less traffic present, the faster the VoIP packet can be processed.  Jitter will therefore result in clumping and gaps in the incoming voice stream.  Jitter is expressed mathematically as the sum of delays, Di, over a specified period n.


Since jitter is calculated as the magnitude of the delay variation, it will always be a positive number, with zero indicating that no jitter is present.
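For a concrete feel, here are two common ways to put a number on delay variation. Neither is necessarily the exact formula intended above: the first is a plain mean of the absolute delay differences between consecutive packets, and the second is the running interarrival jitter estimator defined in RFC 3550 for RTP. The function names and the sample delays are made up for illustration.

```python
def mean_abs_delay_variation(delays_ms):
    """Mean of |D_i - D_(i-1)| over consecutive packets, in ms."""
    diffs = [abs(cur - prev) for prev, cur in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def rfc3550_jitter(delays_ms):
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for prev, cur in zip(delays_ms, delays_ms[1:]):
        d = abs(cur - prev)  # change in transit time between two packets
        jitter += (d - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    steady = [30.0] * 10                       # no variation at all
    bursty = [30.0, 34.0, 30.0, 38.0, 31.0]    # made-up sample delays
    print(mean_abs_delay_variation(steady), rfc3550_jitter(steady))
    print(mean_abs_delay_variation(bursty), rfc3550_jitter(bursty))
```

Both measures use the magnitude of the variation, so both are zero for a perfectly steady stream and positive otherwise.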

The general way to minimize jitter is to use a buffer that holds all incoming frames for a period of time, so that even the slowest frames arrive in time to be played in the correct sequence.  The jitter buffer adds to the overall delay of the network, so once jitter exceeds a certain level, the jitter buffer itself will begin to impair the call through excessive delay.  Adaptive jitter buffers are usually employed in managed VoIP networks.  These adaptive buffers increase in size only as needed when the jitter increases.  Managed adaptive buffers will intentionally drop packets in order to keep delay low enough for an acceptable level of call performance.  Again, a trade-off must be made between packet loss and jitter compensation, weighed against the effect on R-factor and/or MOS score.
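The trade-off between buffer depth and dropped packets can be illustrated with a deliberately simplified fixed-depth model (not an adaptive buffer, and not any vendor's algorithm). It assumes playout is scheduled at the fastest observed delay plus the buffer depth, and that any packet arriving after that deadline is discarded. The function name and the sample delays are hypothetical.

```python
def late_packet_fraction(delays_ms, buffer_ms):
    """Fraction of packets that miss playout for a fixed buffer depth.

    Simplified model: the playout deadline is min(delay) + buffer_ms,
    and packets slower than the deadline count as lost.
    """
    if not delays_ms:
        return 0.0
    deadline = min(delays_ms) + buffer_ms
    late = sum(1 for d in delays_ms if d > deadline)
    return late / len(delays_ms)


if __name__ == "__main__":
    # Made-up one-way delays in ms with a few slow packets.
    delays = [30, 32, 31, 55, 30, 41, 33, 30, 70, 31]
    for depth in (5, 15, 30, 45):
        loss = late_packet_fraction(delays, depth)
        print(f"buffer {depth} ms -> added delay {depth} ms, loss {loss:.0%}")
```

A deeper buffer rescues more of the slow packets but adds that same depth to the end-to-end delay, which is why adaptive buffers grow only when the jitter demands it.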

Once again I may have created as many questions as I have answered about DOCSIS and VoIP in this blog.  It is clear that it will take a few more blogs to address much of the alphabet soup in the world of DOCSIS and VoIP, so please stop back for my next post.

Mr. Volpe has over 25 years of communications industry experience. He is focused on the cable and telecom industry with deep technical and business skills. Mr. Volpe is currently the president and chief technologist of the Volpe Firm and holds an MSEE with honors.
