In my next couple of posts I am going to digress from my standard DOCSIS 101 tutorial and spend a little time on DOCSIS troubleshooting basics, specifically Voice-over-Internet Protocol (VoIP) impairments.  I am doing this because of the many offline questions I have received on this topic.  Don’t worry, though: I will not abandon the DOCSIS tutorial.

I will begin with the method by which cable operators provide voice services over a data network.  This will start with a VoIP primer in order to provide a common foundation of terminology and understanding of the subject matter.  Once that foundation is in place, the posts will focus on three high-level VoIP impairments: Packet Loss, Delay (also referred to as “Latency”), and Jitter.  These three impairments are responsible for nearly all VoIP call degradation, but each one may itself be the result of a complex set of factors in the distribution network.  While some of the impairments can be corrected in the transport mechanisms supporting the VoIP call, some cannot.  This leads to the need for advanced techniques for minimizing the effects of packet loss, delay, and jitter.  If the impairments cannot be kept below a certain threshold, the user (hereafter referred to as the “caller”) will experience poor voice call quality, analogous to many Public Switched Telephone Network (PSTN) related impairments, such as echo, noisy background, distorted voices, and talker delay.  Minimizing these perceived impairments is what will ultimately win or lose the battle for voice services by cable providers.

VoIP Primer

Packet voice systems, the systems that enable VoIP, accept analog voice signals from telephone handsets or the PSTN.  The analog signal is first digitized and coded, typically using the ITU-T G.711 PCM standard or the ITU-T G.729 standard, which offers compression.  The choice between a compression-less CODEC (coder-decoder) such as G.711 and a compression-based CODEC such as G.729 is usually left to the system operator.  Compressed signals create less network traffic, but at the cost of some call quality.  Non-compressed signals offer the greatest call quality, but only if they do not over-burden the data network with traffic, which would itself degrade call quality.

Now that the analog speech has been digitized, possibly compressed, and encoded, it is injected into IP based networks (usually Ethernet) as data traffic.  In order to co-exist with the other traffic on an IP network, the voice stream must be broken into small packets.  Generally, voice streams are broken into packets that each carry 20 ms of speech.  This time interval is often chosen so that the loss of one packet will have an imperceptible impact on the voice call.  A 20 ms packetization interval is also a good fit for CMTS utilization, because it creates 50 packets per second (pps) of voice traffic.  A smaller packetization interval of 10 ms creates 100 pps, which unnecessarily loads the CMTS and causes excessive CMTS CPU utilization during heavy call times.  Additionally, 20 ms packetization keeps the packet size relatively small compared to 30 ms or 40 ms packetization.  In the case of G.711 with a packetization period of 20 ms, the payload will be 160 bytes.  This will result in a total DOCSIS frame length of 232 bytes, which includes the necessary overhead to properly route the packet in the DOCSIS data network.  [Although the term “frame” is the appropriate name for a packet of data with the associated overhead bytes for network routing, this post will generally refer to frames as packets, because it is the payload portion of the frame that is under consideration.]
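
The packetization arithmetic above can be checked with a short Python sketch.  It assumes nominal codec bit rates of 64 kbps for G.711 and 8 kbps for G.729, and it derives a 72-byte per-packet overhead from the 232-byte frame and 160-byte payload quoted above.  Applying that same overhead to other codecs or intervals is an assumption made purely for illustration, and the function names are hypothetical.

```python
# Rough packetization arithmetic for a single one-way voice stream.
# All names below are illustrative, not part of any DOCSIS or vendor API.

# Nominal codec bit rates in bits per second (assumed values).
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}

# Overhead implied by the figures in the text: 232-byte frame - 160-byte payload.
# Assumption: the same overhead is reused for other codecs and intervals.
ASSUMED_OVERHEAD_BYTES = 232 - 160


def packets_per_second(ptime_ms: int) -> float:
    """Packets generated per second for a given packetization interval."""
    return 1000 / ptime_ms


def payload_bytes(codec: str, ptime_ms: int) -> int:
    """Voice payload carried in one packet, in bytes."""
    bits = CODEC_BITRATE_BPS[codec] * ptime_ms // 1000
    return bits // 8


def stream_bandwidth_bps(codec: str, ptime_ms: int,
                         overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Approximate on-the-wire bit rate, payload plus assumed overhead."""
    frame_bytes = payload_bytes(codec, ptime_ms) + overhead_bytes
    return frame_bytes * 8 * packets_per_second(ptime_ms)


if __name__ == "__main__":
    for ptime in (10, 20, 30, 40):
        print(ptime, "ms:",
              payload_bytes("G.711", ptime), "payload bytes,",
              round(packets_per_second(ptime), 1), "pps,",
              round(stream_bandwidth_bps("G.711", ptime)), "bps")
```

Under these assumptions, the sketch shows the trade-off described above: shorter intervals raise the packet rate (and the share of bandwidth spent on overhead), while longer intervals put more speech at risk in each lost packet.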

As will be discussed in more detail later, packets associated with voice services will usually be assigned a priority over non-time-sensitive traffic, such as email or web-based traffic, in order to assure a higher quality of service (QoS).  Since the voice call is now in a format compatible with Ethernet, it can be transported over an HFC network using the DOCSIS transport specification.  This enables users with cable modems and Multimedia Terminal Adapters (MTAs) in their homes and businesses to access data and voice services over the same network, provided the infrastructure is in place.  [Note:  Typically cable modems and MTAs are packaged as a single unit, called an eMTA for embedded MTA.  This implies that the MTA has a resident cable modem on board.]

VoIP Impairments: Call Quality

In contrast to standard data communications, where communication quality is quantified in objective terms of Bit Error Rate (BER), call quality is a subjective measurement.  The subjectivity of call quality is largely due to the fact that the human ear and brain can automatically correct for a certain level of impairment during a conversation.  So even though BER may be poor, the ear-brain combination may lead the caller to consider a call high quality because the impairments are relatively imperceptible, whereas a computer may reject a packet with even one error that is not corrected by the error correction codes in use.

This section will discuss the two primary methods for quantifying call quality: Mean Opinion Score and the E Model.  The first is Mean Opinion Score (MOS), whereby a panel of “listeners” subjectively rate a given voice call on a scale of one to five.  The mean of the listeners’ ratings is then used as the measure of call quality.  MOS is the most widely used voice quality metric.
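
As a small illustration of how a MOS is derived, the Python sketch below averages a panel’s ratings.  The panel values are made up for the example and the function name is hypothetical; real MOS testing follows a controlled listening procedure that this sketch does not attempt to model.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a panel's 1-to-5 ratings for one call (illustrative helper)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-to-5 scale")
    return mean(ratings)


# Made-up ratings from five hypothetical listeners for a single call.
panel = [4, 5, 3, 4, 4]
print("MOS:", mean_opinion_score(panel))
```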

The second method is the E Model.  The E Model is defined in the ITU-T G.107 standard and produces an R factor on a scale of 0 to 100.  The basic formula for determining the R factor is:

R = Ro – Is – Id – Ie + A

Here, Ro is a base factor determined from noise levels, loudness, etc.; Is represents impairments occurring simultaneously with speech; Id represents impairments that are delayed with respect to speech; Ie represents the so-called “equipment impairment factor”; and A is the “advantage factor” of using that particular voice service.  Assuming an “ideal” network, the maximum obtainable R factor is 93 for a G.711 encoder and 83 for a G.729 encoder.  The following table shows the “user opinion” as a function of the computed R factor and tested MOS.
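
To make the R factor formula concrete, here is a minimal Python sketch.  The input values are made up for illustration, the function names are hypothetical, and clamping R to the 0 to 100 range is my own assumption.  The R-to-MOS conversion uses the mapping commonly cited from ITU-T G.107; treat it as an estimate and check it against the current recommendation before relying on it.

```python
def r_factor(ro, i_s, i_d, i_e, a=0.0):
    """R = Ro - Is - Id - Ie + A, clamped to 0..100 (clamping is an assumption)."""
    r = ro - i_s - i_d - i_e + a
    return max(0.0, min(100.0, r))


def r_to_mos(r):
    """Estimate MOS from R using the mapping commonly cited from ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    # The "ideal network" maxima quoted in the text for each encoder.
    for codec, r_max in (("G.711", 93), ("G.729", 83)):
        print(codec, "R =", r_max, "estimated MOS =", round(r_to_mos(r_max), 2))

    # Made-up impairment values, purely to exercise the formula.
    r = r_factor(ro=93, i_s=0, i_d=5, i_e=10)
    print("example R =", r, "estimated MOS =", round(r_to_mos(r), 2))
```

With this mapping, the ideal-network R factors of 93 and 83 correspond to estimated MOS values of roughly 4.4 and 4.1, respectively.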

In my next post I will explain the individual impairments of Packet Loss, Delay, and Jitter to help you understand how they impact call quality.
