DOCSIS Codeword Errors – You will likely see them

DOCSIS correctable and uncorrectable codeword errors are available as metrics in most cable operator monitoring tools and Operations Support Systems (OSS).  These tools generally provide data and dashboards that let users monitor overall network health and identify a problem when it occurs.  Metrics like codeword errors are extracted directly from DOCSIS CMTSs using Simple Network Management Protocol (SNMP) or Internet Protocol Detail Records (IPDR), the latter being preferred because it places less load on the CMTS CPU.  What many people struggle with is what those codeword errors are, what they mean, and how bad is bad.  This article will answer those questions and discuss Reed-Solomon forward error correction, including how it corrects bit errors in the DOCSIS network.

Where Codeword Errors Come From

When someone is using a device, such as a PC or iPad, and transmits more than 16 bytes of data over a DOCSIS cable modem, an error protection algorithm called Reed-Solomon (RS) forward error correction (FEC) kicks in.  There are two parts to RS FEC: the encoder, which is in the cable modem, and the decoder, which is in the CMTS.  As an example, the RS encoder creates an 18-byte codeword from an original 16 bytes of data transmitted by you, the user.  The two extra bytes are added for error correction.  The 16 bytes of data plus the 2 bytes of FEC are called a codeword.  Great, so we have two bytes added for error correction and we know the origin of the mysterious codeword, but now what?  How do two bytes of data help us, and where do correctable and uncorrectable codewords come from?

First, let’s understand how the 18-byte codeword was initially formed in the DOCSIS cable modem, so that we better understand how the two error correction bytes will help us when we need them.  The original 16 bytes are inserted into a matrix as shown below.  Each byte is represented by B1 for byte #1, B2 for byte #2, and so on.  Next, each row in the matrix gets some complex math applied to it, using something called a polynomial, to create a value for that row, such as y1.  Similarly, each column gets complex math applied (using that same polynomial) to create values such as x1.  Those are not the actual values, but I don’t want to put you to sleep with algebra.

Reed Solomon Encoded Data

Finally, the horizontal and vertical math values are put into one final polynomial to create the 2-byte word, which is called the parity bytes in DOCSIS terminology (shown in the red box).  If you are interested in the math behind calculating the parity bytes, check out the DOCSIS standard.  Further, I have used a very basic example of 16 bytes for illustration purposes.  DOCSIS defines an algorithm by which data packets larger than 16 bytes can be accommodated.

Now that we understand how the RS codeword was formed, we can discuss how errors occur.  As the codeword passes through the HFC network as an RF signal, it is likely to experience RF impairments.  Let’s say that this particular codeword experiences a brief moment of laser clipping that impacts only one byte of the matrix.  Shown below is the impacted byte (in red).  Here is where the Reed-Solomon algorithm in the CMTS works its magic.

Errored Reed Solomon Frame

Byte B7 is displayed in red, indicating it was the cell that received errors.  The RS algorithm was able to determine this using the inverse of the equations used to create the parity bytes.  First it extracts the stored vertical and horizontal (x and y) values from the parity bytes.  Next it calculates the vertical and horizontal values based upon the received data.  Because the received data has a corrupt cell at B7, the calculated vertical and horizontal values do not match those recovered from the parity bytes.  This could be a correctable or uncorrectable codeword error, but we don’t yet know.
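To make the matrix picture concrete, here is a toy sketch in Python. It is not the DOCSIS Reed-Solomon algorithm: it assumes a simple XOR check value per row and per column in place of the real polynomial math, and it assumes those check values reach the receiver intact. The byte values and the `parities` helper are made up for illustration. It only shows how a mismatching row and a mismatching column together point at the corrupted cell.

```python
from functools import reduce
from operator import xor

def parities(matrix):
    """Return (row_checks, column_checks), each an XOR over the bytes (toy stand-in for RS math)."""
    rows = [reduce(xor, row) for row in matrix]
    cols = [reduce(xor, col) for col in zip(*matrix)]
    return rows, cols

# Placeholder payload: B1..B16 hold the values 1..16, arranged as a 4x4 matrix.
data = list(range(1, 17))
sent = [data[i:i + 4] for i in range(0, 16, 4)]
sent_rows, sent_cols = parities(sent)          # the "y" and "x" values at the modem

# Simulate an impairment that corrupts B7 (second row, third column).
received = [row[:] for row in sent]
received[1][2] ^= 0x5A

got_rows, got_cols = parities(received)        # recomputed at the CMTS
bad_rows = [i for i, (a, b) in enumerate(zip(sent_rows, got_rows)) if a != b]
bad_cols = [j for j, (a, b) in enumerate(zip(sent_cols, got_cols)) if a != b]

# One bad row plus one bad column identifies a single cell.
byte_number = bad_rows[0] * 4 + bad_cols[0] + 1
print(f"Mismatch at row {bad_rows[0] + 1}, column {bad_cols[0] + 1}: byte B{byte_number}")
```

In this toy the row mismatch also tells you which bits to flip back, which is the intuition behind a correctable codeword.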

How Many Errors Can FEC Correct?

So far I have mainly been talking about FEC in terms of bytes because this is how DOCSIS refers to FEC granularity.  But RS FEC matrices like the two I illustrated above use bits.  There are eight bits in a byte.  A bit is also the smallest value in terms of logic because it is either a one (1) or a zero (0).  This is quite helpful in the previous example, where we had an error that was detected by RS.  If RS detects an error in a matrix cell, all it has to do is change the value.  What do I mean by this?  If there is a one (1) in the cell and the parity bytes say that cell is wrong, then changing it to a zero (0), re-running the algorithm, and checking it against the parity bytes again will show it to be correct.  That is how correctable codewords work!  The RS algorithm sees an error, flips the value, checks it again, and proclaims “hey, we had an error but we fixed it, so just keep on going and call it a correctable codeword.”  Okay, maybe I’m embellishing for the CMTS, but it deserves some credit.

There is a point where there are too many errors to correct.  This point occurs at different times for different modulation rates, but based upon the Reed-Solomon algorithm it can be generalized using the following equation (this is really basic, so don’t panic):

t<\frac{q-k+1}{2}

where t = number of correctable bits, q = codeword length (18 bytes in this example), and k = data length (16 bytes in this example).  Now if we just plug the values into the above equation, we can find out how many bits of data RS will be able to correct for this codeword.

t<\frac{18B-16B+1}{2} = \frac{2B+1}{2}

Since 1 byte (B) = 8 bits:

t<\frac{2*8+1}{2} = \frac{17}{2}

t < 8.5 bits, so t = 8 bits

Therefore, in the example used here, RS should typically be able to correct about 8 bits, or 1 byte, worth of data.  (Strictly speaking, RS corrects whole byte symbols, so those 8 bits must all fall within the same byte.)  This means the error created back in our example would have been a correctable codeword!  When the errors exceed what RS can correct, say 9 bits, it will report an uncorrectable codeword and the CMTS will discard the codeword.  This means the packet will need to be retransmitted if it is non-real-time data, but if it is voice or video traffic then it’s gone for good.

Uncorrectable codeword errors in a DOCSIS network are a sure sign that a subscriber somewhere on your network is losing data.  If you have large numbers of uncorrectable codeword errors then you can speculate that you have a dissatisfied customer, but don’t speculate that you have issues in your network, because it is a fact that you do.
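For readers who want to see the correctable and uncorrectable outcomes in action, below is a minimal Reed-Solomon sketch in Python for the 16-byte example (18-byte codeword, 2 parity bytes). Treat it as an illustration under stated assumptions, not as the DOCSIS implementation: the field polynomial 0x11D and the generator roots α^0 and α^1 are assumed here and should be checked against the DOCSIS PHY specification, and the function names are hypothetical. It omits everything else a real modem and CMTS do around FEC.

```python
# Build GF(256) log/antilog tables (assumed field polynomial x^8+x^4+x^3+x^2+1 = 0x11D).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def max_correctable_bytes(q, k):
    """Largest whole t satisfying t < (q - k + 1) / 2, in byte symbols."""
    return (q - k) // 2

def rs_encode(data):
    """Append 2 parity bytes: remainder of data(x)*x^2 divided by g(x) = x^2 + 3x + 2."""
    p1 = p0 = 0
    for byte in data:
        feedback = byte ^ p1
        p1 = p0 ^ gf_mul(feedback, 3)
        p0 = gf_mul(feedback, 2)
    return bytes(data) + bytes([p1, p0])

def rs_decode(codeword):
    """Return (data, status); status is 'good', 'corrected' or 'uncorrectable'."""
    s0 = s1 = 0
    for byte in codeword:            # syndromes: evaluate at alpha^0 and alpha^1
        s0 ^= byte
        s1 = gf_mul(s1, 2) ^ byte
    if s0 == 0 and s1 == 0:
        return bytes(codeword[:-2]), "good"
    if s0 == 0 or s1 == 0:
        return None, "uncorrectable"  # inconsistent with any single-byte error
    power = (LOG[s1] - LOG[s0]) % 255  # error location as a power of x
    if power >= len(codeword):
        return None, "uncorrectable"  # location falls outside the codeword
    fixed = bytearray(codeword)
    fixed[len(codeword) - 1 - power] ^= s0  # s0 is the error value
    return bytes(fixed[:-2]), "corrected"

data = bytes(range(1, 17))            # placeholder payload B1..B16
codeword = rs_encode(data)

one_bad = bytearray(codeword)
one_bad[6] ^= 0xFF                    # all 8 bits of B7 flipped

two_bad = bytearray(codeword)
two_bad[6] ^= 0xFF                    # errors in two different bytes
two_bad[7] ^= 0xFF

for label, cw in (("clean", codeword), ("B7 hit", one_bad), ("B7+B8 hit", two_bad)):
    print(label, "->", rs_decode(cw)[1])
```

With only two parity bytes, a decoder like this can occasionally mistake a multi-byte error for a single-byte one and "correct" the wrong cell; the two-byte case chosen here is one it reliably flags as uncorrectable.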

Codeword Errors on a CMTS

A lot of users, or even SNMP-based systems, will look at codewords and correctable / uncorrectable codewords directly from a CMTS.  The command is pretty straightforward and looks like this:

volpefirm# show interface c1/0 upstream
Cable1/0: Upstream 0 is up
     Received 2112 broadcasts, 4045 multicasts, 3398631 unicasts
     0 discards, 3166 errors, 2581 unknown protocol
     3404788 packets input
     Codewords: 13861878 good 760804 corrected 47090 uncorrectable
     5 noise, 0 microreflections
     Total Modems On This Upstream Channel : 7 (7 active)

On this upstream port you can see there are a lot of corrected and uncorrectable codeword errors, but how do you determine if this is good or bad?  First you want to convert the correctable and uncorrectable codewords into a percentage based on the total number of codewords.  This is easy to do using the following formula:

Uncorrectable Codeword Errors %=((Number of Uncorrectable Codeword Errors )/Total Number of Codewords)×100

It is important to read the details in the management information base (MIB) objects for SC-QAM codewords.

The description for docsIf3CmtsCmUsStatusUnerroreds is as follows: This attribute represents the codewords received without error from the CM on this interface. Discontinuities in the value of this counter can occur at re-initialization of the managed system, and at other times as indicated by the value of ifCounterDiscontinuityTime for the associated upstream channel.

The phrase “codewords received without error” clarifies that docsIf3CmtsCmUsStatusUnerroreds, the MIB object often treated as the total codeword count, counts only those codewords that are error-free. Correctable and uncorrectable codewords are therefore not included in the value this attribute reports. For accurate percentage calculations, the “Total Number of Codewords” in the denominator of our formula must encompass all codewords: the unerrored count from this attribute plus both the correctable and uncorrectable counts, even though this is not explicitly specified.

Here is the complete version of the SC-QAM formula for uncorrectable codeword errors, with the added clarity in the denominator:

Uncorrectable Codeword Errors %=((uncorrectable codewords )/(unerrored codewords + correctable codewords + uncorrectable codewords ))×100

While this appears to be a minor change, it can have a significant impact on the percentage error, depending on the variables in the denominator of the equation.
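As a quick check of the arithmetic, the sketch below applies the formula to the "Codewords:" line from the CMTS output shown earlier. It assumes the "good" figure on that line is the unerrored count; the regular expression and the `pct` helper are illustrative only and may need adjusting for other CMTS vendors or software versions. For comparison it also computes the percentage using the unerrored count alone as the denominator.

```python
import re

# The "Codewords:" line copied from the CMTS output above.
line = "Codewords: 13861878 good 760804 corrected 47090 uncorrectable"

match = re.search(r"Codewords:\s+(\d+) good (\d+) corrected (\d+) uncorrectable", line)
good, corrected, uncorrectable = (int(value) for value in match.groups())

# The denominator must include every codeword received, not just the good ones.
total = good + corrected + uncorrectable

def pct(count, denominator):
    """Express count as a percentage of denominator."""
    return 100.0 * count / denominator

uncorrectable_pct = pct(uncorrectable, total)
corrected_pct = pct(corrected, total)
naive_pct = pct(uncorrectable, good)   # unerrored count mistakenly used as the total

print(f"Total codewords:        {total}")
print(f"Uncorrectable:          {uncorrectable_pct:.4f}%")
print(f"Corrected:              {corrected_pct:.4f}%")
print(f"Uncorrectable (naive):  {naive_pct:.4f}%")
```

On this port the result is roughly 0.32% uncorrectable and 5.2% corrected. The naive denominator overstates the uncorrectable figure only slightly here, but the gap grows as the share of errored codewords increases.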

The same formula applies for correctable codeword errors.  While a correctable error rate of over 5% seems high, it is important to keep in mind that the CMTS is fixing these errors.  The uncorrectable errors are of more concern because they result in lost data.  For voice-over-IP (VoIP), the recommended best practice is not to exceed 1% uncorrectable codeword errors.  In addition, you will want to look for trends, such as correctable and uncorrectable codeword errors increasing over an extended period of days, weeks or months.  This is an omen of bad things to come in your DOCSIS network.

This should have provided a good starting point for those not familiar with Reed-Solomon error correction and codeword errors.  In a future post I’ll correlate how codeword errors and bit error rate can be related.  This is especially valuable for those who use test equipment to help resolve field issues, but only have pre- and post-FEC measurements to rely on.
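For the trend watching recommended above, keep in mind that the CMTS counters are running totals, so a percentage for a given period has to be computed from the difference between two polls. The following is a rough sketch of that idea, not a production poller: the `Poll` structure, the sample numbers, and the handling of counter resets are all assumptions made for illustration, and the 1% limit is the VoIP guideline mentioned earlier.

```python
from typing import NamedTuple, Optional

class Poll(NamedTuple):
    """One reading of the three cumulative codeword counters (hypothetical structure)."""
    unerrored: int
    corrected: int
    uncorrectable: int

VOIP_LIMIT_PCT = 1.0  # guideline from the article for uncorrectable codewords

def interval_uncorrectable_pct(prev: Poll, curr: Poll) -> Optional[float]:
    """Uncorrectable % for the codewords received between two polls.

    Returns None if any counter went backwards, which this sketch treats
    as a reset or discontinuity rather than trying to repair it.
    """
    deltas = [c - p for p, c in zip(prev, curr)]
    if any(d < 0 for d in deltas):
        return None
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[2] / total

# Made-up sample readings, e.g. taken some minutes apart.
earlier = Poll(unerrored=1_000_000, corrected=50_000, uncorrectable=1_000)
later = Poll(unerrored=1_090_000, corrected=58_000, uncorrectable=3_000)

result = interval_uncorrectable_pct(earlier, later)
if result is None:
    print("Counter discontinuity, skipping this interval")
elif result > VOIP_LIMIT_PCT:
    print(f"{result:.2f}% uncorrectable in this interval: above the VoIP guideline")
else:
    print(f"{result:.2f}% uncorrectable in this interval")
```

Storing the per-interval result over days or weeks gives the trend line to watch.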
