DOCSIS correctable and uncorrectable codeword errors are available as metrics in most cable operator monitoring tools and Operations Support Systems (OSS). These tools generally provide data and dashboards that let users monitor overall network health and identify a problem when it occurs. Metrics like codeword errors are extracted directly from DOCSIS CMTSs using Simple Network Management Protocol (SNMP) or Internet Protocol Detail Records (IPDR), the latter being preferred because it places less load on the CMTS CPU. What many people struggle with is what those codeword errors are, what they mean, and how bad is bad. This article will answer those questions and discuss Reed-Solomon forward error correction, including how it corrects bit errors in the DOCSIS network.

When someone is using a device, such as a PC or iPad, and transmits more than 16 bytes of data over a DOCSIS cable modem, an error protection algorithm called Reed-Solomon (RS) forward error correction (FEC) kicks in. There are two parts to RS FEC: the encoder, which is in the cable modem, and the decoder, which is in the CMTS. As an example, the RS encoder creates an 18-byte codeword from an original 16 bytes of data transmitted by you, the user. The two extra bytes are added for error correction. The 16 bytes of data plus the 2 bytes of FEC are called a codeword. Great, so we have two bytes added for error correction and we know the origin of the mysterious codeword, but now what? How do two bytes help us, and where do correctable and uncorrectable codewords come from?

First, let's understand how the 18-byte codeword was initially formed in the DOCSIS cable modem so that we better understand how the two error correction bytes will help us when we need them. The original 16 bytes are inserted into a matrix as shown below. Each byte is represented by **B1** for byte #1, **B2** for byte #2, you get the picture. Next, each row in the matrix gets some complex math applied to it using something called a polynomial in order to create a value for each row, such as **y1**. Similarly, each column gets complex math applied (using that same polynomial) to create values such as **x1**. Those are not the actual values, but I don't want to put you to sleep with algebra.

Finally, the horizontal math values and vertical math values are put into one final polynomial to create the 2-byte word, which is called the parity-bytes in DOCSIS terminology (shown in the red box). If you are interested in the math behind calculating the parity-bytes, check out the DOCSIS standard. Further, I have used a very basic example of 16 bytes for illustration purposes. DOCSIS defines an algorithm by which data packets larger than 16 bytes can be accommodated.

Now that we understand how the RS codeword was formed, we can discuss how errors occur. As the codeword passes through the HFC network as an RF signal, it is likely to experience RF impairments. Let's say that this particular codeword experiences a brief moment of laser clipping which only impacts one byte of the matrix. Shown below is the impacted byte (in red). Here is where the Reed-Solomon algorithm in the CMTS works its magic.

Byte B7 is displayed as red, indicating it was the cell that received errors. The RS algorithm was able to determine this using the inverse equations used to create the parity-bytes. First it extracts the vertical and horizontal values and displays their x & y values. Next it calculates the vertical and horizontal values based upon the received data. But what happens is that the received data has a corrupt cell at B7, so its vertical and horizontal values do not match up with those that were created from the parity-bytes. This could be a correctable or uncorrectable codeword error, but we don't yet know.

So far I have mainly been talking about FEC in terms of bytes because this is how DOCSIS refers to FEC granularity. But RS FEC matrices like the two I illustrated above use bits. There are eight bits in a byte. A bit is also the smallest value in terms of logic because it is either a one (1) or a zero (0). This is quite helpful in the previous example where we had an error that was detected by RS. If RS detects an error in a matrix cell, all it has to do is change the value. What do I mean by this? If there is a one (1) in the cell and the parity-byte says that cell is wrong, then changing it to a zero (0), re-running the algorithm, and checking it against the parity-byte again will show it to be correct. That is how correctable codewords work! The RS algorithm sees an error, flips the value, checks it again, and proclaims "hey, we had an error but we fixed it, so just keep on going and call it a correctable codeword." Okay, maybe I'm embellishing for the CMTS, but it deserves some credit.
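The detect, flip, and re-check idea can be sketched with a toy two-dimensional parity check. To be clear, this is not Reed-Solomon: it uses plain even parity on single bits instead of polynomial arithmetic, and the function names and the 4x4 row-major layout are assumptions made for illustration. It only shows how mismatched row and column values can point at one bad cell.

```python
# Toy two-dimensional parity check, NOT real Reed-Solomon.
# It mirrors the row/column picture above, using single bits per cell.

def parities(matrix):
    """Return (row_parities, column_parities) for a matrix of 0/1 bits."""
    rows = [sum(row) % 2 for row in matrix]
    cols = [sum(col) % 2 for col in zip(*matrix)]
    return rows, cols

def correct_single_bit(received, sent_rows, sent_cols):
    """Classify a received matrix and repair it if exactly one bit flipped.

    Returns (status, matrix) where status is "good", "corrected",
    or "uncorrectable".
    """
    rows, cols = parities(received)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, sent_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, sent_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return "good", received
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed = [row[:] for row in received]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1  # flip the one suspect bit
        return "corrected", fixed
    return "uncorrectable", received

# Made-up 4x4 block of bits standing in for the data matrix.
data = [
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
sent_rows, sent_cols = parities(data)

damaged = [row[:] for row in data]
damaged[1][2] ^= 1  # row 2, column 3: where B7 sits in a row-major 4x4 layout
status, repaired = correct_single_bit(damaged, sent_rows, sent_cols)
print(status, repaired == data)  # prints "corrected True"
```

Flipping a second bit elsewhere in the matrix makes more than one row or column disagree, and the same function then reports "uncorrectable", which is the toy version of exceeding the correction limit.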

There is a point where too many errors cannot be corrected. This point occurs at different times for different modulation rates, but based upon the Reed-Solomon algorithm can be generalized using the following equation (this is really basic, so don't panic):

$$t = \frac{q - k}{2}$$

where t = number of correctable bytes, q = codeword length (18 bytes in this example), and k = data length (16 bytes in this example). Now if we just plug the values into the above equation, we can find out how much data RS will be able to correct for this codeword:

$$t = \frac{18 - 16}{2} = 1 \text{ byte}$$

Since 1 byte (B) = 8 bits:

$$t = 1 \text{ byte} \times 8 \text{ bits per byte} = 8 \text{ bits}$$

Therefore, in the example used here, RS can correct up to 1 byte of data per codeword, which covers as many as 8 bit errors as long as they all fall within the same byte. This means the error created back in our example would have been a correctable codeword! When the errors exceed what RS can correct, say they spill over into a second byte, it will report an uncorrectable codeword and the CMTS will discard the codeword. This means the packet will need to be retransmitted if it is non-real-time data, but if it is voice or video traffic then it's gone for good.
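As a quick check of the arithmetic, here is a minimal Python sketch of the correction limit. The function names are hypothetical, and it counts errors in whole bytes, which is how RS counts them.

```python
def rs_correctable_bytes(codeword_len, data_len):
    """Bytes per codeword that RS can correct: t = (q - k) / 2, rounded down."""
    if not 0 < data_len <= codeword_len:
        raise ValueError("data length must be between 1 and the codeword length")
    return (codeword_len - data_len) // 2

def classify_codeword(bytes_in_error, codeword_len, data_len):
    """Label a codeword the way the CMTS counters do (hypothetical helper)."""
    if bytes_in_error == 0:
        return "good"
    if bytes_in_error <= rs_correctable_bytes(codeword_len, data_len):
        return "corrected"
    return "uncorrectable"

t = rs_correctable_bytes(18, 16)
print(t, "byte, or", t * 8, "bits")   # prints "1 byte, or 8 bits"
print(classify_codeword(1, 18, 16))   # prints "corrected"
print(classify_codeword(2, 18, 16))   # prints "uncorrectable"
```

The same function shows the trade-off behind the parity bytes: adding more of them raises t, at the cost of carrying less user data in each codeword.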

Uncorrectable codeword errors in a DOCSIS network are a sure sign that a subscriber somewhere on your network is losing data. If you have large amounts of uncorrectable codeword errors then you can speculate that you have a dissatisfied customer, but don't speculate that you have issues in your network because it is a fact that you do.

A lot of users or even SNMP-based systems will look at codewords and correctable / uncorrectable codewords directly from a CMTS. The commands are pretty straightforward and look like this:

volpefirm# show interface c1/0 upstream

Cable1/0: Upstream 0 is up

Received 2112 broadcasts, 4045 multicasts, 3398631 unicasts

0 discards, 3166 errors, 2581 unknown protocol

3404788 packets input

Codewords: 13861878 good 760804 corrected 47090 uncorrectable

5 noise, 0 microreflections

Total Modems On This Upstream Channel : 7 (7 active)

On this upstream port you can see there are a lot of corrected and uncorrectable codeword errors, but how do you determine if this is good or bad? First you want to convert the correctable and uncorrectable codewords into a percentage based on the total number of codewords. This is easy to do using the following formula:

$$\%\ \text{correctable} = \frac{\text{correctable codewords}}{\text{total codewords}} \times 100\%$$

$$\%\ \text{correctable} = \frac{760804}{13861878 + 760804 + 47090} \times 100\%$$

$$\%\ \text{correctable} \approx 5.19\%$$

Here the total is the sum of the good, corrected, and uncorrectable counters. The same formula applies for uncorrectable codeword errors, which works out to be about 0.32%. While a correctable error rate of over 5% seems high, it is important to keep in mind that the CMTS is fixing these errors. The uncorrectable errors are something to be more concerned about because these errors result in lost data. For voice-over-IP (VoIP), the recommended best practice is not to exceed 1% uncorrectable codeword errors. In addition, one thing that you will want to look for is trending patterns, such as increasing correctable and uncorrectable codeword errors over an extended period of time such as days, weeks or months. This is an omen of certain bad things to come in your DOCSIS network.
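If you want to automate the percentage calculation, a minimal Python sketch might look like the following. The function names are hypothetical, the regular expression assumes the `Codewords:` line is formatted exactly as in the output above, and the total is assumed to be the sum of the good, corrected, and uncorrectable counters. Because these counters are cumulative, the last helper works on the difference between two polls, which is what you would want for spotting trends.

```python
import re

# Matches the "Codewords:" line as it appears in the sample output above.
CODEWORD_LINE = re.compile(
    r"Codewords:\s*(\d+)\s+good\s+(\d+)\s+corrected\s+(\d+)\s+uncorrectable"
)

def parse_codewords(cli_output):
    """Return (good, corrected, uncorrectable) from 'show interface' text."""
    match = CODEWORD_LINE.search(cli_output)
    if match is None:
        raise ValueError("no Codewords line found")
    good, corrected, uncorrectable = (int(g) for g in match.groups())
    return good, corrected, uncorrectable

def error_percentages(good, corrected, uncorrectable):
    """Return (corrected %, uncorrectable %) of all codewords received."""
    # Assumption: total codewords = good + corrected + uncorrectable.
    total = good + corrected + uncorrectable
    if total == 0:
        return 0.0, 0.0
    return 100.0 * corrected / total, 100.0 * uncorrectable / total

def interval_percentages(earlier, later):
    """Percentages for only the codewords received between two polls."""
    deltas = [new - old for old, new in zip(earlier, later)]
    if any(d < 0 for d in deltas):
        raise ValueError("counter went backwards (reset or rollover)")
    return error_percentages(*deltas)

sample = "Codewords: 13861878 good 760804 corrected 47090 uncorrectable"
counters = parse_codewords(sample)
corrected_pct, uncorrectable_pct = error_percentages(*counters)
print(f"{corrected_pct:.2f}% corrected, {uncorrectable_pct:.2f}% uncorrectable")
```

Calling `interval_percentages` with the counter tuples from two consecutive polls gives the error rate for just that interval, so a port that was bad last month but is clean today does not keep looking bad.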

This should have provided a good starting point for those not familiar with Reed-Solomon error correction and codeword errors. In a future post I'll show how codeword errors and bit error rate can be related. This is especially valuable for those who use test equipment to help resolve field issues, but only have pre- and post-FEC measurements to rely on.