Matcha Design Blog

QR Code Demystified - Part 4

Jason Brown - Tuesday, June 14, 2011

I had planned on holding off the error correction until later, but it really fits better right here. Before error correction is done, the data that we've generated must be broken down into "code words", which are just 8-bit bytes. All we have to do is take our "bit stream" from the last tutorial, and break it into 8-bit sections. Each one is represented by a number in the range 0 to 255.

The Reed-Solomon algorithm uses a Galois Field to calculate the data. Since we're dealing with 8 bits per word, we need a Galois Field of length 256. This can be calculated manually, but using a simple array is much easier. Values of αx can be found in Table 4, and the reverse can be found in Table 5.

Show/hide Table 4 - Exponents of α to Values

Show/hide Table 5 - Values to Exponents of α

These values will be important as we run the calculations required to generate the error correction data. Next, let's get into the actual process. First, we figure out how many error correction code words are in each block using Table 6. Then we cross-reference that on Table 7 to get our generator values. The comma-separated list we get as a result represents exponents of α. To see the breakdown of blocks, see Table 8 at the bottom of the page. The numbers shown are the total code word counts per block, and the number of blocks. For example, a Version 10 Error Correction L symbol has 2 blocks of 86 code words, and 2 of 87 code words. Table 6 indicates the number of those code words that are for error correction, and the remainder are for data.

Basically we start with an array of values - the values of the data codewords, followed by a number of 0 values equal to the number of error correction codewords. For example, for a Version 1 block with M Error Correction, there are 16 data words and 10 ECC words. So we start with an array of length 26, where the first 16 values are the data words, and the next 10 are set to 0.

Show/hide Table 6 - Error Correction codewords per block

Show/hide Table 7 - Generator Polynomials

Now there's a process we'll undertake a number of times equal to the number of data codewords in this block. We remove the first element from the array, we'll call it A, and find its corresponding α exponent on Table 5, which we'll refer to as B. For the next C elements of the array, where C is the length of the generator, we add the generator numbers, in turn, to B. We'll call this sum D. If D is greater than 255, divide it by 255 and use the remainder. Finally we find the value of αD on Table 4, and do a bitwise XOR against the current value of the given element of the array. Confused? Me too. Let's look at an example.

Let's say we have a Version 1 QR Code with H Error Correction. Looking at Table 6, we see that we'll have 17 ECC Codewords. Looking at Table 7, we see that means our generator is [43, 139, 206, 78, 43, 239, 123, 206, 214, 147, 24, 99, 150, 39, 243, 163, 136]. Let's pretend our data, which for a Version 1 with H Error Correction will be 9 words long, is [32, 65, 205, 69, 41, 220, 46, 128, 236]. So our staring array looks like this:

32 65 205 69 41 220 46 128 236 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

We remove the first element of the array, and if we look up the first value on Table 5, we see that 32 is the 5th exponent of α. Therefore, we add 5 to our ECC numbers and get:

48 144 211 83 48 244 128 211 219 152 29 104 155 44 248 168 141

Now we convert all these numbers from exponents of α to integer values, using Table 4. Then we use bitwise XOR on each set of values. The table below shows the original values (after we removed the first element,) then the integer values of the immediately preceding list (the generator numbers + 5,) and finally the result of the XOR mask.

65 205 69 41 220 46 128 236 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
70 168 178 187 70 250 133 178 86 73 48 13 114 238 27 252 21 - - - - - - - -
7 101 247 146 154 212 5 94 86 73 48 13 114 238 27 252 21 - - - - - - - -

So the third row in the above table is our new value array. We see on Table 5 that 7 converts to 198, so we remove the first element (7) from our new value array, and add 198 to each element in our original generator of [43, 139, 206, 78, 43, 239, 123, 206, 214, 147, 24, 99, 150, 39, 243, 163, 136]. Then we convert these to integer values again, and apply another XOR series. Once we've done this 9 times (for our 9 data code words) we'll have a value array that's 17 elements long - and these are our error correction code words. In this case, our final values are as follows:

42 159 74 221 244 169 239 150 138 70 237 85 224 96 74 219 61

EDIT: Thanks to Erik for pointing out the special case that if the first element is 0, the bitmask is all 0s. In other words, the operation is simply to remove the first element (0) and do nothing else.

Show/hide Table 8 - Blocks and Codewords

EDIT: I forgot to explain something that's very important. Each block consists of the data code words followed by the error correction code words. So for the example above, the one and only block for our 1H would be our data: [32, 65, 205, 69, 41, 220, 46, 128, 236], followed by our error correction: [42, 159, 74, 221, 244, 169, 239, 150, 138, 70, 237, 85, 224, 96, 74, 219, 61]. Since this symbol only has one block, we directly encode these 26 bytes into binary and place them according to the rules in the next section. However, for symbols with more blocks, we actually encode the first code word of each block in order, then the second, and so on. So the encoding for two blocks goes like block 1 code word 1, block 2 code word 1, block 1 code word 2, block 2 code word 2, etc. In many cases, not all the blocks have exactly the same number of code words. That's okay, you'll just have some that are one code word longer than some others. After some of them run out, just skip over them and continue the process with the others.

Error correction is by far the most complicated portion of a QR Code, and the easiest to mess up. If I haven't explained something fully, please leave a comment and I'll try to address it. Next week we'll start placing data in the symbol, and applying masks. Be sure to read through the previous posts if you haven't already - Part 1, Part 2, Part 3.