Appendix F - Classical Error Correcting Codes

Introduction

Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems. It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes. There are many good introductions to classical error correction. Here we follow a few sources which also discuss quantum error correcting codes: the book by Loepp and Wootters [25], an article in Lo, Popescu, and Spiller [26] by Steane, Gottesman's Thesis [27], and Gaitan's Book [3] on quantum error correction, which also discusses classical error correction.

Binary Operations

The set $\{0,1\}\,\!$ is a group under addition. (See Section D.2.8 of Appendix D.) The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning $0+0=0,1+0=0+1=1,\,\!$ and $1+1=0\,\!$ . If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a field (a Galois Field), which is denoted GF $(2)\,\!$ . Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product). For example, the addition of the vector $(0,0,1)\,\!$ and $(0,1,1)\,\!$ is $(0,0,1)+(0,1,1)=(0,1,0)\,\!$ . The inner product between these two vectors is $(0,0,1)\cdot (0,1,1)=0\cdot 0+0\cdot 1+1\cdot 1=0+0+1=1\,\!$ .

Definitions and Basics

Definition 1

The inner product is also called a checksum or parity check since it shows whether or not the first and second vectors agree, or have an even number of 1's at the positions specified by the ones in the other vector. We may say that the first vector satisfies the parity check of the other vector, or vice versa.

Definition 2

The weight or Hamming weight is the number of non-zero components of a vector or string. The weight of a vector $v\,\!$ is denoted wt( $v\,\!$ ).

Definition 3

The Hamming distance is the number of places where two vectors differ. Let the two vectors be $v\,\!$ and $w\,\!$ . Then the Hamming distance is also equal to wt( $v+w\,\!$ ). The Hamming distance between $v\,\!$ and $w\,\!$ will be denoted $d_{H}(v,w)\,\!$ .

Definition 4

We use $\{0,1\}^{n}\,\!$ to denote the set of all binary vectors of length $n\,\!$ . A code $C\,\!$ of length $n\,\!$ is any subset of that set. The set of all elements of $C\,\!$ is called the set of codewords. We also say there are $2^{n}\,\!$ $n\,\!$ -bit words in the space.

Suppose $n\,\!$ bits are used to encode $k\,\!$ logical bits. We use the notation $[n,k]\,\!$ do denote such a code.

Definition 5

The minimum distance of a code is the smallest Hamming distance between any two non-equal vectors in a code. This can be written


	$d_{Hmin}(C)={\underset {v,w\in C,v\neq w}{\mbox{min}}}d_{H}(v,w).\,\!$	(F.1)

For shorthand, we also use $d(C)\,\!$ or $d\,\!$ if $C\,\!$ is understood.

When that code has a distance $d\,\!$ , the notation $[n,k,d]\,\!$ is used.

Example 1

It is interesting to note that if we encode redundantly using $0_{L}=00\,\!$ and $1_{L}=11\,\!$ as our logical zero and logical one respectively, then we could detect single bit errors but not correct them. For example, if we receive $01\,\!$ , we know this cannot be one of our encoded states. So an error must have occurred. However, we don't know whether the sender sent $0_{L}=00\,\!$ or $1_{L}=11\,\!$ . We do know that an error has occurred though, as long as we know only one error has occurred. Such an encoding can be used as an error detecting code. In this case there are two code words, $0_{L}=00\,\!$ and $1_{L}=11\,\!$ , but four words in the space. The minimum distance is 2, which is the distance between the two code words.

Example 2

The three-bit redundant encoding was already given in Chapter 7. One takes logical zero and logical one states to be


	$0_{L}=000\;\;\;{\mbox{ and }}\;\;\;1_{L}=111,\,\!$	(F.2)

where the subscript $L\,\!$ is used to denote a "logical" state; that is, one that is encoded. Recall that this code is able to detect and correct one error. In this case there are two code words out of eight possible words, and the minimal distance is 3.

Definition 6

The rate of a code is given by the ration of the number of logical bits to the number of bits, $k/n\,\!$ .

Definition 7

A linear code $C_{l}\,\!$ is a code that is closed under addition.

Linear Codes

Linear codes are particularly useful because they are able to efficiently identify errors and the associated correct codewords. This ability is due to the added structure these codes have. These will be discussed in the following sections.

Generator Matrix

For linear codes, any linear combination of codewords is a codeword. One key feature of a linear code is that it can be specified by a ''generator matrix,'' $G\,\!$ ^[1]. For an $[n,k]\,\!$ code, the generator matrix is an $n\times k\,\!$ matrix with columns that form a basis for the $k\,\!$ -dimensional coding sub-space of the $n\,\!$ -dimensional binary vector space. In other words, the vectors comprising the rows form a basis that will span the code space. (Note that one may also use the transpose of this matrix as the definition for $G\,\!$ .) Any code word $w\,\!$ described by a vector $v\,\!$ can be written in terms of the generator matrix as $w=Gv\,\!$ . Note that $G\,\!$ is independent of the input and output vectors. In addition, $G\,\!$ is not unique. If columns are switched or even added to produce a new vector that replaces a column, then the generator matrix is still valid for the code. This is due to the requirement that the columns be linearly independent, which is still satisfied if these operations are performed.

Parity Check Matrix

Once $G\,\!$ is obtained, one can calculate another useful matrix, $P.\,\!$ $P\,\!$ is an $(n-k)\times n\,\!$ matrix which has the property that


	$PG=0.\,\!$	(F.3)

The matrix $P\,\!$ is called the parity check matrix or dual matrix. The rank of $P\,\!$ is at most $n-k\,\!$ and has the property that it annihilates any code word. To see this, recall any code word is written as $Gv\,\!$ : $PGv=0\,\!$ since $PG=0.\,\!$ Also, due to the rank of $P,\,\!$ it can be shown that $Pw=0\,\!$ only if $w\,\!$ is a code word. That is to say, $Pw=0\,\!$ if and only if $w\,\!$ is a code word. This means that $P\,\!$ can be used to test whether or not a word is in the code.

Suppose an error occurs on a code word $w\,\!$ to produce $w^{\prime }=w+e\,\!$ . It follows that


	$Pw^{\prime }=P(w+e)=Pe,\,\!$	(F.4)

since $Pw=0\,\!$ . This result, $Pe\,\!$ is called the error syndrome and the measurement to identify $Pe\,\!$ is the syndrome measurement. Therefore, the result depends only on the error and not on the original code word. If the error can be determined from this result, then it can be corrected independent of the code word. However, in order to have $Pe\,\!$ be unique, two different results, $Pe_{1}\,\!$ and $Pe_{2}\,\!$ , must not be equal. This is possible if a distance $d\,\!$ code is constructed such that the parity check matrix has $d-1=2t\,\!$ linearly independent columns. This enables the errors to be identified and corrected.

It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations. The matrix $G\,\!$ is determined by the code. Once this matrix is determined, there is a method for determining the parity check matrix, $P\,\!$ which is a set of $n-k\,\!$ mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix. It is possible to determine the parity matrix from the generator matrix. The method for doing this can be found in Steane's article in Lo, Popescu, and Spiller [26] and it goes as follows. One first puts $G^{T}\,\!$ in the form of an augmented matrix $(I_{k}|A),\,\!$ where $I_{k}\,\!$ is the $k\times k\,\!$ identity matrix. Then the parity check matrix is $P=(A^{T}|I_{n-k}).\,\!$ .

Errors

For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors. The two examples above show how the error can be detected; here, the objective is to give some general conditions.

Note that any state containing an error may be written as the sum of the original (logical or encoded) state $w\,\!$ and another vector $e\,\!$ . The error vector $e\,\!$ has ones in the places where errors are present and zeroes everywhere else. To ensure that the error may be corrected, the following condition must be satisfied for two states with errors occurring:


	$w_{1}+e_{1}\neq w_{2}+e_{2}.\,\!$	(F.5)

This condition is called the disjointness condition. This condition means that an error on one state cannot be confused with an error on another state. If it could, then the state including the error could not be uniquely identified with an encoded state and the state could not be corrected to its original state after the error occurred. More specifically, for a code to correct $t\,\!$ single-bit errors, it must have distance at least $2t+1\,\!$ between any two codewords; i.e., it must be true that $d(C)\geq 2t+1\,\!$ . An $[n,k]\,\!$ code with minimal distance $d\,\!$ is denoted $[n,k,d]\,\!$ .

Example 3

An important example of an error correcting code is called the $[7,4,3]$ Hamming code. This code, as the notation indicates, encodes $k=4$ bits of information into $n=7$ bits. It also does it in such a way that one error can be detected and corrected since it has a distance of $3$ . The generator matrix for this code can be taken to be


	$G^{T}=\left({\begin{array}{ccccccc}1&0&0&0&1&1&0\\0&1&0&0&1&1&1\\0&0&1&0&1&0&1\\0&0&0&1&0&1&1\end{array}}\right).\,\!$	(F.6)

(See for example Loepp and Wootters [25].) From this the parity check matrix, $P\,\!$ can be calculated (as stated above) by finding a set of $n-k\,\!$ mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix. Alternatively, one could use the method in Steane's article in Lo, Popescu, and Spiller [26]. Put $G^{T}\,\!$ in the form $(I_{k}|A),\,\!$ where $I_{k}\,\!$ is the $k\times k\,\!$ identity matrix. Then the parity check matrix is $P=(A^{T}|I_{n-k}).\,\!$ In either case, one can arrive at the following parity check matrix for this code:


	$P=\left({\begin{array}{ccccccc}1&1&1&0&1&0&0\\1&1&0&1&0&1&0\\0&1&1&1&0&0&1\end{array}}\right).\,\!$	(F.7)

It is useful to note that the code can also be defined by the parity check matrix. Only the codewords are annihilated by the parity check matrix.

The Disjointness Condition and Correcting Errors

The motivation for the disjointness condition, Eq.(F.5), is to associate each vector in the space with a particular code word. That is, assuming that only certain errors occur, each error vector should be associated to a particular vector in the code space when the error is added to the original code word. This partitions the set into disjoint subsets, with each containing only one code vector. A message is decoded correctly if the vector (the one containing the error) is in the subset that is associated with the original vector (the one with no error). For example, if one vector is sent, say $v_{1}\,\!$ , and an error occurs during transmission to produce $v_{2}=v_{1}+e\,\!$ , then this vector must be in the subset containing $v_{1}\,\!$ .

A way to decode is to record an array of possible code words, possible errors, and the combinations of those errors and code words. The array can be set up as a top row of the code word vectors and a leftmost column of errors, with the element of the first row and the first column being the zero vector and all subsequent entries in the column being errors. Then the element at the top of a column (say the jth column) is added to the error in the corresponding row (say the kth row) to get the j,k entry of the array. With this array one can associate a column with a subset that is disjoint with the other sets. Identifying the erred code word in a column associates it with a code word and thus corrects the error.

Example 4

In this example we are going to use $G\,\!$ (F.6) and $P\,\!$ (F.7) from the example above.

The set of code words is given by all of the linear combinations of the rows of $P,\,\!$ meaning there are $2^{3}\,\!$ code words. The set of code words,

C=\left\{0000000,1110100,1101010,0111001,0100111,1010011,0011110,1001101\right\}.\,\!

TABLE F.1

Table F.1: *Array to determine possible errors on an unknown code word in set $C\,\!$*
$0000000\,\!$	$1110100\,\!$	$1101010\,\!$	$0111001\,\!$	$0100111\,\!$	$1010011\,\!$	$0011110\,\!$	$1001101\,\!$
$1000000\,\!$	$0110100\,\!$	$0101010\,\!$	$1111001\,\!$	$1100111\,\!$	$0010011\,\!$	$1011110\,\!$	$0001101\,\!$
$0100000\,\!$	$1010100\,\!$	$1001010\,\!$	$0011001\,\!$	$0000111\,\!$	$1110011\,\!$	$0111110\,\!$	$1101101\,\!$
$0010000\,\!$	$1100100\,\!$	$1111010\,\!$	$0101001\,\!$	$0110111\,\!$	$1000011\,\!$	$0001110\,\!$	$1011101\,\!$
$0001000\,\!$	$1111100\,\!$	$1100010\,\!$	$0110001\,\!$	$0101111\,\!$	$1011011\,\!$	$0010110\,\!$	$1000101\,\!$
$0000100\,\!$	$1110000\,\!$	$1101110\,\!$	$0111101\,\!$	$0100011\,\!$	$1010111\,\!$	$0011010\,\!$	$1001001\,\!$
$0000010\,\!$	$1110110\,\!$	$1101000\,\!$	$0111011\,\!$	$0100101\,\!$	$1010001\,\!$	$0011100\,\!$	$1001111\,\!$
$0000001\,\!$	$1110101\,\!$	$1101011\,\!$	$0111000\,\!$	$0100110\,\!$	$1010010\,\!$	$0011111\,\!$	$1001100\,\!$

Now, suppose you are expecting to receive a code word, $c\in C.\,\!$ But, instead you receive $0101111\notin C.\,\!$ What we are able to do is look at Table F.1 and see that $0101111\,\!$ is in column 5. Since the columns of this table represent the disjoint subsets of our code space, we see that $c=0100111\,\!$ and the error that occurred was $e_{4}\,\!$ or $0001000.\,\!$

The Hamming Bound

The Hamming bound is a bound that restricts the rate of the code. Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors. Suppose there is a set of $n\,\!$ bit vectors for encoding $k\,\!$ bits of information. There is a set of error vectors of weight $t\,\!$ that has $C(n,t)\,\!$ elements^[2]. So the number of error vectors, including errors of weight up to $t\,\!$ , is $\sum _{i=0}^{t}C(n,i).\,\!$ (Note that no error is also part of the set of error vectors. The objective is to be able to design a code that can correct all errors up to those of weight $t\,\!$ , and this includes no error at all.) Since there are $2^{n}\,\!$ vectors in the whole space of $n\,\!$ bits, and assuming $m\,\!$ vectors are used for the encoding, the Hamming bound is


	$m\sum _{i=0}^{t}C(n,i)\leq 2^{n}.\,\!$	(F.8)

For linear codes, $m=2^{k},\,\!$ so


	$2^{k}\sum _{i=0}^{t}C(n,i)\leq 2^{n}.\,\!$	(F.9)

Taking the logarithm,


	$k\leq n-\log _{2}\left(\sum _{i=0}^{t}C(n,i)\right).\,\!$	(F.10)

For large $n,k\,\!$ and $t\,\!$ , we can use Stirling's formula to show that


	${\frac {k}{n}}\leq 1-H\left({\frac {t}{n}}\right),\,\!$	(F.11)

where $H(x)=-x\log x-(1-x)\log(1-x)\,\!$ and we have neglected an overall multiplicative constant that goes to 1 as $n\rightarrow \infty .\,\!$ (Again, see the article in Lo, Popescu, and Spiller [26] by Steane.)

More Definitions

Definition 8: Dual Code

Let ${\mathcal {C}}\,\!$ be a code and let $v\,\!$ be a vector in the code space. The dual code, denoted ${\mathcal {C}}^{\perp }\,\!$ , is the set of all vectors that have zero inner product with all $v\in {\mathcal {C}}\,\!$ . In other words, it is the set of all vectors $u\,\!$ such that $u\cdot v=0\,\!$ for all $v\in {\mathcal {C}}\,\!$ .

For binary vectors, a vector can be orthogonal to itself. Note that this is different from ordinary vectors in 3-d space.

The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as CSS codes.

Final Comments

As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code. This does not indicate whether or not codes that satisfy these bounds exist, but it does tell us that no codes exist that do not satisfy these bounds. Encoding, decoding, error detection and correction are all difficult problems to solve in general. One of the advantages of the linear codes is that they provide a systematic method for identifying errors on a code through the use of the parity check operation. More generally, checking to see whether or not a bit string (vector) is in the code space would require a look-up table. This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to the look-up table.

Many of these ideas and definitions will be utilized in Chapter 7 on quantum error correction. Some linear codes, including the Hamming code above, will have quantum analogues---as do many quantum error correcting codes. In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information. Such discussions will be taken up in Chapter 7.

Footnotes

↑ Recall that we are working with binary codes. Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.
↑ That is, $n\,\!$ choose $t\,\!$ vectors. The notation is $C(n,t)={n \choose t}={\frac {n!}{(n-t)!t!}}.\,\!$

[1] Recall that we are working with binary codes. Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.

[2] That is, $n\,\!$ choose $t\,\!$ vectors. The notation is $C(n,t)={n \choose t}={\frac {n!}{(n-t)!t!}}.\,\!$

[1]

[2]