Appendix F - Classical Error Correcting Codes

From Qunet
Revision as of 18:58, 12 July 2011 by Mbyrd (talk | contribs)

Introduction

Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems. It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes. There are many good introductions to classical error correction. Here we follow a few sources which also discuss quantum error correcting codes: the book by Loepp and Wootters, an article in Lo, Popescu, and Spiller by Steane, Gottesman's Thesis, and Gaitan's Book on quantum error correction, which also discusses classical error correction.

Binary Operations

The set {0, 1} is a group under addition. (See Section D.2.8 of Appendix D.) The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, and 1 + 1 = 0. If we also include the operation of multiplication, the set, with the two operations, becomes a field (a Galois field), and that field is denoted GF(2). Since one often works with strings of bits, it is very useful to consider a string of bits to be a vector and use vector addition (which is component-wise addition) and a vector multiplication which is the inner product. For example, the addition of the vectors (1,0,1,1) and (0,1,1,0) is (1,1,0,1). The inner product between these two vectors is 1·0 + 0·1 + 1·1 + 1·0 = 1.
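These operations are easy to experiment with. The following is a minimal sketch in Python (the example vectors are illustrative choices):

```python
def vec_add(u, v):
    """Component-wise addition modulo 2 (bitwise XOR)."""
    return [(a + b) % 2 for a, b in zip(u, v)]

def inner(u, v):
    """Inner product modulo 2, i.e., a parity check."""
    return sum(a * b for a, b in zip(u, v)) % 2

u = [1, 0, 1, 1]
v = [0, 1, 1, 0]
print(vec_add(u, v))  # [1, 1, 0, 1]
print(inner(u, v))    # 1
```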

Definitions and Basics

Definition 1

The inner product is also called a checksum or parity check, since it shows whether or not the first vector has an even number of 1's at the positions specified by the 1's in the second vector. When the inner product is zero, we may say that the first vector satisfies the parity check of the second, or vice versa.

Definition 2

The weight or Hamming weight of a vector (or string) v is the number of its non-zero components. The weight of a vector v is denoted wt(v).

Definition 3

The Hamming distance between two vectors is the number of places where they differ. Let the two vectors be u and v; then the Hamming distance is also equal to wt(u + v). The Hamming distance between u and v will be denoted d(u, v).
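Both quantities are straightforward to compute, and the identity d(u, v) = wt(u + v) can be checked numerically. A small sketch (the vectors are illustrative):

```python
def wt(v):
    """Hamming weight: number of non-zero components."""
    return sum(1 for x in v if x != 0)

def hamming_distance(u, v):
    """Number of places where u and v differ; equals wt(u + v) over GF(2)."""
    return sum((a + b) % 2 for a, b in zip(u, v))

u, v = [1, 0, 1, 1], [0, 1, 1, 0]
print(wt(u))                   # 3
print(hamming_distance(u, v))  # 3
assert hamming_distance(u, v) == wt([(a + b) % 2 for a, b in zip(u, v)])
```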

Definition 4

Consider the set of all binary vectors of length n. A code of length n is any subset of this set. The elements of the code are called codewords. We also say there are n-bit words in the space.

Suppose n bits are used to encode k logical bits. We use the notation [n,k] to denote such a code.

Definition 5

The minimum distance d of a code C is the smallest Hamming distance between any two non-equal vectors in the code. This can be written


    d(C) = min { d(u, v) : u, v in C, u ≠ v }        (F.1)

For shorthand, we also use d(C), or simply d if the code C is understood.

When the code has a distance d, the notation [n,k,d] is used.
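The minimum distance of Eq. (F.1) can be computed by brute force over all pairs of codewords. A sketch, applied to the two small repetition codes discussed in the examples below:

```python
from itertools import combinations

def min_distance(code):
    """Smallest Hamming distance between distinct codewords (Eq. F.1)."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(code, 2))

print(min_distance([(0, 0), (1, 1)]))        # 2
print(min_distance([(0, 0, 0), (1, 1, 1)]))  # 3
```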

Example 1

It is interesting to note that if we encode redundantly using 00 and 11 as our logical zero and logical one respectively, then we can detect single-bit errors, but not correct them. For example, if we receive 01, we know this cannot be one of our encoded states, so an error must have occurred. However, we don't know whether the sender sent 00 or 11. We do know that an error has occurred, as long as we know only one error has occurred. Such an encoding can be used as an error detecting code. In this case there are two code words, 00 and 11, but four words in the space. The minimum distance is 2, which is the distance between the two code words.

Example 2

The three-bit redundant encoding was already given in Chapter 7. One takes the logical zero and logical one states to be the following


    0_L = 000,    1_L = 111        (F.2)

where the subscript L is used to denote a "logical" state, that is, one which is encoded. Recall that this code is able to detect and correct one error. In this case there are two code words out of eight possible words, and the minimal distance is 3.
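Decoding this code amounts to a majority vote, which corrects any single bit flip. A minimal sketch:

```python
def encode(bit):
    """Three-bit repetition encoding: 0 -> 000, 1 -> 111."""
    return [bit] * 3

def decode(word):
    """Majority vote; corrects any single bit-flip error."""
    return 1 if sum(word) >= 2 else 0

# Every single-bit error on an encoded 0 is corrected:
for flip in range(3):
    word = encode(0)
    word[flip] ^= 1
    assert decode(word) == 0
print(decode([0, 1, 0]))  # 0
```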

Definition 6

The rate R of a code is given by the ratio of the number of logical bits to the number of bits, R = k/n.

Definition 7

A linear code is a code which is closed under addition.

Linear Codes

Linear codes are particularly simple codes, and their added structure gives them extra features. These codes are important for their simplicity as well as that added structure.

Generator Matrix

For linear codes, any linear combination of codewords is a codeword. One key feature of a linear code is that it can be specified by a ''generator matrix'' G [1]. For an [n,k] code, the generator matrix is a k × n matrix whose rows form a basis for the k-dimensional coding subspace of the n-dimensional binary vector space. In other words, the vectors comprising the rows form a basis which spans the code space. (Note that one may also use the transpose of this matrix as the definition for G.) Any code word, described by a vector v, can be written in terms of the generator matrix and a k-bit message u as v = uG. Note that G is independent of the input and output vectors. In addition, G is not unique. If rows are switched, or even added together to produce a new vector which replaces a row, the matrix still generates the same code. This is due to the requirement that the rows be linearly independent, which is still satisfied if these operations are performed.
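With the row convention v = uG, encoding is a matrix-vector product over GF(2). A minimal sketch, using the [3,1] repetition code of Example 2 as the illustration, with generator G = (1 1 1):

```python
from itertools import product

def encode(u, G):
    """Codeword v = u G over GF(2); G is given row-wise (k x n)."""
    n = len(G[0])
    return [sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

# Generator matrix of the [3,1] repetition code:
G = [[1, 1, 1]]

# Enumerating all messages generates the whole code space:
codewords = [encode(list(u), G) for u in product([0, 1], repeat=len(G))]
print(codewords)  # [[0, 0, 0], [1, 1, 1]]
```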

Parity Check Matrix

Once G is obtained, one can calculate another useful matrix H. H is an (n − k) × n matrix which has the property that


    H G^T = 0        (F.3)

The matrix H is called the parity check matrix or dual matrix. H has rank n − k and has the property that it annihilates any code word. To see this, recall that any code word may be written as v = uG, so H v^T = H G^T u^T = 0 since H G^T = 0. Also, due to the rank of H, it can be shown that H v^T = 0 only if v is a code word. That is to say, H v^T = 0 if and only if v is a code word. This means that H can be used to test whether or not a word is in the code.
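The membership test H v^T = 0 is easy to illustrate. The sketch below uses a parity check matrix for the three-bit repetition code (this particular H is an illustrative choice, not taken from the text):

```python
def syndrome(H, v):
    """Compute H v^T over GF(2); the zero vector iff v is a codeword."""
    return [sum(h[j] * v[j] for j in range(len(v))) % 2 for h in H]

# A parity check matrix for the [3,1] repetition code (illustrative choice):
H = [[1, 1, 0],
     [0, 1, 1]]

print(syndrome(H, [1, 1, 1]))  # [0, 0] -> a codeword
print(syndrome(H, [1, 0, 1]))  # [1, 1] -> not a codeword
```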

Suppose an error e occurs on a code word v to produce v' = v + e. Now consider


    H v'^T = H (v + e)^T = H v^T + H e^T = H e^T        (F.4)

since H v^T = 0. This result, s = H e^T, is called the error syndrome, and the measurement to identify s is called the syndrome measurement. Therefore, the result depends only on the error and not on the original code word. If the error can be determined from this result, then it can be corrected independent of the code word. However, in order for e to be identified uniquely, two different correctable errors e_1 and e_2 must not have equal syndromes, H e_1^T ≠ H e_2^T. For single-bit errors this is possible if a distance-3 code is constructed, so that the parity check matrix has pairwise linearly independent (i.e., distinct and non-zero) columns. This enables the errors to be identified and thus corrected.
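Continuing the repetition-code illustration (the matrix H below is again an assumed example), each single-bit error produces a distinct syndrome, so the error position can be read off a small table:

```python
# Parity check matrix for the [3,1] repetition code (illustrative choice):
H = [[1, 1, 0],
     [0, 1, 1]]

def syndrome(H, v):
    """Compute H v^T over GF(2) as a hashable tuple."""
    return tuple(sum(h[j] * v[j] for j in range(len(v))) % 2 for h in H)

# Build the syndrome table: each single-bit error gets a distinct syndrome.
table = {}
for pos in range(3):
    e = [0, 0, 0]
    e[pos] = 1
    table[syndrome(H, e)] = pos

print(table)  # {(1, 0): 0, (1, 1): 1, (0, 1): 2}
```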

Errors

For any classical error correcting code, there are general conditions which must be satisfied in order for the code to be able to detect and correct errors. The two examples above show how the error can be detected, but here the objective is to give some general conditions.

Note that any state containing an error may be written as the sum of the original state v (the logical or encoded state) and another vector e. The error vector e has ones in the places where errors are present and zeroes everywhere else. To ensure that the error may be corrected, the following condition must be satisfied for two states v_1, v_2 with errors e_1, e_2 occurring


    v_1 + e_1 ≠ v_2 + e_2    for all codewords v_1 ≠ v_2        (F.5)

This condition is called the disjointness condition. It means that an error on one state cannot be confused with an error on another state. If it could, then the state including the error could not be uniquely identified with an encoded state, and the state could not be corrected to its original form before the error occurred. More specifically, for a code to correct single-bit errors, it must have distance at least 3 between any two codewords, i.e., it must be a distance-3 code, i.e., it must be true that d ≥ 3. An [n,k] code with minimal distance d is denoted [n,k,d].


Example 3

An important example of an error correcting code is called the [7,4] Hamming code. This code, as the notation indicates, encodes k = 4 bits of information into n = 7 bits. It also does this in such a way that one error can be detected and corrected, since it has a distance of d = 3. The generator matrix for this code can be taken to be (see Loepp and Wootters)


    G = ( 1 0 0 0 0 1 1 )
        ( 0 1 0 0 1 0 1 )
        ( 0 0 1 0 1 1 0 )
        ( 0 0 0 1 1 1 1 )        (F.6)

From this the parity check matrix H can be calculated by finding a set of n − k linearly independent vectors which are orthogonal to the code space defined by the generator matrix. Alternatively, one could find the generator matrix from the parity check matrix. A method for doing this can be found in Steane's article in Lo, Popescu, and Spiller. One first puts G in the form (I_k | A), where I_k is the k × k identity matrix. Then the parity check matrix is H = (A^T | I_{n−k}). In either case, one can arrive at the following parity check matrix for this code:


    H = ( 0 1 1 1 1 0 0 )
        ( 1 0 1 1 0 1 0 )
        ( 1 1 0 1 0 0 1 )        (F.7)

It is useful to note that the code can also be defined by the parity check matrix: the codewords, and only the codewords, are annihilated by that matrix.
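The encoding and single-error correction can be checked exhaustively. The sketch below uses the systematic forms G = (I_4 | A) and H = (A^T | I_3); this is one standard choice (it may differ from other presentations by a reordering of rows or columns). Because the columns of H are distinct and non-zero, the syndrome of a single-bit error identifies its position:

```python
from itertools import product

# Systematic [7,4] Hamming code: G = (I_4 | A), H = (A^T | I_3).
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0],
     [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]
H = [[A[i][r] for i in range(4)] + [int(r == j) for j in range(3)]
     for r in range(3)]

def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(row[j] * v[j] for j in range(len(v))) % 2 for row in M]

def encode(u):
    """Codeword v = u G for a 4-bit message u."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def correct(v):
    """Correct a single bit flip: the syndrome equals the column of H
    at the error position."""
    s = mat_vec(H, v)
    if any(s):
        pos = [list(col) for col in zip(*H)].index(s)
        v = v[:]
        v[pos] ^= 1
    return v

# Every single-bit error on every codeword is corrected:
for u in product([0, 1], repeat=4):
    c = encode(list(u))
    for pos in range(7):
        bad = c[:]
        bad[pos] ^= 1
        assert correct(bad) == c
```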

The Disjointness Condition and Correcting Errors

The motivation for the disjointness condition, Eq. (F.5), is to associate each vector in the space with a particular code word. That is, assuming that only certain errors occur, each error vector should be associated with a particular vector in the code space when the error is added to the original code word. This partitions the set of n-bit words into disjoint subsets, each of which contains only one code vector. A message is decoded correctly if the vector containing the error is in the subset associated with the original vector (the one with no error). For example, if one vector is sent, say v, and an error e occurs during transmission to produce v + e, this vector must be in the subset containing v.

A way to decode is to record an array of possible code words, possible errors, and the combinations of those errors and code words; such an array is known as a standard array. The array is set up with a top row of the code word vectors and a leftmost column of errors, the element in the first row and first column being the zero vector and all subsequent entries in that column being errors. Then the element at the top of a column (say the jth column) is added to the error in the corresponding row (say the kth row) to get the (j,k) entry of the array. With this array, each column is associated with one of the disjoint subsets. Identifying a received (erred) word in a column then associates it with the code word at the top of that column and thus corrects the error.
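For a small code the array, and the decoding map it defines, can be built directly. A sketch for the three-bit repetition code, with the zero error and the three single-bit errors as row labels:

```python
# Standard-array decoding for the [3,1] repetition code: one column per
# codeword, one row per correctable error (including the zero error).
codewords = [(0, 0, 0), (1, 1, 1)]
errors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def add(u, v):
    """Component-wise addition over GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

# Each received word decodes to the codeword heading its column.
decode = {add(c, e): c for c in codewords for e in errors}

print(decode[(0, 1, 0)])  # (0, 0, 0)
print(decode[(1, 0, 1)])  # (1, 1, 1)
```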



The Hamming Bound

The Hamming bound is a bound that restricts the rate of a code. Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors. Suppose a set of n-bit vectors is used to encode k bits of information. The set of error vectors of weight j has C(n, j) elements[2]. So the number of error vectors, including errors of weight up to t, is C(n,0) + C(n,1) + ... + C(n,t). (Note that no error is also part of the set of error vectors: the objective is to design a code which can correct all errors up to those of weight t, and this includes no error at all, the j = 0 term.) Since there are 2^n vectors in the whole space of n-bit words, and assuming 2^k vectors are used for the encoding, the Hamming bound is


    2^k [ C(n,0) + C(n,1) + ... + C(n,t) ] ≤ 2^n        (F.8)

For linear codes, the number of codewords is exactly 2^k, so


    C(n,0) + C(n,1) + ... + C(n,t) ≤ 2^(n−k)        (F.9)

Taking the logarithm,


    k/n ≤ 1 − (1/n) log_2 [ C(n,0) + C(n,1) + ... + C(n,t) ]        (F.10)

For large n and t, we can use Stirling's formula to show that


    k/n ≤ 1 − H(t/n)        (F.11)

where H(x) = −x log_2 x − (1 − x) log_2(1 − x) is the binary entropy function, and we have neglected an overall multiplicative constant which goes to 1 as n → ∞. (Again, see the article in Lo, Popescu, and Spiller by Steane.)
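The bound of Eq. (F.8) is easy to check numerically. For the [7,4] Hamming code with t = 1 it holds with equality, which is why that code is called perfect:

```python
from math import comb

def hamming_bound_ok(n, k, t):
    """Check the Hamming bound 2^k * sum_{j<=t} C(n, j) <= 2^n (Eq. F.8)."""
    return 2**k * sum(comb(n, j) for j in range(t + 1)) <= 2**n

print(hamming_bound_ok(7, 4, 1))  # True  (saturated: the code is perfect)
print(hamming_bound_ok(7, 5, 1))  # False (rate too high for t = 1)
```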

NOTES

As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code. This does not indicate whether or not codes satisfying these bounds exist, but it does tell us that no codes exist which violate these bounds. Encoding, decoding, and error detection and correction are all problems which can be difficult to solve in general. One of the advantages of linear codes is that they provide a systematic method for identifying errors on a code through the use of the parity check operation. More generally, checking whether or not a bit string (vector) is in the code space would require a lookup table. This would be much more time-consuming than using the parity check matrix, since matrix multiplication is quite efficient relative to the lookup table.

Many of these ideas and definitions will be utilized in Chapter 7 on quantum error correction. Some linear codes, including the Hamming code above, have quantum analogues, and many quantum error correcting codes are built on classical ones. In quantum computers, as will be discussed, error correction is necessary since quantum information is delicate. Such discussions will be taken up in Chapter 7.


Footnotes

  1. Recall we are working with binary codes. So the entries of the matrix will also be binary numbers, i.e., 0's and 1's.
  2. That is, "n choose j" vectors. The notation is C(n, j) = n!/(j!(n − j)!).