Appendix F - Classical Error Correcting Codes
Contents
Introduction
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems. It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes. There are many good introductions to classical error correction. Here we follow a few sources which also discuss quantum error correcting codes: the book by Loepp and Wootters [25], an article in Lo, Popescu, and Spiller [26] by Steane, Gottesman's Thesis [27], and Gaitan's Book [3] on quantum error correction, which also discusses classical error correction.
Binary Operations
The set is a group under addition. (See Section D.2.8 of Appendix D.) The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning and . If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a field (a Galois Field), which is denoted GF. Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product). For example, the addition of the vector and is . The inner product between these two vectors is .
Definitions and Basics
Definition 1
The inner product is also called a checksum or parity check since it shows whether or not the first and second vectors agree, or have an even number of 1's at the positions specified by the ones in the other vector. We may say that the first vector satisfies the parity check of the other vector, or vice versa.
Definition 2
The weight or Hamming weight is the number of non-zero components of a vector or string. The weight of a vector is denoted wt().
Definition 3
The Hamming distance is the number of places where two vectors differ. Let the two vectors be and . Then the Hamming distance is also equal to wt(). The Hamming distance between and will be denoted .
Definition 4
We use to denote the set of all binary vectors of length . A code of length is any subset of that set. The set of all elements of is called the set of codewords. We also say there are -bit words in the space.
Suppose bits are used to encode logical bits. We use the notation do denote such a code.
Definition 5
The minimum distance of a code is the smallest Hamming distance between any two non-equal vectors in a code. This can be written
(F.1) |
For shorthand, we also use or if is understood.
When that code has a distance , the notation is used.
Example 1
It is interesting to note that if we encode redundantly using and as our logical zero and logical one respectively, then we could detect single bit errors but not correct them. For example, if we receive , we know this cannot be one of our encoded states. So an error must have occurred. However, we don't know whether the sender sent or . We do know that an error has occurred though, as long as we know only one error has occurred. Such an encoding can be used as an error detecting code. In this case there are two code words, and , but four words in the space. The minimum distance is 2, which is the distance between the two code words.
Example 2
The three-bit redundant encoding was already given in Chapter 7. One takes logical zero and logical one states to be
(F.2) |
where the subscript is used to denote a "logical" state; that is, one that is encoded. Recall that this code is able to detect and correct one error. In this case there are two code words out of eight possible words, and the minimal distance is 3.
Definition 6
The rate of a code is given by the ration of the number of logical bits to the number of bits, .
Definition 7
A linear code is a code that is closed under addition.
Linear Codes
Linear codes are particularly useful because they are able to efficiently identify errors and the associated correct codewords. This ability is due to the added structure these codes have.
Generator Matrix
For linear codes, any linear combination of codewords is a codeword. One key feature of a linear code is that it can be specified by a ''generator matrix,'' [1]. For an code, the generator matrix is an matrix with columns that form a basis for the -dimensional coding sub-space of the -dimensional binary vector space. In other words, the vectors comprising the rows form a basis that will span the code space. (Note that one may also use the transpose of this matrix as the definition for .) Any code word described by a vector can be written in terms of the generator matrix as . Note that is independent of the input and output vectors. In addition, is not unique. If columns are switched or even added to produce a new vector that replaces a column, then the generator matrix is still valid for the code. This is due to the requirement that the columns be linearly independent, which is still satisfied if these operations are performed.
Parity Check Matrix
Once is obtained, one can calculate another useful matrix, is an matrix which has the property that
(F.3) |
The matrix is called the parity check matrix or dual matrix. The rank of is at most and has the property that it annihilates any code word. To see this, recall any code word can be written as Therefore, since That is to say, if and only if is a code word. This means that can be used to test whether or not a word is in the code.
Suppose an error occurs on a code word to produce . It follows that
(F.4) |
since . This result, is called the error syndrome and the measurement to identify is the syndrome measurement. Therefore, the result depends only on the error and not on the original code word. If the error can be determined from this result, then it can be corrected independent of the code word. However, in order to have be unique, two different results, and , must not be equal. This is possible if a distance code is constructed such that the parity check matrix has linearly independent columns. This enables the errors to be identified and corrected.
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations. The matrix is determined by the code. Once this matrix is determined, there is a method for determining the parity check matrix, which is a set of mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix. It is possible to determine the parity matrix from the generator matrix. The method for doing this can be found in Steane's article in Lo, Popescu, and Spiller [26] and it goes as follows. One first puts in the form of an augmented matrix where is the identity matrix. Then the parity check matrix is .
Errors
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors. The two examples above show how the error can be detected; here, the objective is to give some general conditions.
Note that any state containing an error may be written as the sum of the original (logical or encoded) state and another vector . The error vector has ones in the places where errors are present and zeroes everywhere else. To ensure that the error may be corrected, the following condition must be satisfied for two states with errors occurring:
(F.5) |
This condition is called the disjointness condition. This condition means that an error on one state cannot be confused with an error on another state. If it could, then the state including the error could not be uniquely identified with an encoded state and the state could not be corrected to its original state after the error occurred. More specifically, for a code to correct single-bit errors, it must have distance at least between any two codewords; i.e., it must be true that . An code with minimal distance is denoted .
Example 3
An important example of an error correcting code is called the Hamming code. This code, as the notation indicates, encodes bits of information into bits. It also does it in such a way that one error can be detected and corrected since it has a distance of . The generator matrix for this code can be taken to be
(F.6) |
(See for example Loepp and Wootters [25].) From this the parity check matrix, can be calculated (as stated above) by finding a set of mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix. Alternatively, one could use the method in Steane's article in Lo, Popescu, and Spiller [26]. Put in the form where is the identity matrix. Then the parity check matrix is In either case, one can arrive at the following parity check matrix for this code:
(F.7) |
It is useful to note that the code can also be defined by the parity check matrix. Only the codewords are annihilated by the parity check matrix.
The Disjointness Condition and Correcting Errors
The motivation for the disjointness condition, Eq.(F.5), is to associate each vector in the space with a particular code word. That is, assuming that only certain errors occur, each error vector should be associated to a particular vector in the code space when the error is added to the original code word. This partitions the set into disjoint subsets, with each containing only one code vector. A message is decoded correctly if the vector (the one containing the error) is in the subset that is associated with the original vector (the one with no error). For example, if one vector is sent, say , and an error occurs during transmission to produce , then this vector must be in the subset containing .
A way to decode is to record an array of possible code words, possible errors, and the combinations of those errors and code words. The array can be set up as a top row of the code word vectors and a leftmost column of errors, with the element of the first row and the first column being the zero vector and all subsequent entries in the column being errors. Then the element at the top of a column (say the jth column) is added to the error in the corresponding row (say the kth row) to get the j,k entry of the array. With this array one can associate a column with a subset that is disjoint with the other sets. Identifying the erred code word in a column associates it with a code word and thus corrects the error.
Example 4
In this example we are going to use (F.6) and (F.7) from the example above.
The set of code words is given by all of the linear combinations of the rows of meaning there are code words. The set of code words,
Now, suppose you are expecting to receive a code word, But, instead you receive What we are able to do is look at Table F.1 and see that is in column 5. Since the columns of this table represent the disjoint subsets of our code space, we see that and the error that occurred was or
The Hamming Bound
The Hamming bound is a bound that restricts the rate of the code. Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors. Suppose there is a set of bit vectors for encoding bits of information. There is a set of error vectors of weight that has elements[2]. So the number of error vectors, including errors of weight up to , is (Note that no error is also part of the set of error vectors. The objective is to be able to design a code that can correct all errors up to those of weight , and this includes no error at all.) Since there are vectors in the whole space of bits, and assuming vectors are used for the encoding, the Hamming bound is
(F.8) |
For linear codes, so
(F.9) |
Taking the logarithm,
(F.10) |
For large and , we can use Stirling's formula to show that
(F.11) |
where and we have neglected an overall multiplicative constant that goes to 1 as (Again, see the article in Lo, Popescu, and Spiller [26] by Steane.)
More Definitions
Definition 8: Dual Code
Let be a code and let be a vector in the code space. The dual code, denoted , is the set of all vectors that have zero inner product with all . In other words, it is the set of all vectors such that for all .
For binary vectors, a vector can be orthogonal to itself. Note that this is different from ordinary vectors in 3-d space.
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as CSS codes.
Final Comments
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code. This does not indicate whether or not codes that satisfy these bounds exist, but it does tell us that no codes exist that do not satisfy these bounds. Encoding, decoding, error detection and correction are all difficult problems to solve in general. One of the advantages of the linear codes is that they provide a systematic method for identifying errors on a code through the use of the parity check operation. More generally, checking to see whether or not a bit string (vector) is in the code space would require a look-up table. This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to the look-up table.
Many of these ideas and definitions will be utilized in Chapter 7 on quantum error correction. Some linear codes, including the Hamming code above, will have quantum analogues---as do many quantum error correcting codes. In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information. Such discussions will be taken up in Chapter 7.