<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www2.physics.siu.edu/qunet/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ddghunter</id>
	<title>Qunet - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www2.physics.siu.edu/qunet/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ddghunter"/>
	<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php/Special:Contributions/Ddghunter"/>
	<updated>2026-04-10T00:34:33Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.7</generator>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_3_-_Physics_of_Quantum_Information&amp;diff=2348</id>
		<title>Chapter 3 - Physics of Quantum Information</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_3_-_Physics_of_Quantum_Information&amp;diff=2348"/>
		<updated>2013-03-25T19:42:02Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* General Properties */ bad link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
It was a great realization that information is physical and that a&lt;br /&gt;
(classical) Turing machine is not the end of the story of&lt;br /&gt;
computation.  The physical system in which the information is stored&lt;br /&gt;
and manipulated is important and qubits are quite different from&lt;br /&gt;
bits.  &lt;br /&gt;
&lt;br /&gt;
In this chapter, some background in quantum mechanics is provided.&lt;br /&gt;
Not all of this chapter will be directly relevant to our discussion,&lt;br /&gt;
but it is included to further our understanding&lt;br /&gt;
of how textbook quantum mechanics is related to quantum&lt;br /&gt;
computing.  The connection is clear, but the story seems&lt;br /&gt;
incomplete from a physicist's perspective.  For the subject of error&lt;br /&gt;
prevention methods, some of this chapter will be vital---in&lt;br /&gt;
particular, the section(s) concerning the density matrix.  Not only&lt;br /&gt;
is this material vital, it is often not covered in quantum mechanics&lt;br /&gt;
classes, either undergraduate or graduate.  &lt;br /&gt;
&lt;br /&gt;
It is also worth emphasizing that this chapter is primarily aimed at&lt;br /&gt;
physicists and at any others who are interested in the background&lt;br /&gt;
physics.  However, it is not necessary for much of what follows.&lt;br /&gt;
&lt;br /&gt;
===Schrodinger's Equation===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A common starting point in quantum mechanics is Schrodinger's equation.  This equation is not derived or justified here, but is given in a general form:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
H \left\vert \Psi\right\rangle = i\hbar\frac{\partial}{\partial t}\left\vert \Psi\right\rangle,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.1}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;H\,\!&amp;lt;/math&amp;gt; is the Hamiltonian, &amp;lt;!-- \index{Hamiltonian} --&amp;gt;&lt;br /&gt;
&amp;lt;math&amp;gt;\hbar\,\!&amp;lt;/math&amp;gt; is Planck's constant &lt;br /&gt;
&amp;lt;!-- \index{Planck's constant} --&amp;gt; &lt;br /&gt;
(divided by &amp;lt;math&amp;gt;2\pi\,\!&amp;lt;/math&amp;gt;), and &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt; is time.  The Hamiltonian contains what&lt;br /&gt;
is known about the system's evolution.  &lt;br /&gt;
Most of the time in these notes, we let &amp;lt;math&amp;gt;\hbar = 1\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This equation is (formally) solved by taking the time derivative to be&lt;br /&gt;
an ordinary derivative (we assume no explicit time dependence for&lt;br /&gt;
&amp;lt;math&amp;gt;H \,\!&amp;lt;/math&amp;gt;), so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
H \left\vert \Psi\right\rangle = i\frac{d \left\vert \Psi\right\rangle}{dt}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.2}}&amp;lt;br /&amp;gt;&lt;br /&gt;
This means that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
-iHdt =  \frac{d \left\vert \Psi\right\rangle}{\left\vert \Psi\right\rangle},&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.3}}&amp;lt;br /&amp;gt;&lt;br /&gt;
so&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
 \ln \left\vert \Psi\right\rangle &amp;amp;= -iHt + C, \\&lt;br /&gt;
\Rightarrow\left\vert \Psi(t)\right\rangle &amp;amp;= e^{-iHt}\left\vert \Psi(0)\right\rangle.  &lt;br /&gt;
\end{align}&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.4}}&lt;br /&gt;
Now if &amp;lt;math&amp;gt;H\,\!&amp;lt;/math&amp;gt; is Hermitian (it is), then the matrix &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
U =  e^{-iHt}&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.5}}&amp;lt;br /&amp;gt;&lt;br /&gt;
is unitary.  &amp;lt;!-- \index{unitary matrix}--&amp;gt;&lt;br /&gt;
(If this is unclear, see [[Appendix C - Vectors and Linear Algebra]], in particular the section entitled [[Appendix C - Vectors and Linear Algebra#Unitary Matrices|Unitary Matrices]].)  Any&lt;br /&gt;
transformation on a closed system can be described by a unitary&lt;br /&gt;
transformation and any unitary transformation can be obtained by the&lt;br /&gt;
exponentiation of a Hermitian matrix.  &lt;br /&gt;
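As a numerical aside (a minimal NumPy sketch; the Hamiltonian, here the Pauli matrix sigma_x, and the time t = 0.7 are arbitrary illustrative choices, not taken from the text), the exponential of a Hermitian matrix can be computed from its eigendecomposition, and the result checked to be unitary:

```python
import numpy as np

def expm_hermitian(H, t):
    """Compute U = exp(-i H t) for Hermitian H via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)          # H = V diag(evals) V^dagger
    return evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T

# Example Hamiltonian (hypothetical choice): the Pauli matrix sigma_x, with hbar = 1.
H = np.array([[0.0, 1.0], [1.0, 0.0]])
U = expm_hermitian(H, t=0.7)

# U should be unitary: U U^dagger = I.
print(np.allclose(U @ U.conj().T, np.eye(2)))   # True
```

Because H is Hermitian its eigenvectors form a unitary matrix V, so U = V e^{-i Lambda t} V^dagger, and unitarity follows from each phase having unit modulus.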
&lt;br /&gt;
The end result and important point is that the evolution of a quantum&lt;br /&gt;
state is, in general, given by a unitary matrix&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\left\vert \Psi(t)\right\rangle = U\left\vert \Psi(0)\right\rangle.  &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.6}}&amp;lt;br /&amp;gt;&lt;br /&gt;
So our objective in quantum information processing is to create a&lt;br /&gt;
unitary evolution, and an eventual measurement, that will produce a&lt;br /&gt;
particular outcome.&lt;br /&gt;
&lt;br /&gt;
====Exponentiating a Matrix====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;expmatrix&amp;quot;&amp;gt; ''Aside: a note about the exponentiation of a matrix.''&amp;lt;/div&amp;gt;&lt;br /&gt;
  &lt;br /&gt;
It may seem strange to exponentiate a matrix.  However, a function of a&lt;br /&gt;
matrix can be defined according to its Taylor expansion.  The details&lt;br /&gt;
are largely unimportant here, but for demonstration purposes,&lt;br /&gt;
the expansion is written out.  &lt;br /&gt;
&lt;br /&gt;
The Taylor expansion of an exponential is the following:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
e^x = \sum_{n=0}^\infty \frac{x^n}{n!}&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.7}}&amp;lt;br /&amp;gt;&lt;br /&gt;
and this can be used to exponentiate a matrix by letting the matrix&lt;br /&gt;
replace &amp;lt;math&amp;gt;x\,\!&amp;lt;/math&amp;gt; in the equation.  This can also be used to prove that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
e^{ix}=\cos x +i\sin x.  &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.8}}&amp;lt;br /&amp;gt;&lt;br /&gt;
''End Aside''&lt;br /&gt;
&lt;br /&gt;
===Density Matrix for Pure States===&lt;br /&gt;
&lt;br /&gt;
Now let us consider the object (a ''density matrix, or &lt;br /&gt;
density operator, of rank one'') &amp;lt;!-- \index{density matrix}\index{density&lt;br /&gt;
matrix!pure state} --&amp;gt;&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho = \left\vert\psi\right\rangle \left\langle \psi\right\vert,&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.9}}&amp;lt;br /&amp;gt;&lt;br /&gt;
which is just the outer product of two vectors.  (See Appendix C.2.4, [[Appendix C - Vectors and Linear Algebra#Outer Product|Outer Product]].) &lt;br /&gt;
&lt;br /&gt;
Since &amp;lt;math&amp;gt;\left\vert \psi\right\rangle = \left\vert \psi(t)\right\rangle\,\!&amp;lt;/math&amp;gt; depends on time, so does &amp;lt;math&amp;gt;\rho=\rho(t)\,\!&amp;lt;/math&amp;gt;.  If we&lt;br /&gt;
differentiate this with respect to &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;, we discover&amp;lt;!-- \index{Schr\&amp;quot;odinger Equation!&lt;br /&gt;
  for density matrix} --&amp;gt;&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\frac{\partial \rho }{\partial t} &amp;amp;= &lt;br /&gt;
           \left(\frac{\partial \left\vert \psi\right\rangle}{\partial t}\right)\left\langle\psi\right\vert &lt;br /&gt;
            + \left\vert \psi\right\rangle\left(\frac{\partial \left\langle\psi\right\vert}{\partial t}\right)\\&lt;br /&gt;
                   &amp;amp;= (-iH)\rho + \rho (iH) = -i[H,\rho].&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.10}} &lt;br /&gt;
This is merely the Schrodinger equation for a density matrix with the solution&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho(t) = U\rho(0)U^\dagger.&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.11}}&amp;lt;br /&amp;gt;&lt;br /&gt;
This follows from &amp;lt;math&amp;gt;\left\vert\psi(t)\right\rangle\left\langle\psi(t)\right\vert =&lt;br /&gt;
U\left\vert\psi(0)\right\rangle\left\langle\psi(0)\right\vert U^\dagger\,\!&amp;lt;/math&amp;gt;. &lt;br /&gt;
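This can be verified numerically. In the minimal sketch below (NumPy; the Hamiltonian sigma_z and the state (0.6, 0.8) are arbitrary choices for illustration), the evolved density matrix U rho(0) U^dagger is compared with the outer product of the evolved state:

```python
import numpy as np

# Sketch of Eq. (3.11): rho(t) = U rho(0) U^dagger, with an arbitrary example.
t = 1.2
U = np.diag(np.exp(-1j * np.array([1.0, -1.0]) * t))   # exp(-i sigma_z t), diagonal

psi0 = np.array([0.6, 0.8], dtype=complex)             # normalized: 0.36 + 0.64 = 1
rho0 = np.outer(psi0, psi0.conj())                     # rho(0) = |psi(0)><psi(0)|

psi_t = U @ psi0                                       # |psi(t)> = U|psi(0)>
rho_t = U @ rho0 @ U.conj().T                          # rho(t) = U rho(0) U^dagger

print(np.allclose(rho_t, np.outer(psi_t, psi_t.conj())))   # True
```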
&lt;br /&gt;
Consider our two-state system, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\left\vert 0\right\rangle = \left(\begin{array}{c} 1 \\ 0 \end{array}\right), &lt;br /&gt;
                   \;\;\; \mbox{and} \;\;\; &lt;br /&gt;
\left\vert 1\right\rangle = \left(\begin{array}{c} 0 \\ 1 \end{array}\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.12}}&amp;lt;br /&amp;gt;&lt;br /&gt;
Recall that an arbitrary superposition of these states is written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\left\vert \psi\right\rangle = \alpha_0\left\vert 0\right\rangle + \alpha_1\left\vert 1\right\rangle &lt;br /&gt;
           = \left(\begin{array}{c} \alpha_0 \\ \alpha_1 \end{array}\right),&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.13}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;\alpha_0\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\alpha_1\,\!&amp;lt;/math&amp;gt; are complex numbers such that &lt;br /&gt;
&amp;lt;math&amp;gt;|\alpha_0|^2 + |\alpha_1|^2 = 1\,\!&amp;lt;/math&amp;gt;.  The corresponding &lt;br /&gt;
''pure state'' (i.e. rank one) ''density matrix'' is given by &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_p = \left\vert\psi\right\rangle\left\langle\psi\right\vert&lt;br /&gt;
     = \left(\begin{array}{cc}&lt;br /&gt;
              |\alpha_0|^2 &amp;amp; \alpha_0 \alpha_1^* \\ &lt;br /&gt;
              \alpha_0^* \alpha_1 &amp;amp; |\alpha_1|^2 \end{array}\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.14}}&amp;lt;br /&amp;gt;&lt;br /&gt;
Note that the superposition in Eq.[[#eq3.13|(3.13)]] can be obtained from any pure state by a unitary transformation.  Here, the trace of&lt;br /&gt;
the density matrix is an important quantity; it is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\mbox{Tr}(\rho_p) = |\alpha_0|^2 + |\alpha_1|^2 = 1.  &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.15}}&amp;lt;br /&amp;gt;&lt;br /&gt;
Notice also that the determinant of this matrix is zero, indicating that it has a zero eigenvalue:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\det(\rho_p) = |\alpha_0|^2|\alpha_1|^2 - \alpha_0 \alpha_1^*\alpha_0^*&lt;br /&gt;
\alpha_1 = 0.&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.16}}&amp;lt;br /&amp;gt;&lt;br /&gt;
To see this another way, note that the density operator of rank one can be written as &amp;lt;math&amp;gt;U(\left\vert 0\right\rangle\left\langle0\right\vert)U^\dagger\,\!&amp;lt;/math&amp;gt;, so that the determinant is &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\det(U(\left\vert 0\right\rangle \left\langle 0\right\vert)U^\dagger) &amp;amp;= \det(U(\left\vert 0\right\rangle\left\langle 0\right\vert)U^{-1})\\&lt;br /&gt;
                            &amp;amp;=  \det(U)\det(\left\vert0\right\rangle\left\langle 0\right\vert)\frac{1}{\det(U)} \\&lt;br /&gt;
                            &amp;amp;= \det(\left\vert 0\right\rangle \left\langle 0\right\vert) = 0.&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.17}}&lt;br /&gt;
This is a characteristic of a pure state, and for two-state systems it is a necessary and sufficient condition for the density operator to represent a pure state of the system.&lt;br /&gt;
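These properties are straightforward to check numerically. In the sketch below (NumPy; the coefficients alpha_0 = 0.6 and alpha_1 = 0.8i are an arbitrary normalized choice), the pure state density matrix of Eq. (3.14) is built and the trace, determinant, and purity conditions are verified:

```python
import numpy as np

# Hypothetical normalized coefficients: |alpha0|^2 + |alpha1|^2 = 0.36 + 0.64 = 1.
alpha0, alpha1 = 0.6, 0.8j
psi = np.array([alpha0, alpha1])
rho_p = np.outer(psi, psi.conj())             # Eq. (3.14): rho_p = |psi><psi|

print(np.isclose(np.trace(rho_p), 1.0))       # True: Tr(rho_p) = 1, Eq. (3.15)
print(np.isclose(np.linalg.det(rho_p), 0.0))  # True: det(rho_p) = 0, Eq. (3.16)
print(np.allclose(rho_p @ rho_p, rho_p))      # True: rho^2 = rho for a pure state
```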
&lt;br /&gt;
===Measurements Revisited===&lt;br /&gt;
&lt;br /&gt;
If the state of a quantum system is described by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\left\vert \psi\right\rangle = \alpha_0\left\vert 0\right\rangle + \alpha_1\left\vert 1\right\rangle, &lt;br /&gt;
&amp;lt;/math&amp;gt;|3.18}}&amp;lt;br /&amp;gt;&lt;br /&gt;
then the probability of finding it in the state &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; when measured in&lt;br /&gt;
the computational basis is &amp;lt;math&amp;gt;|\alpha_0|^2\,\!&amp;lt;/math&amp;gt;.  However, this is a&lt;br /&gt;
particular superposition that could be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\left\vert \psi\right\rangle = U \left\vert 0\right\rangle.  &lt;br /&gt;
&amp;lt;/math&amp;gt;|3.19}}&amp;lt;br /&amp;gt;&lt;br /&gt;
In the section entitled [[#Schrodinger's Equation|Schrodinger's Equation]] it was shown that this matrix &amp;lt;math&amp;gt;U\,\!&amp;lt;/math&amp;gt; results&lt;br /&gt;
from the exponentiation of a Hermitian matrix. Recall from the section entitled [[Chapter 2 - Qubits and Collections of Qubits#The Pauli Matrices|The Pauli Matrices]] that any &amp;lt;math&amp;gt;2\times 2\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
Hermitian matrix can be written in terms of the Pauli matrices.  To make this explicit using standard conventions, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\left\vert \psi\right\rangle &amp;amp;= U\left\vert 0\right\rangle  \\&lt;br /&gt;
           &amp;amp;= \exp(-i\vec{n}\cdot\vec{\sigma} \theta) \left\vert 0\right\rangle \\&lt;br /&gt;
           &amp;amp;= (\mathbb{I}\cos(\theta) -i\vec{n}\cdot\vec{\sigma} \sin(\theta))\left\vert 0\right\rangle,&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.20}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\vec{n}\,\!&amp;lt;/math&amp;gt; is a unit vector, &amp;lt;math&amp;gt;|\vec{n}|=1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\vec{n}\cdot\vec{\sigma} =&lt;br /&gt;
n_1\sigma_1+n_2\sigma_2+n_3\sigma_3\,\!&amp;lt;/math&amp;gt;. (For a proof of this, see [[Appendix C - Vectors and Linear Algebra#Transformations of a Qubit|Section C.5.1]].)  One can write this matrix out explicitly, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
 \exp(-i\vec{n}\cdot\vec{\sigma} \theta) &amp;amp;= \left(\begin{array}{cc}&lt;br /&gt;
                                  1 &amp;amp; 0 \\ &lt;br /&gt;
                                  0 &amp;amp; 1 \end{array}\right)\cos(\theta) \\&lt;br /&gt;
                        &amp;amp; \;\;\;   + (-i)\left[ n_1\left(\begin{array}{cc}&lt;br /&gt;
                                  0 &amp;amp; 1 \\ &lt;br /&gt;
                                  1 &amp;amp; 0 \end{array}\right)&lt;br /&gt;
                              + n_2\left(\begin{array}{cc}&lt;br /&gt;
                                  0 &amp;amp; -i \\ &lt;br /&gt;
                                  i &amp;amp; 0 \end{array}\right)&lt;br /&gt;
                              + n_3\left(\begin{array}{cc}&lt;br /&gt;
                                  1 &amp;amp; 0 \\ &lt;br /&gt;
                                  0 &amp;amp; -1 \end{array}\right)\right]\sin(\theta) \\&lt;br /&gt;
                                &amp;amp;= &lt;br /&gt;
         \left(\begin{array}{cc}&lt;br /&gt;
  \cos(\theta) -in_3\sin(\theta) &amp;amp; (-in_1-n_2)\sin(\theta) \\ &lt;br /&gt;
   (-in_1+n_2)\sin(\theta) &amp;amp; \cos(\theta) +in_3\sin(\theta)  \end{array}\right).&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.21}}&lt;br /&gt;
Notice this is a  ''special unitary matrix.''  (See [[Appendix C - Vectors and Linear Algebra]], in particular the subsection [[Appendix C - Vectors and Linear Algebra#Unitary Matrices|Unitary Matrices]].)&lt;br /&gt;
&lt;br /&gt;
To see that any state &amp;lt;math&amp;gt;\left\vert \psi\right\rangle\,\!&amp;lt;/math&amp;gt; for arbitrary coefficients&lt;br /&gt;
&amp;lt;math&amp;gt;\alpha_0\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\alpha_1\,\!&amp;lt;/math&amp;gt; can be obtained by choosing &amp;lt;math&amp;gt;\vec{n}\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\theta\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
appropriately, the state &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; can be chosen as a starting point.  &lt;br /&gt;
Then, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
U\left\vert 0\right\rangle &amp;amp;= \left(\begin{array}{cc}&lt;br /&gt;
  \cos(\theta) -in_3\sin(\theta) &amp;amp; (-in_1-n_2)\sin(\theta) \\ &lt;br /&gt;
   (-in_1+n_2)\sin(\theta) &amp;amp; \cos(\theta) +in_3\sin(\theta)  &lt;br /&gt;
         \end{array}\right)&lt;br /&gt;
       \left(\begin{array}{c} 1 \\ 0\end{array}\right) \\&lt;br /&gt;
         &amp;amp;=  \left(\begin{array}{c} &lt;br /&gt;
                            \cos(\theta) -in_3\sin(\theta)  \\ &lt;br /&gt;
                            (-in_1+n_2)\sin(\theta)\end{array}\right). &lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.22}}&lt;br /&gt;
For example, choosing &amp;lt;math&amp;gt;\theta=0\,\!&amp;lt;/math&amp;gt; gives the original state; choosing&lt;br /&gt;
&amp;lt;math&amp;gt;\vec{n} = (0,1,0)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\theta = \pi/2\,\!&amp;lt;/math&amp;gt; gives &amp;lt;math&amp;gt;\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt;; and choosing&lt;br /&gt;
&amp;lt;math&amp;gt;\vec{n} = (0,1,0)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\theta = \pi/4\,\!&amp;lt;/math&amp;gt; gives an equal superposition.  &lt;br /&gt;
In general, when the system is in the state  &amp;lt;math&amp;gt;\left\vert \psi\right\rangle = U\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt;,&lt;br /&gt;
the probability of finding the state &amp;lt;math&amp;gt;\left\vert 0 \right\rangle \,\!&amp;lt;/math&amp;gt; when a measurement is made in the computational basis is given by &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
|\left\langle 0\right\vert U\left\vert 0\right\rangle|^2 &amp;amp;= |\cos(\theta) -in_3\sin(\theta)|^2   \\&lt;br /&gt;
                    &amp;amp;= \cos^2(\theta) +n_3^2\sin^2(\theta),  &lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.23}}&lt;br /&gt;
and the probability of finding &amp;lt;math&amp;gt;\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt; is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
|\left\langle 1\right\vert U\left\vert 0\right\rangle|^2 &amp;amp;= |(-in_1+n_2)\sin(\theta)|^2   \\&lt;br /&gt;
                    &amp;amp;= (n_1^2+n_2^2)\sin^2(\theta).   &lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.24}}&lt;br /&gt;
Notice that the probabilities add up to one if &amp;lt;math&amp;gt;\vec{n}\,\!&amp;lt;/math&amp;gt; is a unit vector.  &lt;br /&gt;
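The rotation of Eq. (3.21) and the probabilities of Eqs. (3.23) and (3.24) can be checked directly. In the NumPy sketch below, the choices of n and theta follow the examples in the text (theta = 0.4 is an arbitrary additional value):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def U_rotation(n, theta):
    """U = exp(-i n.sigma theta) = I cos(theta) - i (n.sigma) sin(theta), Eq. (3.21)."""
    n_dot_sigma = n[0] * sx + n[1] * sy + n[2] * sz
    return np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * n_dot_sigma

n = np.array([0.0, 1.0, 0.0])                 # unit vector, as in the text
ket0 = np.array([1, 0], dtype=complex)

# theta = pi/2 maps |0> to |1>, as stated in the text.
print(np.allclose(U_rotation(n, np.pi / 2) @ ket0, np.array([0, 1])))   # True

# For a general theta, the outcome probabilities of Eqs. (3.23)-(3.24) sum to one.
theta = 0.4
psi = U_rotation(n, theta) @ ket0
p0, p1 = abs(psi[0])**2, abs(psi[1])**2
print(np.isclose(p0, np.cos(theta)**2 + n[2]**2 * np.sin(theta)**2))    # True
print(np.isclose(p0 + p1, 1.0))                                         # True
```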
&lt;br /&gt;
What this shows is that there is a transformation that takes the state&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt;, which has probability &amp;lt;math&amp;gt;1\,\!&amp;lt;/math&amp;gt; of being in the state &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; and&lt;br /&gt;
probability &amp;lt;math&amp;gt;0\,\!&amp;lt;/math&amp;gt; of being in the state &amp;lt;math&amp;gt;\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt;, and transforms it&lt;br /&gt;
(using a &amp;lt;nowiki&amp;gt;''rotation''&amp;lt;/nowiki&amp;gt;) into a state with a different (and generic)&lt;br /&gt;
probability of each.  This means that the density matrix corresponding&lt;br /&gt;
to this system always has determinant zero, meaning that (for a two-state system) it has one&lt;br /&gt;
eigenvalue 1 and one eigenvalue 0.  (The determinant is the&lt;br /&gt;
product of the eigenvalues.)&lt;br /&gt;
&lt;br /&gt;
===Density Matrix for Mixed States===&lt;br /&gt;
&lt;br /&gt;
For a system with &amp;lt;math&amp;gt;D\,\!&amp;lt;/math&amp;gt; dimensions, a ''mixed state density matrix'' &lt;br /&gt;
(or density operator, see [[Appendix E - Density Operator: Extensions|Appendix E]]) &amp;lt;!-- \index{density matrix} \index{density operator} --&amp;gt; is a matrix that is used to&lt;br /&gt;
describe a more general state of a quantum system.  This can be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_D = \sum_i a_i \rho_i&lt;br /&gt;
\,\! &amp;lt;/math&amp;gt;|3.25}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;a_i\geq 0\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;{\sum}_i a_i=1\,\!&amp;lt;/math&amp;gt;, and the &amp;lt;math&amp;gt;\rho_i\,\!&amp;lt;/math&amp;gt; are pure states.  There is also a generalization of the Bloch sphere which is described in [[Appendix E - Density Operator: Extensions|Appendix E]].  &lt;br /&gt;
&lt;br /&gt;
Mixed state &amp;lt;!-- \index{density matrix!mixed state} --&amp;gt; density matrices are important in all descriptions of physical implementations of quantum information processing.  For this reason, a bit of labor should go into understanding the density matrix. The rest of this section is devoted to the physical interpretation and properties of this description of a quantum system.  The first description presented is called the ensemble interpretation of the density matrix.  This is perhaps the easiest to understand.  Another set of physical systems that are described by density matrices will be given elsewhere.&lt;br /&gt;
&lt;br /&gt;
====General Properties====&lt;br /&gt;
&lt;br /&gt;
In general, a density matrix has the following properties:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\rho = \rho^\dagger, &amp;amp;\;\;\; \mbox{it is hermitian}, \\&lt;br /&gt;
\rho \geq 0,\; &amp;amp;\;\;\; \mbox{it is positive semi-definite},&lt;br /&gt;
                         \\&lt;br /&gt;
\mbox{Tr}(\rho) = 1,\; &amp;amp;\;\;\; \mbox{it is normalized}. &lt;br /&gt;
\end{align}&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.26}}&lt;br /&gt;
If, in addition, it is a pure state, then &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho^2 = \rho.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.27}}&amp;lt;br /&amp;gt;&lt;br /&gt;
The second property in Eq.[[#eq3.26|(3.26)]] really means that the eigenvalues of the density matrix are greater than or equal to zero.&lt;br /&gt;
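The three properties of Eq. (3.26), and the purity condition of Eq. (3.27), translate directly into numerical checks. A minimal sketch follows (NumPy; the helper names `is_density_matrix` and `is_pure` are ours, not standard library functions):

```python
import numpy as np

def is_density_matrix(rho, tol=1e-9):
    """Check the three properties of Eq. (3.26)."""
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    positive = np.all(np.linalg.eigvalsh(rho) >= -tol)    # eigenvalues >= 0
    normalized = np.isclose(np.trace(rho).real, 1.0, atol=tol)
    return hermitian and positive and normalized

def is_pure(rho, tol=1e-9):
    """Pure states additionally satisfy rho^2 = rho, Eq. (3.27)."""
    return np.allclose(rho @ rho, rho, atol=tol)

pure = np.array([[1.0, 0.0], [0.0, 0.0]])        # |0><0|
mixed = np.array([[0.5, 0.0], [0.0, 0.5]])       # maximally mixed state

print(is_density_matrix(pure), is_pure(pure))     # True True
print(is_density_matrix(mixed), is_pure(mixed))   # True False
```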
&lt;br /&gt;
====Density Matrix for a Mixed State: Two States====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A mixed state density matrix for a two-state system is a rank two density matrix, &amp;lt;math&amp;gt;\rho_m\,\!&amp;lt;/math&amp;gt;, which can be described by &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_m = \left[a_1\rho_1 + a_2\rho_2\right],&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.28}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;\rho_1 = \left\vert\psi_1\right\rangle\left\langle \psi_1\right\vert\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\rho_2 = \left \vert \psi_2\right\rangle\left\langle \psi_2\right\vert \,\!&amp;lt;/math&amp;gt; &lt;br /&gt;
and &amp;lt;math&amp;gt;a_1 + a_2=1\,\!&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;a_i\,\!&amp;lt;/math&amp;gt; are probabilities and must sum to one.&lt;br /&gt;
(Note, if &amp;lt;math&amp;gt;\left\vert \psi_1\right\rangle=\left\vert \psi_2\right\rangle\,\!&amp;lt;/math&amp;gt;, or if one of the&lt;br /&gt;
&amp;lt;math&amp;gt;a_i\,\!&amp;lt;/math&amp;gt; is zero, then this reduces to a pure state.)  In this mixture, &lt;br /&gt;
the probability of finding the state &amp;lt;math&amp;gt;\left\vert \psi_1\right\rangle\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;a_1\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
and the probability of finding the state &amp;lt;math&amp;gt;\left\vert \psi_2\right\rangle\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;a_2\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Description of Open Quantum Systems: An Example====&lt;br /&gt;
&lt;br /&gt;
One example of the utility of a density matrix is the following&lt;br /&gt;
statistical problem.  Let us consider a collection of electrons in a box, where their&lt;br /&gt;
spin is a two-state system being either up or down when measured.  If&lt;br /&gt;
a subset of these electrons is prepared in the state &amp;lt;nowiki&amp;gt;''up''&amp;lt;/nowiki&amp;gt; before&lt;br /&gt;
being put in the box and the rest &amp;lt;nowiki&amp;gt;''down,''&amp;lt;/nowiki&amp;gt; then the description of&lt;br /&gt;
the system of particles is given by &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho = a_u \left\vert\uparrow\right\rangle\left\langle\uparrow\right\vert +&lt;br /&gt;
         a_d\left\vert\downarrow\right\rangle\left\langle\downarrow\right\vert,&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.29}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where the fraction of  &amp;lt;nowiki&amp;gt;''up''&amp;lt;/nowiki&amp;gt; particles is &amp;lt;math&amp;gt;a_u\,\!&amp;lt;/math&amp;gt; and the fraction of &amp;lt;nowiki&amp;gt;''down''&amp;lt;/nowiki&amp;gt; is &amp;lt;math&amp;gt;a_d\,\!&amp;lt;/math&amp;gt;.  Our system is described by this density matrix---if a particle is chosen at random from the box and measured, the state of the particle is &amp;lt;math&amp;gt;\left\vert \uparrow\right\rangle\,\!&amp;lt;/math&amp;gt; with probability &amp;lt;math&amp;gt;a_u\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
and &amp;lt;math&amp;gt;\left\vert \downarrow\right\rangle\,\!&amp;lt;/math&amp;gt; with probability &amp;lt;math&amp;gt;a_d\,\!&amp;lt;/math&amp;gt;.  This is known as the &amp;quot;statistical&lt;br /&gt;
interpretation&amp;quot; of the density operator. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There is another example that is more relevant for our purposes. Let us consider another two-state system.&lt;br /&gt;
If there is some probability &amp;lt;math&amp;gt;p\,\!&amp;lt;/math&amp;gt; that an error occurs, represented by a unitary operator &amp;lt;math&amp;gt;U_e\,\!&amp;lt;/math&amp;gt;, then the density matrix for the&lt;br /&gt;
system is &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_e = (1-p)\left\vert\psi\right\rangle\left\langle\psi\right\vert + pU_e\left\vert\psi\right\rangle\left\langle\psi\right\vert U_e^\dagger.  &lt;br /&gt;
&amp;lt;/math&amp;gt;|3.30}}&amp;lt;br /&amp;gt;&lt;br /&gt;
This is the same form as Eq.[[#eq3.28|(3.28)]].  &lt;br /&gt;
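For concreteness, here is a NumPy sketch of Eq. (3.30) with the arbitrary choices |psi> = |0>, U_e = sigma_x (a bit flip), and p = 0.1; the resulting rho_e has rank two with eigenvalues p and 1 - p:

```python
import numpy as np

# Hypothetical example: |psi> = |0>, error U_e = sigma_x, error probability p.
p = 0.1
psi = np.array([1.0, 0.0], dtype=complex)
U_e = np.array([[0, 1], [1, 0]], dtype=complex)

rho_pure = np.outer(psi, psi.conj())
psi_err = U_e @ psi
rho_e = (1 - p) * rho_pure + p * np.outer(psi_err, psi_err.conj())   # Eq. (3.30)

print(np.linalg.matrix_rank(rho_e))       # 2: a genuinely mixed state
print(np.linalg.eigvalsh(rho_e))          # eigenvalues are p and 1 - p
```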
&lt;br /&gt;
Note that in each &lt;br /&gt;
case the probabilities associated with the density matrix &amp;lt;math&amp;gt;p,1-p\,\!&amp;lt;/math&amp;gt;, and&lt;br /&gt;
&amp;lt;math&amp;gt;a_u,a_d\,\!&amp;lt;/math&amp;gt;, (generally, &amp;lt;math&amp;gt;a_i\,\!&amp;lt;/math&amp;gt;) are classical probabilities;&lt;br /&gt;
they are associated with a classical probability distribution---the&lt;br /&gt;
probability for error/no error and up/down.  These are not&lt;br /&gt;
probabilities associated with the superposition of the quantum state&lt;br /&gt;
in the equation &amp;lt;math&amp;gt;\left\vert \psi\right\rangle = \alpha_0 \left\vert 0\right\rangle + \alpha_1\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
given by the square of the moduli of the coefficients.  This is an&lt;br /&gt;
important distinction!  The state&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert \psi\right\rangle\,\!&amp;lt;/math&amp;gt; can be taken to the state &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; with a unitary&lt;br /&gt;
transformation.  This state is deterministic in the sense that the&lt;br /&gt;
result &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; will be obtained from a measurement in the&lt;br /&gt;
computational basis since there is no probability for obtaining&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt;.  However, for &amp;lt;math&amp;gt;0 &amp;lt; p &amp;lt; 1\,\!&amp;lt;/math&amp;gt; and a non-identity&lt;br /&gt;
operator &amp;lt;math&amp;gt;U_e\,\!&amp;lt;/math&amp;gt; (with &amp;lt;math&amp;gt;U_e\left\vert \psi\right\rangle\,\!&amp;lt;/math&amp;gt; not proportional to &amp;lt;math&amp;gt;\left\vert \psi\right\rangle\,\!&amp;lt;/math&amp;gt;), the matrix &amp;lt;math&amp;gt;\rho_e\,\!&amp;lt;/math&amp;gt; has rank two and thus can never have&lt;br /&gt;
probability &amp;lt;math&amp;gt;1\,\!&amp;lt;/math&amp;gt; for either of the two states, &amp;lt;math&amp;gt;\left\vert 0\right\rangle\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left\vert 1\right\rangle\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
Thus we have maximum knowledge about a pure state since&lt;br /&gt;
there is a way to choose a measurement, perhaps after a unitary&lt;br /&gt;
transformation, which achieves a certain result with probability &amp;lt;math&amp;gt;1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
For the mixed state density operator this is not possible.  The state &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho = \left(\begin{array}{cc} &lt;br /&gt;
              1/2 &amp;amp; 0 \\ &lt;br /&gt;
                0 &amp;amp; 1/2 \end{array}\right),&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.31}}&amp;lt;br /&amp;gt;&lt;br /&gt;
for which we have the least amount of knowledge, is called the&lt;br /&gt;
maximally mixed state. &amp;lt;!-- \index{maximally mixed state! two qubits}--&amp;gt;  The&lt;br /&gt;
state could be either up or down with equal probability and neither is&lt;br /&gt;
a better guess.  If the two eigenvalues are not equal, then there is a&lt;br /&gt;
better guess (or bet) as to the result of a measurement. If one&lt;br /&gt;
eigenvalue is zero, then there is a definite best guess.  &lt;br /&gt;
&lt;br /&gt;
To be more specific, independent of basis (unitary transformations),&lt;br /&gt;
one always has a probability greater than zero of measuring&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert \uparrow\right\rangle\,\!&amp;lt;/math&amp;gt; and probability greater than zero of measuring&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert \downarrow\right\rangle\,\!&amp;lt;/math&amp;gt;. Thus the  state described by the density matrix is&lt;br /&gt;
a ''mixed state'' &amp;lt;!-- \index{mixed state density matrix}--&amp;gt; in the sense&lt;br /&gt;
that it can be considered a statistical mixture of the  two states&lt;br /&gt;
&amp;lt;math&amp;gt;\left\vert \uparrow\right\rangle\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left\vert \downarrow\right\rangle\,\!&amp;lt;/math&amp;gt;.  Because classical&lt;br /&gt;
probabilities are included separately, this is significantly different from&lt;br /&gt;
the pure state density matrix, which is a special case of the general density&lt;br /&gt;
matrix.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see that mixtures remain mixed after a unitary transformation on the&lt;br /&gt;
system, note that a unitary transformation does not change the eigenvalues.  &lt;br /&gt;
This is because the characteristic equation is the same for a Hermitian&lt;br /&gt;
matrix and its corresponding diagonal matrix.  Let &amp;lt;math&amp;gt;\rho =&lt;br /&gt;
U\rho_d U^\dagger\,\!&amp;lt;/math&amp;gt;.  It can now be seen that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\det(\rho -\lambda\mathbb{I}) &amp;amp;=&lt;br /&gt;
               \det(U(\rho_d-\lambda\mathbb{I})U^\dagger) \\&lt;br /&gt;
                        &amp;amp;=&lt;br /&gt;
                        \det(U)\det(\rho_d-\lambda\mathbb{I})\det(U^\dagger) \\&lt;br /&gt;
                        &amp;amp;=&lt;br /&gt;
                        \det(U)\det(\rho_d-\lambda\mathbb{I})\det(U^{-1}) \\&lt;br /&gt;
                        &amp;amp;= \det(\rho_d-\lambda\mathbb{I}).&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.32}}&lt;br /&gt;
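This invariance is easy to confirm numerically. The sketch below (NumPy; the diagonal eigenvalues 0.7 and 0.3 and the random unitary are arbitrary choices) builds rho = U rho_d U^dagger and compares spectra:

```python
import numpy as np

rng = np.random.default_rng(0)

# A diagonal mixed-state density matrix rho_d (eigenvalues summing to one) ...
evals = np.array([0.7, 0.3])
rho_d = np.diag(evals).astype(complex)

# ... and a random unitary U from the QR decomposition of a complex Gaussian matrix.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(A)

rho = U @ rho_d @ U.conj().T
# The eigenvalues are unchanged, as Eq. (3.32) shows.
print(np.allclose(np.linalg.eigvalsh(rho), sorted(evals)))   # True
```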
&lt;br /&gt;
====Two-State Example: Bloch Sphere====&lt;br /&gt;
&lt;br /&gt;
Since our interest is primarily in qubits, which are two-state&lt;br /&gt;
systems, we return to a two-state example. &lt;br /&gt;
&lt;br /&gt;
A very convenient representation of two-state density matrices is the&lt;br /&gt;
so-called Bloch sphere &amp;lt;!-- \index{Bloch sphere}--&amp;gt;&lt;br /&gt;
representation, which is possible because the density matrix is Hermitian: &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_2 = \frac{1}{2}(\mathbb{I} + \vec{n}\cdot\vec{\sigma}),&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.33}}&amp;lt;br /&amp;gt;&lt;br /&gt;
where, for the density matrix to be positive, &amp;lt;math&amp;gt;|\vec{n}| \leq 1\,\!&amp;lt;/math&amp;gt;, and the&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma_i\,\!&amp;lt;/math&amp;gt; are the Pauli matrices &amp;lt;!-- \index{Pauli matrices}--&amp;gt;&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\vec{\sigma} = (\sigma_x,\sigma_y,\sigma_z) = \left(&lt;br /&gt;
\left(\begin{array}{cc}&lt;br /&gt;
              0 &amp;amp; 1 \\ &lt;br /&gt;
              1 &amp;amp; 0 \end{array}\right),&lt;br /&gt;
\left(\begin{array}{cc}&lt;br /&gt;
               0 &amp;amp; -i \\ &lt;br /&gt;
               i &amp;amp;  0 \end{array}\right),&lt;br /&gt;
\left(\begin{array}{cc}&lt;br /&gt;
              1 &amp;amp; 0 \\ &lt;br /&gt;
              0 &amp;amp; -1 \end{array}\right)&lt;br /&gt;
\right).&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.34}}&amp;lt;br /&amp;gt;&lt;br /&gt;
The matrix entries on the RHS of this equation are the [[Chapter 2 - Qubits and Collections of Qubits#The Pauli Matrices|Pauli matrices]] discussed above.  It is not difficult to convince yourself that any Hermitian matrix can be written as a real linear combination of the three Pauli matrices and the identity.  The eigenvalues are given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\lambda_\pm = \frac{1\pm|\vec{n}|}{2}.&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.35}}&amp;lt;br /&amp;gt;&lt;br /&gt;
When &amp;lt;math&amp;gt;|\vec{n}| = 1\,\!&amp;lt;/math&amp;gt;, the state is pure, i.e., the matrix &lt;br /&gt;
has rank one since one eigenvalue is one and the other is zero.  If &amp;lt;math&amp;gt;|\vec{n}|&lt;br /&gt;
&amp;lt; 1\,\!&amp;lt;/math&amp;gt;, the density matrix represents a mixed state since the rank is&lt;br /&gt;
greater than one--there are two non-zero eigenvalues.  This leads to&lt;br /&gt;
the following picture: the pure states lie on the surface of the&lt;br /&gt;
sphere (&amp;lt;math&amp;gt;\vec{n}\cdot \vec{n} =1\,\!&amp;lt;/math&amp;gt;), and mixed states lie in the interior of&lt;br /&gt;
the sphere with the maximally mixed state at the origin.  This is&lt;br /&gt;
supposedly due to Bloch, hence the name Bloch sphere.  &lt;br /&gt;
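These statements are easy to verify numerically.  The sketch below (illustrative only; the particular vector &amp;lt;math&amp;gt;\vec{n}\,\!&amp;lt;/math&amp;gt; is an arbitrary choice) builds &amp;lt;math&amp;gt;\rho_2\,\!&amp;lt;/math&amp;gt; from Eq.(3.33) and checks the eigenvalues of Eq.(3.35) along with the purity condition:&lt;br /&gt;

```python
import numpy as np

# Pauli matrices, Eq.(3.34)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_bloch(n):
    """Two-state density matrix (1/2)(I + n.sigma) of Eq.(3.33)."""
    return 0.5 * (np.eye(2) + n[0]*sx + n[1]*sy + n[2]*sz)

n = np.array([0.3, 0.4, 0.5])                 # |n| < 1: a mixed state
r = np.linalg.norm(n)
evals = np.sort(np.linalg.eigvalsh(rho_bloch(n)))
print(np.allclose(evals, [(1 - r)/2, (1 + r)/2]))   # True, Eq.(3.35)

rho_pure = rho_bloch(np.array([0.0, 0.0, 1.0]))     # |n| = 1: pure state
print(np.allclose(rho_pure @ rho_pure, rho_pure))   # True: rho^2 = rho
```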
&lt;br /&gt;
Using &amp;lt;math&amp;gt;\rho^2 =\rho\,\!&amp;lt;/math&amp;gt;, the condition &amp;lt;math&amp;gt;\vec{n}\cdot\vec{n} =1\,\!&amp;lt;/math&amp;gt; for a pure&lt;br /&gt;
state can also be derived.  Squaring the density matrix in the Bloch sphere&lt;br /&gt;
representation yields&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_2^2 = \frac{1}{4}\left(\mathbb{I} + 2\vec{n}\cdot\vec{\sigma} + (\vec{n}\cdot\vec{\sigma})^2\right), &lt;br /&gt;
&amp;lt;/math&amp;gt;|3.36}}&amp;lt;br /&amp;gt;&lt;br /&gt;
and using &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma_i \sigma_j = \mathbb{I}\delta_{ij} + i\epsilon_{ijk}\sigma_k,&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.37}}&amp;lt;br /&amp;gt;&lt;br /&gt;
then &amp;lt;math&amp;gt;\rho_2^2 =\rho_2\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;\vec{n}\cdot\vec{n} =1\,\!&amp;lt;/math&amp;gt;.  This technique is&lt;br /&gt;
used for higher dimensions.  See [[Appendix E - Density Operator: Extensions|Appendix E]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Two density matrices &amp;lt;math&amp;gt;\rho_1=(1/2)(\mathbb{I} +\vec{n}\cdot\vec{\sigma})\,\!&amp;lt;/math&amp;gt; and &lt;br /&gt;
&amp;lt;math&amp;gt;\rho_2=(1/2)(\mathbb{I} +\vec{m}\cdot\vec{\sigma})\,\!&amp;lt;/math&amp;gt;, correspond to orthogonal &lt;br /&gt;
states when &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
\mbox{Tr}(\rho_1\rho_2) &amp;amp;= \frac{1}{4}\mbox{Tr}\big(\mathbb{I} + (\vec{n}\cdot\vec{\sigma})(\vec{m}\cdot\vec{\sigma})\big) \\&lt;br /&gt;
                  &amp;amp;= \frac{1}{2}(1+\vec{n}\cdot\vec{m}) =0.&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;|3.38}}&amp;lt;br /&amp;gt;&lt;br /&gt;
This implies that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\vec{n}\cdot\vec{m} = |\vec{n}||\vec{m}|\cos(\theta) = -1.&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.39}}&amp;lt;br /&amp;gt;&lt;br /&gt;
Since the magnitudes must be one, orthogonal states correspond to &lt;br /&gt;
pure states on the surface of the sphere, represented by &lt;br /&gt;
antipodal points.&lt;br /&gt;
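This can be checked with a short numerical sketch (not part of the text; the axis &amp;lt;math&amp;gt;\hat{z}\,\!&amp;lt;/math&amp;gt; below is an arbitrary choice) evaluating Eq.(3.38) for a pair of antipodal Bloch vectors:&lt;br /&gt;

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_bloch(n):
    return 0.5 * (np.eye(2) + n[0]*sx + n[1]*sy + n[2]*sz)

n = np.array([0.0, 0.0, 1.0])
m = -n                                          # the antipodal point
overlap = np.trace(rho_bloch(n) @ rho_bloch(m)).real
print(overlap)                                  # 0.0: orthogonal states
print(0.5 * (1 + n @ m))                        # 0.0 again, from Eq.(3.38)
```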
&lt;br /&gt;
====Rotations of Bloch Vectors====&lt;br /&gt;
&lt;br /&gt;
As shown above, the solution to the Schrödinger equation for the density operator is (see [[#eq3.11|Eq.(3.11)]])&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\rho(t) = U(t)\rho(0) U^\dagger(t).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
In general a closed system will evolve according to &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\rho = U \rho_0 U^\dagger,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
whether or not the time dependence is explicitly taken into account.  When the density operator is represented using the Bloch vector, the vector is rotated by the unitary transformation.  This is seen through an explicit calculation.  &lt;br /&gt;
&lt;br /&gt;
There are two ways to see this.  One is to simply act with the matrices in the Euler angle parameterization in [[Appendix C - Vectors and Linear Algebra#Transformations of a Qubit|Section C.5.1]] on each of the Pauli matrices to show that indeed,&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
U\vec{m}\cdot\vec{\sigma} U^\dagger = \sum_{ij} m_i R_{ij} \sigma_j.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.40}}&lt;br /&gt;
This is easily seen to be a standard rotation matrix.  (See for example http://en.wikipedia.org/wiki/Rotation_matrix.)  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Another way to do this is to take &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\rho_0 = \frac{1}{2}(\mathbb{I} + \vec{m}\cdot\vec{\sigma}),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
as in [[#eq3.33|Eq.(3.33)]].  (Recall &amp;lt;math&amp;gt;\vec{m}\cdot\vec{\sigma} = \sum\!{}_i \; m_i \sigma_i\,\!&amp;lt;/math&amp;gt;.)  Now act on &amp;lt;math&amp;gt;\rho_0\,\!&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;U\,\!&amp;lt;/math&amp;gt; as given in [[Appendix C - Vectors and Linear Algebra#Transformations of a Qubit|Section C.5.1]] by the so-called adjoint action &amp;lt;math&amp;gt;\rho = U \rho_0 U^\dagger\,\!&amp;lt;/math&amp;gt;, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\rho = [\mathbb{I}\cos(\theta/2)-i\vec{n}\cdot\vec{\sigma}\sin(\theta/2)]\frac{1}{2}(\mathbb{I} + \vec{m}\cdot\vec{\sigma})[\mathbb{I}\cos(\theta/2)+i\vec{n}\cdot\vec{\sigma}\sin(\theta/2)].&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.41}}&lt;br /&gt;
To do this calculation explicitly, it helps (but is not necessary) to use the following identity,&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_i \epsilon_{ijk}\epsilon_{ilm} = \delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.42}}&lt;br /&gt;
Then, if one only considers the non-trivial part of the density operator, &amp;lt;math&amp;gt;\vec{m}\cdot\vec{\sigma}\,\!&amp;lt;/math&amp;gt;, the result is &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
e^{-i\vec{n}\cdot\vec{\sigma}\theta/2} \vec{m}\cdot\vec{\sigma} e^{i\vec{n}\cdot\vec{\sigma}\theta/2} &lt;br /&gt;
          = \vec{m}\cdot\vec{\sigma} \cos(\theta) + (\vec{n}\cdot\vec{m}) (\vec{n}\cdot\vec{\sigma})(1-\cos(\theta)) + (\vec{n}\times \vec{m})\cdot\vec{\sigma}\sin(\theta),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.43}}&lt;br /&gt;
or &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
e^{-i\vec{n}\cdot\vec{\sigma}\theta/2} \vec{m}\cdot\vec{\sigma} e^{i\vec{n}\cdot\vec{\sigma}\theta/2} &lt;br /&gt;
          &amp;amp;= \frac{1}{2}\vec{m}\cdot\vec{\sigma} \cos(\theta) - \frac{1}{2}(\vec{n}\cdot\vec{m}) (\vec{n}\cdot\vec{\sigma})\cos(\theta) + (\vec{n}\cdot\vec{m}) (\vec{n}\cdot\vec{\sigma}) \\&lt;br /&gt;
           &amp;amp; \;\;\;\; + (\vec{n}\times \vec{m})\cdot\vec{\sigma}\sin(\theta)&lt;br /&gt;
+\frac{1}{2}[(\vec{n}\times\vec{m})\times\vec{n}]\cdot\vec{\sigma}\cos(\theta)\end{align}&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.44}}&lt;br /&gt;
where&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
(\vec{n}\times \vec{m})\cdot\vec{\sigma} = \sum_{ijk} \epsilon_{ijk} n_im_j\sigma_k.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.45}}&lt;br /&gt;
Therefore, the result of the action of &amp;lt;math&amp;gt;U\,\!&amp;lt;/math&amp;gt; is to produce, from  &amp;lt;math&amp;gt;\vec{m}\,\!&amp;lt;/math&amp;gt;, the vector&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\vec{m}^\prime= \vec{m} \cos(\theta) + (\vec{n}\cdot\vec{m})\vec{n}(1-\cos(\theta)) + (\vec{n}\times \vec{m})\sin(\theta).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|3.46}}&lt;br /&gt;
This equation can be interpreted as follows.  Consider three components of the vector: the part along the axis of rotation and the two parts in the plane perpendicular to the axis of rotation.  The part of the vector along the axis of rotation, &amp;lt;math&amp;gt;(\vec{m}\cdot\vec{n})\vec{n}\,\!&amp;lt;/math&amp;gt;, does not change.  The parts perpendicular to the axis change just like a vector rotated in a plane, but here they are rotated in the plane perpendicular to the rotation axis that sits at the tip of the vector &amp;lt;math&amp;gt;(\vec{m}\cdot\vec{n})\vec{n}\,\!&amp;lt;/math&amp;gt;.  It takes a bit of geometry and vector algebra to show this is the case.&lt;br /&gt;
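The equivalence of the adjoint action and the rotation of Eq.(3.46) can also be confirmed numerically.  In the sketch below (illustrative; the axis, angle, and initial Bloch vector are arbitrary choices), both sides are evaluated and compared:&lt;br /&gt;

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def dot_sigma(v):
    return v[0]*sx + v[1]*sy + v[2]*sz

theta = 0.7
n = np.array([0.0, 0.0, 1.0])                  # rotation axis (unit vector)
m = np.array([1.0, 0.0, 0.0])                  # initial Bloch vector

# U = exp(-i n.sigma theta/2) = I cos(theta/2) - i (n.sigma) sin(theta/2)
U = np.cos(theta/2)*np.eye(2) - 1j*np.sin(theta/2)*dot_sigma(n)

lhs = U @ dot_sigma(m) @ U.conj().T            # adjoint action on m.sigma

# rotated Bloch vector of Eq.(3.46)
m_rot = (m*np.cos(theta) + (n @ m)*n*(1 - np.cos(theta))
         + np.cross(n, m)*np.sin(theta))
print(np.allclose(lhs, dot_sigma(m_rot)))      # True
```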
&lt;br /&gt;
===Expectation Values===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The expectation value &amp;lt;!-- \index{expectation value}--&amp;gt; &lt;br /&gt;
of an operator &amp;lt;math&amp;gt;\mathcal{O}\,\!&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
\langle \mathcal{O} \rangle = \mbox{Tr}(\rho \mathcal{O}),&lt;br /&gt;
&amp;lt;/math&amp;gt;|3.47}}&amp;lt;br /&amp;gt;&lt;br /&gt;
and is the &amp;quot;average value&amp;quot; of the operator.  For a pure state &lt;br /&gt;
&amp;lt;math&amp;gt;\rho_p = \left\vert\psi\right\rangle\left\langle\psi\right\vert\,\!&amp;lt;/math&amp;gt;, this reduces to &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
(\langle \mathcal{O} \rangle)_p = \left\langle\psi\right\vert \mathcal{O}\left\vert \psi\right\rangle.  &lt;br /&gt;
&amp;lt;/math&amp;gt;|3.48}}&amp;lt;br /&amp;gt;&lt;br /&gt;
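A short numerical illustration of Eqs.(3.47) and (3.48) for the observable &amp;lt;math&amp;gt;\sigma_z\,\!&amp;lt;/math&amp;gt; (a sketch; the particular states are arbitrary choices, not from the text):&lt;br /&gt;

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)

# mixed state: spin up with probability 3/4, spin down with probability 1/4
rho_mixed = np.diag([0.75, 0.25]).astype(complex)
print(np.trace(rho_mixed @ sz).real)           # 0.5, from Eq.(3.47)

# pure state |psi> = (|up> + |down>)/sqrt(2): Tr(rho O) = <psi|O|psi>
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
print(np.trace(rho_pure @ sz).real)            # 0.0
print((psi.conj() @ sz @ psi).real)            # 0.0, Eq.(3.48) agrees
```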
&lt;br /&gt;
&lt;br /&gt;
[[Chapter 4 - Entanglement#Introduction|Continue to '''Chapter 4 - Entanglement''']]&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2346</id>
		<title>Appendix A - Basic Probability Concepts</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2346"/>
		<updated>2013-03-03T22:27:13Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: grammar&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this appendix, definitions and some example calculations are&lt;br /&gt;
presented that will aid in our discussions.  This is not meant to be&lt;br /&gt;
a comprehensive introduction to the topic.  It is primarily meant to&lt;br /&gt;
serve as a means for introducing notation and terminology for the&lt;br /&gt;
course.&lt;br /&gt;
&lt;br /&gt;
By definition, probability is the chance of a certain event occurring from a set of events that could possibly occur. Let us start with the most primitive example of probability, flipping a coin.  Now, we know the set of possible outcomes is heads or tails, &amp;lt;math&amp;gt;S=\left\{H,T\right\}.\,\!&amp;lt;/math&amp;gt;  Since there are only two events that can occur and we know that there is an equal chance for them both to occur, we say that the probability of each occurring is &amp;lt;math&amp;gt;1/2,\,\!&amp;lt;/math&amp;gt; i.e. &amp;lt;math&amp;gt;P(H)=1/2\,\!&amp;lt;/math&amp;gt;  and  &amp;lt;math&amp;gt;P(T)=1/2,\,\!&amp;lt;/math&amp;gt; because the probabilities for all possible outcomes of an event must sum to &amp;lt;math&amp;gt;1,\,\!&amp;lt;/math&amp;gt; i.e. &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In probability, the Boolean operator '''''and''''' can be somewhat counterintuitive at first.  For instance, if someone were to tell you that he/she has '''5''' apples '''''and''''' just received '''3''' more, the operation that takes place in your head is he/she has &amp;lt;math&amp;gt;5 + 3 = 8\,\!&amp;lt;/math&amp;gt; apples.  But, when working with probabilities, the Boolean '''''and''''' corresponds to multiplication (for independent events).  For example, say the probability that Bob stays and works through his lunch hour is &amp;lt;math&amp;gt;1/6\,\!&amp;lt;/math&amp;gt; and the probability that Kathy stays and works through lunch is &amp;lt;math&amp;gt;5/6.\,\!&amp;lt;/math&amp;gt; Now if I were to ask, &amp;quot;What is the probability that Bob '''''and''''' Kathy stay and work through lunch?&amp;quot;, you would not want to add the probabilities, because &amp;lt;math&amp;gt;P(B)+P(K)=1\,\!&amp;lt;/math&amp;gt; would imply that both are certain to work through lunch, which doesn't make sense because, from the knowledge that we have, we cannot guarantee both will work through lunch.  Instead, let us multiply their respective probabilities, &amp;lt;math&amp;gt;P(B)\cdot P(K)=5/36.\,\!&amp;lt;/math&amp;gt;  That the answer is lower than either individual probability makes sense because, intuitively, the more uncertainty in a system (i.e. the more probabilities less than 1), the more uncertain we are of success.&lt;br /&gt;
&lt;br /&gt;
Now that we have examined the Boolean '''''and''''', let's take a look at '''''or'''''.  '''''Or''''' corresponds to addition (for mutually exclusive events), which follows directly from the condition that the probabilities for all outcomes of an event must add up to &amp;lt;math&amp;gt;1.\,\!&amp;lt;/math&amp;gt;  Revisiting the example of flipping a coin, we see that the two possible outcomes are that you obtain heads '''''or''''' you obtain tails: &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
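The two rules can be sketched in a few lines of code (the numbers are the ones from the Bob/Kathy and coin examples; independence of the two events is assumed for the '''''and''''' rule):&lt;br /&gt;

```python
# "and" -> multiply (for independent events)
p_bob, p_kathy = 1/6, 5/6
p_both = p_bob * p_kathy
print(abs(p_both - 5/36) < 1e-12)     # True: P(B and K) = 5/36

# "or" -> add (for mutually exclusive outcomes of one trial)
p_heads = p_tails = 1/2
print(p_heads + p_tails)              # 1.0
```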
&lt;br /&gt;
(This example is a variation of one given by David Griffiths in ''Introduction to Quantum Mechanics'' ([[Bibliography#Griffiths:qmbook|David J. Griffiths’ book [4]]]))&lt;br /&gt;
&lt;br /&gt;
''Example'':  Suppose that in some room, there are four people with the following heights:  &lt;br /&gt;
#1 person is '''1.5''' meters tall&lt;br /&gt;
#1 person is '''1.6''' meters tall&lt;br /&gt;
#2 people are '''1.8''' meters tall&lt;br /&gt;
Let &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; stand for the total number of people.  We might write the number of people with certain heights as &lt;br /&gt;
&amp;lt;math&amp;gt;N(1.5) = 1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.6)=1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.8)=2\,\!&amp;lt;/math&amp;gt;. &lt;br /&gt;
&amp;lt;center&amp;gt;The total number of people is&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
N = \sum_{j=0}^\infty N(j),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;where &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; runs over all values. It is easily seen that &amp;lt;math&amp;gt;N=4\,\!&amp;lt;/math&amp;gt;.&amp;lt;/center&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
Now if I draw a name out of a hat that contains each person's name&lt;br /&gt;
once, I will get the name of a person who is 1.6 meters tall with&lt;br /&gt;
probability &amp;lt;math&amp;gt;1/4\,\!&amp;lt;/math&amp;gt;.  (We assume that each person has a unique name and&lt;br /&gt;
that it appears once and only once in the hat.)  We write this as&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(1.6) = 1/4&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
and we would generally write for any value&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(j) = \frac{N(j)}{N}. &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
Now since we are going to get someone's name when we draw, we must&lt;br /&gt;
have &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_j P(j) = 1,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
which is easy enough to check.  &lt;br /&gt;
&lt;br /&gt;
There are several aspects of this probability distribution that we might like to know.  Here are some that are particularly useful: &amp;lt;!-- \index{median}\index{mean} \index{average}--&amp;gt;&lt;br /&gt;
#The ''most probable'' value (or ''mode'') for the height is 1.8 meters.&lt;br /&gt;
#The ''median'' is 1.7 meters (two people above and two below).&lt;br /&gt;
#The ''average'' (or ''mean'') is given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\left\langle height\right\rangle &amp;amp;= \frac{1(1.5)+1(1.6)+2(1.8)}{4} \\ &amp;amp;=  \frac{6.7}{4} = 1.675. \end{align}&amp;lt;/math&amp;gt;|A.1}}&lt;br /&gt;
Note that the mean and the median do not have to be the same.  If there is an odd number of values, the median is the middle number in the list; if even, it is the mean of the two middle values.  Here the median is 1.7 meters while the mean is 1.675 meters.  &lt;br /&gt;
The bracket, &amp;lt;math&amp;gt;\left\langle\cdot\right\rangle\,\!&amp;lt;/math&amp;gt;, is the standard notation for finding  the ''average value''&amp;lt;!-- \index{average}--&amp;gt; &lt;br /&gt;
of a function.  This is done by calculating &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;\left\langle f(j)\right\rangle = \sum_{j=0}^\infty f(j)P(j).\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
For the average this is just &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\left\langle j\right\rangle = \sum_{j=0}^\infty jP(j)= \sum_{j=0}^\infty j\frac{N(j)}{N}.  \,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
'''Note:'''  The ''average value'' is called the ''expectation value'' &amp;lt;!-- \index{expectation value} --&amp;gt; in quantum mechanics.  This can be&lt;br /&gt;
misleading because it is ''not'' the most probable, nor is it &amp;lt;nowiki&amp;gt;''what to expect.''&amp;lt;/nowiki&amp;gt; &lt;br /&gt;
&lt;br /&gt;
When one would like to discuss the properties of a particular probability distribution, describing it takes some effort.  It is not enough to know the average, median, and most probable values; a lot of details of the probability distribution remain unknown to us if these are all we are given.  What else would one like to know?  Without describing it entirely, one may like to know more about the &amp;lt;nowiki&amp;gt;''shape''&amp;lt;/nowiki&amp;gt; of the distribution.  For example, how spread out is it?&lt;br /&gt;
&lt;br /&gt;
The most important measure of this is the ''variance'',&amp;lt;!-- \index{variance}--&amp;gt; which is the ''standard deviation'' &amp;lt;!-- \index{standard deviation} --&amp;gt; squared ( &amp;lt;math&amp;gt;\sigma^2\!&amp;lt;/math&amp;gt; ).  The variance is defined as (in terms of our variable &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;) &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle(\Delta j)^2\rangle, \,\!&amp;lt;/math&amp;gt;|A.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\Delta j = j -\langle j \rangle\,\!&amp;lt;/math&amp;gt;.  This can also be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle j^2\rangle - \langle j \rangle^2.\,\!&amp;lt;/math&amp;gt;|A.3}}&lt;br /&gt;
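The height example can be carried through in code, including the equivalence of the two variance formulas (A.2) and (A.3).  This is an illustrative sketch, not part of the text:&lt;br /&gt;

```python
heights = [1.5, 1.6, 1.8, 1.8]        # the four people from the example
N = len(heights)

P = {h: heights.count(h) / N for h in set(heights)}   # P(j) = N(j)/N
print(abs(sum(P.values()) - 1.0) < 1e-12)             # True: sums to 1

mean = sum(h * p for h, p in P.items())               # <j>
var_a2 = sum((h - mean)**2 * p for h, p in P.items()) # <(Delta j)^2>, (A.2)
var_a3 = sum(h*h*p for h, p in P.items()) - mean**2   # <j^2> - <j>^2, (A.3)
print(abs(var_a2 - var_a3) < 1e-12)                   # True: they agree
```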
&lt;br /&gt;
===Stirling's Formula===&lt;br /&gt;
&lt;br /&gt;
For large &amp;lt;math&amp;gt;n \,\!&amp;lt;/math&amp;gt;, the following approximation is quite useful:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
n! \approx \sqrt{2\pi n} \; n^n e^{-n}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2345</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2345"/>
		<updated>2013-03-03T21:31:20Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Parity Check Matrix */ Clarifying equations, correcting redundency&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescueSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a '''field''' (a Galois Field), which is denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product).  For example, the addition of the vector &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
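The worked additions above can be reproduced with mod-2 arithmetic on arrays (a small illustrative sketch):&lt;br /&gt;

```python
import numpy as np

v = np.array([0, 0, 1])
w = np.array([0, 1, 1])

print((v + w) % 2)          # [0 1 0]: component-wise addition mod 2
print((v @ w) % 2)          # 1: inner product mod 2
```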
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''' since it indicates whether one vector has an even or odd number of 1's at the positions specified by the 1's in the other vector.  We may say that the first vector satisfies the parity check of the other vector, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
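Definitions 2 and 3, and the identity &amp;lt;math&amp;gt;d_H(v,w) = \mbox{wt}(v+w)\,\!&amp;lt;/math&amp;gt;, in a short sketch (the example vectors are arbitrary choices):&lt;br /&gt;

```python
def wt(v):
    """Hamming weight: the number of non-zero components."""
    return sum(1 for x in v if x != 0)

def d_H(v, w):
    """Hamming distance: the number of places where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

v, w = [1, 0, 1, 1], [0, 0, 1, 0]
vw_sum = [(a + b) % 2 for a, b in zip(v, w)]    # v + w over GF(2)
print(wt(v), d_H(v, w), wt(vw_sum))             # 3 2 2: d_H(v,w) = wt(v+w)
```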
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two non-equal vectors in a code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When that code has a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we could detect single bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states.  So an error must have occurred.  However, we don't know whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We do know that an error has occurred though, as long as we know only one error has occurred.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
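A minimal sketch of this &amp;lt;math&amp;gt;[3,1,3]\,\!&amp;lt;/math&amp;gt; code with a majority-vote decoder (one natural choice of decoder; the decoder is not spelled out in the text):&lt;br /&gt;

```python
def encode(b):
    return [b, b, b]                     # 0 -> 000, 1 -> 111

def decode(word):
    return 1 if sum(word) >= 2 else 0    # majority vote

word = encode(1)                         # [1, 1, 1]
word[2] ^= 1                             # a single bit-flip error: [1, 1, 0]
print(decode(word))                      # 1: the single error is corrected
```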
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because they are able to efficiently identify errors and the associated correct codewords.  This ability is due to the added structure these codes have. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix whose columns form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the vectors comprising the columns span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; encoding a message vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique.  If columns are switched, or added together to produce a new vector that replaces a column, the resulting matrix is still a valid generator matrix for the code, since the columns remain linearly independent under these operations.&lt;br /&gt;
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank at most &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates any code word.  To see this, recall any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; can be written as &amp;lt;math&amp;gt;Gv.\,\!&amp;lt;/math&amp;gt;  Therefore, &amp;lt;math&amp;gt;Pw=PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt;, is called the '''error syndrome''' and the measurement to identify &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  Note that the result depends only on the error and not on the original code word.  If the error can be determined from this result, then it can be corrected independent of the code word.  However, for the syndrome to identify the error uniquely, two different errors, &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt;, must not produce equal syndromes &amp;lt;math&amp;gt;Pe_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Pe_2\,\!&amp;lt;/math&amp;gt;.  This is possible if a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once it is known, the parity check matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be determined from it: the rows of &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; are a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  A method for doing this can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] and goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
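The construction just described can be made concrete.  The sketch below builds P = (A^T|I_{n-k}) from G^T = (I_k|A) and checks the defining property PG = 0; the [3,1] repetition code is used as a small example, a choice made here for illustration.&lt;br /&gt;
```python
# A minimal sketch: from G^T in the form (I_k | A), construct
# P = (A^T | I_{n-k}) and verify PG = 0 over GF(2).

def parity_check_from_standard_form(GT, k, n):
    """Given GT as a list of rows (I_k | A), return P = (A^T | I_{n-k})."""
    A = [row[k:] for row in GT]                   # the k x (n-k) block A
    P = []
    for j in range(n - k):
        AT_row = [A[i][j] for i in range(k)]      # j-th row of A^T
        I_row = [1 if m == j else 0 for m in range(n - k)]
        P.append(AT_row + I_row)
    return P

GT = [[1, 1, 1]]                                  # G^T = (I_1 | A), A = (1 1)
P = parity_check_from_standard_form(GT, k=1, n=3)
print(P)                                          # prints [[1, 1, 0], [1, 0, 1]]

# Check PG = 0 (Eq. F.3): every row of P is orthogonal mod 2 to every
# row of G^T.
for p in P:
    for g in GT:
        assert sum(a * b for a, b in zip(p, g)) % 2 == 0
```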
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for any two distinct code words &amp;lt;math&amp;gt;w_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w_2 \,\!&amp;lt;/math&amp;gt; and any correctable errors &amp;lt;math&amp;gt;e_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2 \,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This condition is called the '''disjointness condition'''.  It means that an error on one state cannot be confused with an error on another state.  If it could, then the state containing the error could not be uniquely identified with an encoded state, and so could not be restored to its original form after the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;  single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimum distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  Since it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, it does so in such a way that one error can be detected and corrected.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix, &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]]: put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix; then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one arrives at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix, since the codewords are exactly the words that it annihilates.&lt;br /&gt;
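These properties can be verified directly.  The sketch below takes G^T from [[#eqF.6|Eq.(F.6)]] and P from [[#eqF.7|Eq.(F.7)]], checks that P annihilates all sixteen code words, and checks that the seven single-bit errors give distinct nonzero syndromes (the columns of P), so each one can be identified and corrected.&lt;br /&gt;
```python
# Verify the [7,4,3] Hamming code: Pw = 0 for every code word, and
# single-bit errors have distinct nonzero syndromes.
from itertools import product

GT = [[1, 0, 0, 0, 1, 1, 0],     # rows of G^T from Eq. (F.6)
      [0, 1, 0, 0, 1, 1, 1],
      [0, 0, 1, 0, 1, 0, 1],
      [0, 0, 0, 1, 0, 1, 1]]
P  = [[1, 1, 1, 0, 1, 0, 0],     # parity check matrix from Eq. (F.7)
      [1, 1, 0, 1, 0, 1, 0],
      [0, 1, 1, 1, 0, 0, 1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

def syndrome(word):
    return tuple(dot(row, word) for row in P)

# All 2^4 = 16 code words Gv, one for each message v.
codewords = []
for v in product([0, 1], repeat=4):
    w = [sum(v[i] * GT[i][j] for i in range(4)) % 2 for j in range(7)]
    codewords.append(w)
    assert syndrome(w) == (0, 0, 0)          # Pw = 0 for every code word

# Single-bit errors: syndromes are distinct and nonzero.
syndromes = set()
for pos in range(7):
    e = [1 if j == pos else 0 for j in range(7)]
    s = syndrome(e)
    assert s != (0, 0, 0)
    syndromes.add(s)
assert len(syndromes) == 7
```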
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each vector obtained by adding an error to a code word should be associated with that code word.  This partitions the set of all words into disjoint subsets, each containing exactly one code vector.  A message is decoded correctly if the received vector (the one containing the error) is in the subset associated with the original vector (the one with no error).  For example, if the vector &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt; is sent and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e\,\!&amp;lt;/math&amp;gt;, then this vector must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array of the possible code words, the possible errors, and the combinations of those errors and code words.  The array has the code words as its top row and the errors as its leftmost column, with the element in the first row and first column being the zero vector.  The entry in the &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;th column and &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;th row is then the sum of the code word at the top of column &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; and the error at the left of row &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;.  Each column of this array is one of the disjoint subsets.  Finding the corrupted code word in a column associates it with the code word at the top of that column, and thus corrects the error.&lt;br /&gt;
&lt;br /&gt;
====Example 4====&lt;br /&gt;
&lt;br /&gt;
In this example we are going to use &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;  [[#eqF.6| (F.6)]] and &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;  [[#eqF.7| (F.7)]] from the example above.&lt;br /&gt;
&lt;br /&gt;
The set of code words is given by all of the linear combinations of the rows of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; so there are &amp;lt;math&amp;gt;2^3\,\!&amp;lt;/math&amp;gt; code words.  (Note that here the rows of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; rather than &amp;lt;math&amp;gt;G,\,\!&amp;lt;/math&amp;gt; generate the code; this code is the dual of the Hamming code above.)  The set of code words is&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;C = \left\{0000000, 1110100, 1101010, 0111001, 0100111, 1010011, 0011110, 1001101\right\}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt; &amp;lt;div id=&amp;quot;TableF.1&amp;quot;&amp;gt;&amp;lt;big&amp;gt;'''TABLE F.1'''&amp;lt;/big&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|+ align=&amp;quot;bottom&amp;quot; |Table F.1: ''Array to determine possible errors on an unknown code word in set &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt;''&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;1000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0100000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0000111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0010000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0001000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, suppose you are expecting to receive a code word, &amp;lt;math&amp;gt;c\in C.\,\!&amp;lt;/math&amp;gt;  Instead, you receive &amp;lt;math&amp;gt;0101111\notin C.\,\!&amp;lt;/math&amp;gt;  Looking at Table F.1, we see that &amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt; is in column 5.  Since the columns of this table represent the disjoint subsets of the space of words, we conclude that &amp;lt;math&amp;gt;c = 0100111\,\!&amp;lt;/math&amp;gt; and the error that occurred was &amp;lt;math&amp;gt;e_{4}\,\!&amp;lt;/math&amp;gt;, or &amp;lt;math&amp;gt;0001000.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
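The decoding in this example can be reproduced with a short script.  The sketch below regenerates the code words from the rows of P in [[#eqF.7|Eq.(F.7)]], builds the look-up table of Table F.1, and recovers both the code word and the error from the received word.&lt;br /&gt;
```python
# Rebuild Table F.1 and decode the received word 0101111.
from itertools import product

P_rows = [[1, 1, 1, 0, 1, 0, 0],   # rows of P from Eq. (F.7)
          [1, 1, 0, 1, 0, 1, 0],
          [0, 1, 1, 1, 0, 0, 1]]

def add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

# All 2^3 = 8 code words: linear combinations of the rows of P.
C = set()
for coeffs in product([0, 1], repeat=3):
    w = (0,) * 7
    for c, row in zip(coeffs, P_rows):
        if c:
            w = add(w, row)
    C.add(w)

# Errors: no error plus the seven single-bit errors (rows of Table F.1).
errors = [(0,) * 7] + [tuple(1 if j == pos else 0 for j in range(7))
                       for pos in range(7)]

# Lookup table mapping a corrupted word to (code word, error); the
# disjointness condition guarantees there are no collisions.
table = {add(w, e): (w, e) for w in C for e in errors}

received = (0, 1, 0, 1, 1, 1, 1)
codeword, error = table[received]
assert codeword == (0, 1, 0, 0, 1, 1, 1)
assert error == (0, 0, 0, 1, 0, 0, 0)
```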
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits is required to ensure the ability to detect and correct errors.  Suppose there is a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bit vectors for encoding &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the total number of error vectors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the zero vector, corresponding to no error, is also counted among the error vectors.  The objective is to design a code that can correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use Stirling's formula to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log x -(1-x)\log (1-x) \,\!&amp;lt;/math&amp;gt; and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article by Steane in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]].)&lt;br /&gt;
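As a check on [[#eqF.9|Eq.(F.9)]], the bound can be evaluated for the [7,4,3] Hamming code with t = 1.  It is met with equality, a fact worth noting: codes that saturate the Hamming bound are known as perfect codes.&lt;br /&gt;
```python
# Evaluate both sides of the Hamming bound: 2^k * sum_i C(n,i) and 2^n.
from math import comb

def hamming_bound_sides(n, k, t):
    lhs = 2 ** k * sum(comb(n, i) for i in range(t + 1))
    return lhs, 2 ** n

# [7,4] Hamming code, t = 1: 2^4 * (1 + 7) = 128 = 2^7 (equality)
assert hamming_bound_sides(7, 4, 1) == (128, 128)
# [3,1] repetition code, t = 1: 2 * (1 + 3) = 8 = 2^3 (also saturated)
assert hamming_bound_sides(3, 1, 1) == (8, 8)
```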
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 8: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with every code word.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
For binary vectors, a vector can be orthogonal to itself; for example, &amp;lt;math&amp;gt;(0,1,1)\cdot(0,1,1) = 0+1+1 = 0\,\!&amp;lt;/math&amp;gt; (mod 2).  Note that this is different from ordinary vectors in 3-d real space.  &lt;br /&gt;
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
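A brute-force sketch of Definition 8, using the [3,1] repetition code as an illustrative example (a choice made here, not in the text):&lt;br /&gt;
```python
# Compute the dual of the [3,1] repetition code: all vectors orthogonal
# mod 2 to every code word.
from itertools import product

C = [(0, 0, 0), (1, 1, 1)]        # the [3,1] repetition code

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

dual = [u for u in product([0, 1], repeat=3)
        if all(dot(u, v) == 0 for v in C)]

assert sorted(dual) == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# Every nonzero dual vector has even weight, and e.g. (0,1,1) is
# orthogonal to itself: 0*0 + 1*1 + 1*1 = 2 = 0 (mod 2).
assert dot((0, 1, 1), (0, 1, 1)) == 0
```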
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not indicate whether or not codes that saturate it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection, and correction are all difficult problems to solve in general.  One of the advantages of linear codes is that they provide a systematic method for identifying errors through the use of the parity check operation.  More generally, checking whether or not a bit string (vector) is in the code space would require a look-up table.  This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to a table look-up.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues.  In quantum computers, as will be discussed there, error correction is necessary due to the delicacy of quantum information.&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2344</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2344"/>
		<updated>2013-03-03T21:20:44Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Linear Codes */ Correcting Redundency&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  This is achieved by restricting our language to these two numbers and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication, with the two operations obeying the distributive law, the set becomes a '''field''' (a Galois field), denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to treat a string of bits as a vector and to use vector addition (component-wise addition) and vector multiplication (the inner product).  For example, the sum of the vectors &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
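These operations are easy to state in code; the short sketch below checks the worked example above.&lt;br /&gt;
```python
# Vector addition and the inner product over GF(2), checking the
# example in the text.

def vec_add(u, v):
    # component-wise addition mod 2
    return tuple((a + b) % 2 for a, b in zip(u, v))

def inner(u, v):
    # inner product mod 2
    return sum(a * b for a, b in zip(u, v)) % 2

assert vec_add((0, 0, 1), (0, 1, 1)) == (0, 1, 0)
assert inner((0, 0, 1), (0, 1, 1)) == 1
```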
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''', since it shows whether or not one vector has an even number of 1's at the positions specified by the ones of the other vector.  We say that the first vector satisfies the parity check of the other, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
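Definitions 2 and 3 translate directly into code; the sketch below also checks the identity relating distance and weight on an example pair chosen here for illustration.&lt;br /&gt;
```python
# Hamming weight and Hamming distance, with the identity
# d_H(v, w) = wt(v + w) over GF(2).

def wt(v):
    # number of non-zero components
    return sum(1 for a in v if a != 0)

def d_H(v, w):
    # number of places where v and w differ
    return sum(1 for a, b in zip(v, w) if a != b)

v, w = (1, 0, 1, 1), (1, 1, 0, 1)
s = tuple((a + b) % 2 for a, b in zip(v, w))   # v + w mod 2
assert d_H(v, w) == wt(s) == 2
```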
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two distinct vectors in the code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When the code has distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we can detect single-bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states, so an error must have occurred.  However, we don't know whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We do know that an error has occurred, provided that at most one error has occurred.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
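Decoding for this code can be sketched as a majority vote, which corrects any single bit-flip, as the checks below confirm.&lt;br /&gt;
```python
# Majority-vote decoding for the 3-bit code of Eq. (F.2).

def decode(word):
    # For three bits the sum is 0..3, so integer division by 2
    # yields the majority bit.
    return sum(word) // 2

# Any single bit-flip on 0_L = 000 or 1_L = 111 is corrected.
for pos in range(3):
    e = [1 if j == pos else 0 for j in range(3)]
    corrupted0 = [(0 + b) % 2 for b in e]        # 000 plus the error
    corrupted1 = [(1 + b) % 2 for b in e]        # 111 plus the error
    assert decode(corrupted0) == 0
    assert decode(corrupted1) == 1
```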
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the total number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because errors, and the associated corrected codewords, can be identified efficiently.  This ability is due to the added structure these codes have. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix whose columns form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the columns span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; corresponding to a message vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique.  If columns are switched, or a column is replaced by its sum with another column, then the resulting matrix still generates the same code.  This is because the columns remain linearly independent and span the same space after these operations.&lt;br /&gt;
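Encoding with a generator matrix is just a matrix-vector product over GF(2).  The sketch below uses the 3x1 generator of the [3,1] repetition code, an illustrative choice made here.&lt;br /&gt;
```python
# Encoding w = Gv for the [3,1] repetition code (G is 3x1).

G = [[1],
     [1],
     [1]]

def encode(G, v):
    """Multiply G by the message vector v over GF(2)."""
    return tuple(sum(row[i] * v[i] for i in range(len(v))) % 2 for row in G)

assert encode(G, (0,)) == (0, 0, 0)
assert encode(G, (1,)) == (1, 1, 1)
```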
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank at most &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates every code word.  To see this, recall any code word is written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  Also, due to the rank of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; it can be shown that &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  The result &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is called the '''error syndrome''', and the measurement that identifies &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  Note that the syndrome depends only on the error and not on the original code word.  If the error can be determined from the syndrome, then it can be corrected independently of the code word.  However, for the syndrome to identify the error uniquely, two different correctable errors, &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt;, must produce different syndromes, &amp;lt;math&amp;gt;Pe_1 \neq Pe_2\,\!&amp;lt;/math&amp;gt;.  This is guaranteed if a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once it is known, the parity check matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be determined from it: the rows of &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; are a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  A method for doing this can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] and goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for any two distinct code words &amp;lt;math&amp;gt;w_1 \neq w_2\,\!&amp;lt;/math&amp;gt; and any two correctable errors &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This condition is called the '''disjointness condition'''; it means that an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state, and the state could not be corrected to its original form after the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;  single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  Because it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, a single error can be detected and corrected.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix, &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;, can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]]: put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix; then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one arrives at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix, since only the code words are annihilated by it.&lt;br /&gt;
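As a quick illustrative check (a sketch, not part of the original text), one can verify that this parity check matrix annihilates exactly the &amp;lt;math&amp;gt;2^4 = 16\,\!&amp;lt;/math&amp;gt; code words of the Hamming code among all &amp;lt;math&amp;gt;2^7 = 128\,\!&amp;lt;/math&amp;gt; seven-bit words:

```python
from itertools import product

# Parity check matrix of the [7,4,3] Hamming code, Eq. (F.7).
P = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(w):
    """Pw over GF(2)."""
    return tuple(sum(p_i * w_i for p_i, w_i in zip(p, w)) % 2 for p in P)

# The words annihilated by P are precisely the code words.
in_code = [w for w in product([0, 1], repeat=7) if syndrome(w) == (0, 0, 0)]
assert len(in_code) == 16   # 2 to the power (n - rank P) = 2 to the 4th
```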
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each vector obtained by adding an error to a code word should be associated with that code word alone.  This partitions the set of words into disjoint subsets, each containing exactly one code vector.  A message is decoded correctly if the received vector (the one containing the error) is in the subset associated with the original vector (the one with no error).  For example, if the vector &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt; is sent and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e,\,\!&amp;lt;/math&amp;gt; then this vector must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array built from the possible code words, the possible errors, and their sums.  The top row of the array consists of the code words and the leftmost column consists of the errors, with the entry in the first row and first column being the zero vector.  The entry in the &amp;lt;math&amp;gt;i\,\!&amp;lt;/math&amp;gt;th row and &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;th column is then the sum of the code word at the top of column &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; and the error at the left of row &amp;lt;math&amp;gt;i\,\!&amp;lt;/math&amp;gt;.  Each column of the array is then one of the disjoint subsets.  Locating a received (possibly errored) word in the array associates it with the code word at the top of its column, thus correcting the error.&lt;br /&gt;
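The array just described can be sketched as follows.  This is an illustrative sketch (not from the original text) using the 3-bit repetition code &amp;lt;math&amp;gt;\{000, 111\}\,\!&amp;lt;/math&amp;gt; with the no-error and single-bit error vectors:

```python
# Code words and correctable errors for the 3-bit repetition code.
codewords = [(0, 0, 0), (1, 1, 1)]
errors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def vadd(u, v):
    """Component-wise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

# Row i, column j of the array holds codeword_j plus error_i.
array = [[vadd(c, e) for c in codewords] for e in errors]

def decode(received):
    """Return the code word heading the column containing `received`."""
    for row in array:
        for j, word in enumerate(row):
            if word == received:
                return codewords[j]
    raise ValueError("uncorrectable word")

assert decode((0, 1, 0)) == (0, 0, 0)   # single flip on 000 corrected
assert decode((1, 1, 0)) == (1, 1, 1)   # single flip on 111 corrected
```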
&lt;br /&gt;
====Example 4====&lt;br /&gt;
&lt;br /&gt;
In this example we are going to use &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;  [[#eqF.6| (F.6)]] and &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;  [[#eqF.7| (F.7)]] from the example above.&lt;br /&gt;
&lt;br /&gt;
The set of code words is given by all of the linear combinations of the rows of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; meaning there are &amp;lt;math&amp;gt;2^3\,\!&amp;lt;/math&amp;gt; code words.  (These form the dual of the Hamming code, a &amp;lt;math&amp;gt;[7,3]\,\!&amp;lt;/math&amp;gt; code, used here to keep the array small.)  Explicitly, the set of code words is&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;C = \left\{0000000, 1110100, 1101010, 0111001, 0100111, 1010011, 0011110, 1001101\right\}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt; &amp;lt;div id=&amp;quot;TableF.1&amp;quot;&amp;gt;&amp;lt;big&amp;gt;'''TABLE F.1'''&amp;lt;/big&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|+ align=&amp;quot;bottom&amp;quot; |Table F.1: ''Array to determine possible errors on an unknown code word in set &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt;''&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;1000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0100000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0000111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0010000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0001000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, suppose you are expecting to receive a code word, &amp;lt;math&amp;gt;c\in C,\,\!&amp;lt;/math&amp;gt; but instead you receive &amp;lt;math&amp;gt;0101111\notin C.\,\!&amp;lt;/math&amp;gt;  Looking at Table F.1, we see that &amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt; is in column 5.  Since the columns of this table represent the disjoint subsets of our code space, we conclude that &amp;lt;math&amp;gt;c = 0100111\,\!&amp;lt;/math&amp;gt; and the error that occurred was &amp;lt;math&amp;gt;e_{4} = 0001000,\,\!&amp;lt;/math&amp;gt; a flip of the fourth bit.&lt;br /&gt;
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors.  Suppose a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bit vectors is used to encode &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors, including errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that no error is also part of the set of error vectors.  The objective is to be able to design a code that can correct all errors up to those of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use Stirling's formula to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log_2 x -(1-x)\log_2 (1-x) \,\!&amp;lt;/math&amp;gt; is the binary entropy function and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
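As an illustrative check (a sketch, not part of the original text), the bound [[#eqF.10|(F.10)]] can be evaluated for the &amp;lt;math&amp;gt;[7,4,3]\,\!&amp;lt;/math&amp;gt; Hamming code with &amp;lt;math&amp;gt;t=1\,\!&amp;lt;/math&amp;gt;; it is met with equality, which is why the Hamming code is called a perfect code:

```python
from math import comb, log2

# Evaluate the Hamming bound, Eq. (F.10), for the [7,4,3] code with t = 1.
n, k, t = 7, 4, 1
bound = n - log2(sum(comb(n, i) for i in range(t + 1)))

# 1 + C(7,1) = 8 error vectors, so the bound is 7 - log2(8) = 4 = k.
assert bound == 4.0
assert k == bound
```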
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 8: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with every code word.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
For binary vectors, a nonzero vector can be orthogonal to itself: any vector of even weight has zero inner product with itself.  Note that this is quite different from ordinary real vectors in 3-d space.  &lt;br /&gt;
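A two-line illustrative check of this fact (not from the original text):

```python
def inner(u, v):
    """Binary inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

# An even-weight vector is orthogonal to itself over GF(2)...
assert inner((1, 1, 0), (1, 1, 0)) == 0
# ...while an odd-weight vector is not.
assert inner((1, 0, 0), (1, 0, 0)) == 1
```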
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not indicate whether codes saturating it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection, and correction are all difficult problems to solve in general.  One of the advantages of linear codes is that they provide a systematic method for identifying errors through the use of the parity check operation.  More generally, checking whether or not a bit string (vector) is in the code space would require a look-up table.  This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to a look-up table.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues, and many quantum error correcting codes are constructed from classical ones.  In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information.  Such discussions will be taken up in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2343</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2343"/>
		<updated>2013-03-03T21:19:34Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* More Definitions */ Numbering Correction&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  This is achieved by restricting ourselves to these two numbers and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication and these two operations obey the distributive law, the set becomes a '''field''' (a Galois field), denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider a string of bits to be a vector and to use vector addition (component-wise addition modulo 2) and vector multiplication (the inner product modulo 2).  For example, the sum of the vectors &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
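These two operations can be sketched directly in code (an illustrative sketch, not part of the original text), reproducing the numerical examples above:

```python
def vadd(u, v):
    """Vector addition over GF(2): component-wise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def inner(u, v):
    """Vector multiplication over GF(2): the inner product mod 2."""
    return sum(a * b for a, b in zip(u, v)) % 2

# The examples from the text.
assert vadd((0, 0, 1), (0, 1, 1)) == (0, 1, 0)
assert inner((0, 0, 1), (0, 1, 1)) == 1
```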
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''', since it shows whether or not one vector has an even number of 1's at the positions specified by the 1's of the other vector.  If the inner product is zero, we say that the first vector satisfies the parity check of the other, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
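Definitions 2 and 3, and the identity &amp;lt;math&amp;gt;d_H(v,w) = \mbox{wt}(v+w)\,\!&amp;lt;/math&amp;gt;, can be sketched as follows (an illustrative sketch, not from the original text):

```python
def wt(v):
    """Hamming weight: number of nonzero components."""
    return sum(1 for a in v if a != 0)

def d_H(v, w):
    """Hamming distance: number of places where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

v, w = (1, 0, 1, 1), (1, 1, 0, 1)
vw = tuple((a + b) % 2 for a, b in zip(v, w))   # v + w over GF(2)
assert d_H(v, w) == wt(vw) == 2
```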
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two non-equal vectors in a code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When a code has distance &amp;lt;math&amp;gt;d,\,\!&amp;lt;/math&amp;gt; the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we can detect single bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states, so an error must have occurred, provided that at most one error occurred.  However, we cannot tell whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
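A minimal sketch (not part of the original text) of encoding and majority-vote decoding for this three-bit code:

```python
def encode(bit):
    """Encode a logical bit redundantly: 0 -> 000, 1 -> 111."""
    return (bit,) * 3

def decode(word):
    """Majority vote, which corrects any single bit-flip error."""
    return 1 if sum(word) >= 2 else 0

assert decode((0, 1, 0)) == 0   # single error on 0_L = 000 is corrected
assert decode((1, 0, 1)) == 1   # single error on 1_L = 111 is corrected
```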
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the total number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because the added structure they possess allows errors, and the associated correct codewords, to be identified efficiently.  This structure is discussed in the following sections. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix whose columns form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the column vectors span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; corresponding to a message vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique: columns may be interchanged, or a column may be replaced by the sum of itself and other columns, and the result is still a valid generator matrix for the code, since the columns remain linearly independent and span the same code space.&lt;br /&gt;
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates any code word.  To see this, recall that any code word can be written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  Also, due to the rank of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; it can be shown that &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt;, is called the '''error syndrome''' and the measurement to identify &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  Thus the syndrome depends only on the error and not on the original code word.  If the error can be determined from the syndrome, then it can be corrected independently of the code word.  However, for the error to be identified uniquely, the syndromes of two distinct correctable errors, &amp;lt;math&amp;gt;Pe_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Pe_2\,\!&amp;lt;/math&amp;gt;, must not be equal.  This is guaranteed if a distance-&amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
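As an illustrative check (a sketch, not from the original text), for the &amp;lt;math&amp;gt;[7,4,3]\,\!&amp;lt;/math&amp;gt; Hamming code discussed in this appendix, the syndromes &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; of the seven single-bit errors are distinct and nonzero, so Eq. (F.4) alone identifies which bit flipped:

```python
# Parity check matrix of the [7,4,3] Hamming code.
P = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(e):
    """Pe over GF(2); for a single-bit error this is a column of P."""
    return tuple(sum(p_i * e_i for p_i, e_i in zip(p, e)) % 2 for p in P)

# The seven single-bit error vectors and their syndromes.
singles = [tuple(1 if j == i else 0 for j in range(7)) for i in range(7)]
syndromes = [syndrome(e) for e in singles]

assert len(set(syndromes)) == 7          # all syndromes distinct
assert (0, 0, 0) not in syndromes        # and all nonzero
```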
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once it is known, the parity check matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;, whose rows are a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix, can be determined from it.  A method for doing this can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] and goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for any two distinct code words &amp;lt;math&amp;gt;w_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w_2 \,\!&amp;lt;/math&amp;gt; and any two correctable errors &amp;lt;math&amp;gt;e_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2 \,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This condition is called the '''disjointness condition'''.  It means that an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state, and the state could not be restored to its original form after the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt; single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  It does so in such a way that any single-bit error can be detected and corrected, since the code has distance &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix, &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]].  Put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one can arrive at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix.  Only the codewords are annihilated by the parity check matrix.&lt;br /&gt;
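The syndrome measurement of Eq. (F.4) is easy to illustrate for this code. The following is a short sketch of ours (not part of the original text), using the parity check matrix of Eq. (F.7).

```python
# Illustrative sketch of the syndrome Pw' = Pe for the [7,4,3] Hamming code,
# with the parity check matrix P of Eq. (F.7).
P = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def syndrome(word):
    """Compute Pw mod 2; the result is zero exactly for code words."""
    return [sum(p * b for p, b in zip(row, word)) % 2 for row in P]

w = [1, 0, 0, 0, 1, 1, 0]          # a code word: the first row of G^T in Eq. (F.6)
assert syndrome(w) == [0, 0, 0]    # code words are annihilated by P

e = [0, 0, 1, 0, 0, 0, 0]          # a single bit-flip error on the third bit
w_err = [(a + b) % 2 for a, b in zip(w, e)]
s = syndrome(w_err)
# The syndrome equals the third column of P, so it identifies the flipped bit
# regardless of which code word was sent.
assert s == [P[0][2], P[1][2], P[2][2]]
```

Because each single-bit error produces the corresponding column of P as its syndrome, distinct columns are exactly what make single errors identifiable.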
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each error vector should be associated to a particular vector in the code space when the error is added to the original code word.  This partitions the set into disjoint subsets, with each containing only one code vector.  A message is decoded correctly if the vector (the one containing the error) is in the subset that is associated with the original vector (the one with no error).  For example, if one vector is sent, say &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;, and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e\,\!&amp;lt;/math&amp;gt;, then this vector must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array of possible code words, possible errors, and the combinations of those errors and code words.  The array can be set up as a top row of the code word vectors and a leftmost column of errors, with the element of the first row and the first column being the zero vector and all subsequent entries in the column being errors.  Then the element at the top of a column (say the jth column) is added to the error in the corresponding row (say the kth row) to get the j,k entry of the array.  With this array, each column forms a subset that is disjoint from the others.  Identifying the column in which a corrupted word appears associates it with the code word at the top of that column and thus corrects the error.&lt;br /&gt;
&lt;br /&gt;
====Example 4====&lt;br /&gt;
&lt;br /&gt;
In this example we are going to use &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;  [[#eqF.6| (F.6)]] and &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;  [[#eqF.7| (F.7)]] from the example above.&lt;br /&gt;
&lt;br /&gt;
For this example, the code is taken to be the set of all linear combinations of the rows of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; so there are &amp;lt;math&amp;gt;2^3\,\!&amp;lt;/math&amp;gt; code words.  (This code is the dual of the Hamming code; see Definition 8 below.)  The set of code words is&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;C = \left\{0000000, 1110100, 1101010, 0111001, 0100111, 1010011, 0011110, 1001101\right\}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
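As a check, the eight code words above can be enumerated directly from the rows of P. This is an illustrative sketch of ours, not part of the original text.

```python
from itertools import product

# Illustrative sketch: enumerate the 2^3 linear combinations of the rows
# of P from Eq. (F.7), reproducing the set C listed above.
rows = ["1110100", "1101010", "0111001"]

def add(u, v):
    """Bitwise GF(2) addition of two bit strings."""
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

C = set()
for coeffs in product([0, 1], repeat=3):
    word = "0000000"
    for c, r in zip(coeffs, rows):
        if c:
            word = add(word, r)
    C.add(word)

assert len(C) == 8
assert "0100111" in C and "1001101" in C   # two entries from the listed set
```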
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt; &amp;lt;div id=&amp;quot;TableF.1&amp;quot;&amp;gt;&amp;lt;big&amp;gt;'''TABLE F.1'''&amp;lt;/big&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|+ align=&amp;quot;bottom&amp;quot; |Table F.1: ''Array to determine possible errors on an unknown code word in set &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt;''&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;1000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0100000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0000111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0010000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0001000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, suppose you are expecting to receive a code word, &amp;lt;math&amp;gt;c\in C.\,\!&amp;lt;/math&amp;gt;  Instead, you receive &amp;lt;math&amp;gt;0101111\notin C.\,\!&amp;lt;/math&amp;gt;  Looking at Table F.1, we see that &amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt; lies in column 5.  Since the columns of this table represent the disjoint subsets of our code space, we conclude that &amp;lt;math&amp;gt;c = 0100111\,\!&amp;lt;/math&amp;gt; and that the error was &amp;lt;math&amp;gt;e_{4},\,\!&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;0001000.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
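The table lookup can be mimicked in code: since only the zero error and single bit flips are correctable here, one can try flipping each bit of the received word and see which choice lands in the code set. This is an illustrative sketch of ours, not the original text's method.

```python
# Illustrative sketch of decoding via Table F.1: find the unique weight-0 or
# weight-1 error that maps the received word back into the code set C.
C = {"0000000", "1110100", "1101010", "0111001",
     "0100111", "1010011", "0011110", "1001101"}
received = "0101111"

def flip(word, i):
    """Return `word` with bit i flipped."""
    return word[:i] + str(1 - int(word[i])) + word[i + 1:]

candidates = [received] + [flip(received, i) for i in range(7)]
decoded = [w for w in candidates if w in C]
# Exactly one candidate is a code word: the error was 0001000 (the fourth bit).
assert decoded == ["0100111"]
```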
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure the ability to detect and correct errors.  Suppose &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt;-bit vectors are used to encode &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight exactly &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors, including errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the zero vector, corresponding to no error, is also counted among the error vectors.  The objective is to design a code that can correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes the case of no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use Stirling's formula to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log x -(1-x)\log (1-x) \,\!&amp;lt;/math&amp;gt; is the binary entropy function (with logarithms base 2), and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
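The bound of Eq. (F.9) is easy to check numerically. The following sketch (ours, not part of the original text) confirms that the [7,4] Hamming code with t = 1 saturates the bound with equality, which is the defining property of a perfect code.

```python
from math import comb

# Illustrative check of the Hamming bound, Eq. (F.9):
# 2^k times the number of correctable error patterns must not exceed 2^n.
def hamming_bound_holds(n, k, t):
    spheres = 2**k * sum(comb(n, i) for i in range(t + 1))
    return 2**n - spheres >= 0

# The [7,4] Hamming code with t = 1 saturates the bound: 16 * (1 + 7) = 128 = 2^7.
assert hamming_bound_holds(7, 4, 1)
assert 2**4 * (comb(7, 0) + comb(7, 1)) == 2**7

# By contrast, no [5,2] code can correct t = 2 errors: 4 * (1 + 5 + 10) = 64 exceeds 32.
assert not hamming_bound_holds(5, 2, 2)
```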
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 8: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code and let &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; be a vector in the code space.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with all &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
For binary vectors, a nonzero vector can be orthogonal to itself; for example, &amp;lt;math&amp;gt;(0,1,1)\cdot (0,1,1) = 0+1+1 = 0\,\!&amp;lt;/math&amp;gt;.  Note that this is different from ordinary vectors in 3-d real space.  &lt;br /&gt;
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
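As a small illustration (ours, not the original text's), the dual code can be computed by brute force for the three-bit repetition code of Eq. (F.2); it consists of the even-weight words, and it contains vectors orthogonal to themselves, such as 011.

```python
from itertools import product

# Illustrative sketch: brute-force computation of the dual code
# for the three-bit repetition code C = {000, 111} of Eq. (F.2).
C = [(0, 0, 0), (1, 1, 1)]

def dot(u, v):
    """Inner product over GF(2)."""
    return sum(a * b for a, b in zip(u, v)) % 2

dual = [u for u in product([0, 1], repeat=3)
        if all(dot(u, v) == 0 for v in C)]

# The dual code is exactly the set of even-weight words.
assert dual == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
assert dot((0, 1, 1), (0, 1, 1)) == 0   # a binary vector orthogonal to itself
```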
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not indicate whether codes that achieve it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection and correction are all difficult problems to solve in general.  One of the advantages of linear codes is that they provide a systematic method for identifying errors through the parity check operation.  More generally, checking whether or not a bit string (vector) is in the code space would require a look-up table.  This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to a look-up table.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, will have quantum analogues---as do many quantum error correcting codes.  In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information.  Such discussions will be taken up in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2342</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2342"/>
		<updated>2013-03-03T21:18:22Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* The Disjointness Condition and Correcting Errors */ Add Example 4&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  This is achieved by restricting to these two numbers and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a '''field''' (a Galois Field), which is denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product).  For example, the addition of the vector &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
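These GF(2) vector operations are simple to express in code. The following sketch (ours, for illustration only) reproduces the worked example above.

```python
# Illustrative sketch of GF(2) vector arithmetic.
def vadd(u, v):
    """Component-wise addition modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def vdot(u, v):
    """Inner product modulo 2 (the parity check)."""
    return sum(a * b for a, b in zip(u, v)) % 2

# Matches the example in the text.
assert vadd((0, 0, 1), (0, 1, 1)) == (0, 1, 0)
assert vdot((0, 0, 1), (0, 1, 1)) == 1
```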
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''' since it shows whether or not the first and second vectors agree, or have an even number of 1's at the positions specified by the ones in the other vector.  We may say that the first vector satisfies the parity check of the other vector, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
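A quick sketch (ours, illustrative only) confirms that the Hamming distance equals the weight of the GF(2) sum, as stated in Definition 3.

```python
# Illustrative sketch of Definition 3: d_H(v, w) = wt(v + w) over GF(2).
def wt(v):
    """Hamming weight: the number of nonzero components."""
    return sum(1 for a in v if a != 0)

def d_H(v, w):
    """Hamming distance: the number of places where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

v, w = (1, 0, 1, 1, 0), (0, 0, 1, 0, 1)
v_plus_w = tuple((a + b) % 2 for a, b in zip(v, w))
assert d_H(v, w) == wt(v_plus_w) == 3
```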
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two non-equal vectors in a code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When that code has a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we could detect single bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states.  So an error must have occurred.  However, we don't know whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We can nevertheless conclude that an error occurred, provided at most one bit was flipped.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
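Decoding this three-bit code amounts to a majority vote, as the following sketch (ours, illustrative only) shows; every single bit-flip error is corrected.

```python
# Illustrative sketch: majority-vote decoding of the three-bit code of Eq. (F.2).
def decode(word):
    """Return the logical bit by majority vote over the three physical bits."""
    return 1 if sum(word) >= 2 else 0

assert decode((0, 0, 0)) == 0 and decode((1, 1, 1)) == 1

# Every single bit-flip error is corrected back to the logical bit:
for logical, codeword in [(0, [0, 0, 0]), (1, [1, 1, 1])]:
    for i in range(3):
        corrupted = list(codeword)
        corrupted[i] ^= 1
        assert decode(corrupted) == logical
```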
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because they are able to efficiently identify errors and the associated correct codewords.  This ability is due to the added structure these codes have.  These will be discussed in the following sections. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix with columns that form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the columns form a basis that spans the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; described by a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique.  If columns are switched, or if two columns are added to produce a new column that replaces one of them, then the resulting matrix is still a valid generator matrix for the code.  This is because the columns remain linearly independent and span the same code space.&lt;br /&gt;
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates any code word.  To see this, recall that any code word can be written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  Also, because &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; has full rank, &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
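The membership test Pw = 0 can be sketched concretely with the three-bit repetition code, for which a valid parity check matrix can be taken as the two rows (1,1,0) and (0,1,1). This sketch is ours, for illustration only.

```python
from itertools import product

# Illustrative sketch: P can be used to test code membership, since Pw = 0
# holds exactly for code words.  Here, the three-bit repetition code.
P = [(1, 1, 0), (0, 1, 1)]   # an assumed parity check matrix, n - k = 2 rows

def in_code(w):
    """True when every parity check is satisfied, i.e. Pw = 0 mod 2."""
    return all(sum(p * b for p, b in zip(row, w)) % 2 == 0 for row in P)

codewords = [w for w in product([0, 1], repeat=3) if in_code(w)]
assert codewords == [(0, 0, 0), (1, 1, 1)]   # exactly the repetition code
```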
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt;, is called the '''error syndrome''' and the measurement to identify &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  The syndrome therefore depends only on the error and not on the original code word.  If the error can be determined from the syndrome, then it can be corrected independently of the code word.  However, for the syndrome to identify the error uniquely, two distinct correctable errors must produce different results; that is, &amp;lt;math&amp;gt;Pe_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Pe_2\,\!&amp;lt;/math&amp;gt; must not be equal.  This is possible if a distance-&amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once this matrix is determined, the parity check matrix, &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; whose rows form a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix, can be computed from it.  The method for doing this can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] and it goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for any two distinct code words &amp;lt;math&amp;gt;w_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w_2 \,\!&amp;lt;/math&amp;gt; and any two correctable errors &amp;lt;math&amp;gt;e_1 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2 \,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This is called the '''disjointness condition''': an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state, and so could not be corrected to its original state after the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;  single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  As the notation indicates, this code encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  Moreover, since it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;, one error can be detected and corrected.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix, &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]].  Put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one can arrive at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix, since only the codewords are annihilated by it.&lt;br /&gt;
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each corrupted vector, formed by adding an error vector to a code word, should be associated with that particular code word.  This partitions the space of &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit vectors into disjoint subsets, each containing exactly one code vector.  A message is decoded correctly if the received vector (the one containing the error) is in the subset associated with the original vector (the one with no error).  For example, if &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt; is sent and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e,\,\!&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt; v_2 \,\!&amp;lt;/math&amp;gt; must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
One way to decode is to record an array of the possible code words, the possible errors, and their combinations.  The top row of the array lists the code words and the leftmost column lists the errors, with the entry in the first row and first column being the zero vector.  The (j,k) entry of the array is then the sum of the code word at the top of the jth column and the error in the kth row.  Each column of this array is one of the disjoint subsets described above.  Locating a corrupted word in a column associates it with the code word at the top of that column and thus corrects the error.&lt;br /&gt;
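The construction of such an array can be sketched as follows (an illustrative snippet using the three-bit repetition code of Example 2; the names are ours):&lt;br /&gt;

```python
# Sketch: the "standard array" described above, for the three-bit
# repetition code.  Top row: code words; leftmost column: correctable
# errors, starting with the zero vector; entry (j, k): code word j plus
# error k, componentwise mod 2.

def add_mod2(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

codewords = [(0, 0, 0), (1, 1, 1)]
errors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
array = [[add_mod2(c, e) for c in codewords] for e in errors]

# Each column collects every word obtained by corrupting one code word;
# the two columns are disjoint and together cover all 2^3 words.
col0 = set(row[0] for row in array)
col1 = set(row[1] for row in array)
assert col0.isdisjoint(col1)
assert len(col0 | col1) == 8
```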
&lt;br /&gt;
====Example 4====&lt;br /&gt;
&lt;br /&gt;
In this example we are going to use &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;  [[#eqF.6| (F.6)]] and &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;  [[#eqF.7| (F.7)]] from the example above.&lt;br /&gt;
&lt;br /&gt;
The set of code words is given by all of the linear combinations of the rows of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; so there are &amp;lt;math&amp;gt;2^3\,\!&amp;lt;/math&amp;gt; code words:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;C = \left\{0000000, 1110100, 1101010, 0111001, 0100111, 1010011, 0011110, 1001101\right\}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt; &amp;lt;div id=&amp;quot;TableF.1&amp;quot;&amp;gt;&amp;lt;big&amp;gt;'''TABLE F.1'''&amp;lt;/big&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;10&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|+ align=&amp;quot;bottom&amp;quot; |Table F.1: ''Array to determine possible errors on an unknown code word in set &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt;''&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;1000000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0100000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0000111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0010000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0001110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0001000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1111100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1100010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0110001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1011011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0010110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1000101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;0000001\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1110101\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1101011\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0111000\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0100110\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1010010\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;0011111\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt;1001100\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, suppose you are expecting to receive a code word &amp;lt;math&amp;gt;c\in C,\,\!&amp;lt;/math&amp;gt; but instead you receive &amp;lt;math&amp;gt;0101111\notin C.\,\!&amp;lt;/math&amp;gt;  Looking at Table F.1, we see that &amp;lt;math&amp;gt;0101111\,\!&amp;lt;/math&amp;gt; is in column 5.  Since the columns of this table represent the disjoint subsets of our code space, we conclude that &amp;lt;math&amp;gt;c = 0100111\,\!&amp;lt;/math&amp;gt; and the error that occurred was &amp;lt;math&amp;gt;e_{4},\,\!&amp;lt;/math&amp;gt; i.e., &amp;lt;math&amp;gt;0001000.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
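The table look-up just performed can be sketched as follows (an illustrative helper, not code from the references; it simply scans the entries of Table F.1):&lt;br /&gt;

```python
# Sketch of the table look-up decoding used above: given a received word,
# find the entry of Table F.1 that matches it; the code word at the top
# of that column is the decoded message.

codewords = ["0000000", "1110100", "1101010", "0111001",
             "0100111", "1010011", "0011110", "1001101"]
# Correctable errors: no error plus the seven single-bit flips.
errors = ["0000000"] + ["".join("1" if i == j else "0" for i in range(7))
                        for j in range(7)]

def add_mod2(u, v):
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

def decode(received):
    for c in codewords:
        for e in errors:
            if add_mod2(c, e) == received:
                return c, e
    return None

c, e = decode("0101111")
assert c == "0100111" and e == "0001000"
```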
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors.  Suppose a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt;-bit vectors is used to encode &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the &amp;lt;math&amp;gt; i=0 \,\!&amp;lt;/math&amp;gt; term counts the zero vector, i.e., no error at all; a code designed to correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; must also handle this case.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use Stirling's formula to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log x -(1-x)\log (1-x) \,\!&amp;lt;/math&amp;gt; and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
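As a quick numerical check of [[#eqF.9|Eq.(F.9)]] (a sketch; the helper name is ours), note that the [7,4,3] Hamming code with one correctable error satisfies the bound with equality:&lt;br /&gt;

```python
from math import comb

# Sketch: check the Hamming bound, Eq. (F.9): 2^k times the number of
# correctable error patterns must not exceed 2^n.

def hamming_bound_holds(n, k, t):
    volume = sum(comb(n, i) for i in range(t + 1))
    # True when 2^k * volume does not exceed 2^n.
    return 2**k * volume == min(2**k * volume, 2**n)

assert hamming_bound_holds(7, 4, 1)
# For the [7,4,3] code the bound is saturated: 2^4 * (1 + 7) = 128 = 2^7.
assert 2**4 * sum(comb(7, i) for i in range(2)) == 2**7
```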
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 11: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;; that is, the set of all vectors having zero inner product with every code word.  &lt;br /&gt;
&lt;br /&gt;
Note that, unlike ordinary vectors in 3-d space, a binary vector can be orthogonal to itself; any vector of even weight, such as &amp;lt;math&amp;gt;1100,\,\!&amp;lt;/math&amp;gt; is an example.  &lt;br /&gt;
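This can be seen concretely (a minimal sketch):&lt;br /&gt;

```python
# Sketch: over GF(2) a nonzero vector can be orthogonal to itself,
# unlike real vectors.  Any vector of even weight is an example.

def inner_mod2(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

v = (1, 1, 0, 0)          # weight 2, so v . v = 1 + 1 = 0 (mod 2)
assert inner_mod2(v, v) == 0

w = (1, 1, 1, 0)          # odd weight: not self-orthogonal
assert inner_mod2(w, w) == 1
```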
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not guarantee that codes saturating it exist, but no code can violate it.  Encoding, decoding, error detection, and correction are all difficult problems to solve in general.  One of the advantages of linear codes is that they provide a systematic method for identifying errors through the parity check operation.  Without this structure, checking whether or not a bit string (vector) is in the code space would require a look-up table, which is much more time-consuming than the matrix multiplication used in the parity check.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues.  In quantum computers, as will be discussed there, error correction is necessary due to the delicacy of quantum information.&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2341</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2341"/>
		<updated>2013-03-03T18:57:18Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Errors */ punctutation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  This is achieved by restricting ourselves to these two numbers and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication, with the two operations obeying the distributive law, the set becomes a '''field''' (a Galois field), denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider a string of bits to be a vector and to use vector addition (component-wise addition) and vector multiplication (the inner product).  For example, the sum of the vectors &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
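These two operations can be sketched as follows (an illustrative snippet, assuming vectors are represented as tuples of 0's and 1's; the function names are ours):&lt;br /&gt;

```python
# Sketch of the two vector operations above: component-wise addition
# mod 2, and the mod-2 inner product.

def vec_add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

# The worked example from the text:
assert vec_add((0, 0, 1), (0, 1, 1)) == (0, 1, 0)
assert inner((0, 0, 1), (0, 1, 1)) == 1
```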
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''', since it indicates whether the two vectors agree, i.e., whether one vector has an even number of 1's at the positions specified by the 1's of the other.  When the inner product is zero, we say that the first vector satisfies the parity check of the other, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two non-equal vectors in a code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When that code has a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we can detect single-bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states, so an error must have occurred.  However, we cannot tell whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We do know that an error occurred, provided that no more than one bit was flipped.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
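The encoding and majority-vote correction for this three-bit code can be sketched as follows (illustrative helper names, not from the references):&lt;br /&gt;

```python
# Sketch: encoding and majority-vote decoding for the three-bit
# repetition code of Eq. (F.2).

def encode(bit):
    return (bit, bit, bit)

def decode(word):
    # Majority vote corrects any single bit flip.
    ones = sum(word)
    return 1 if ones == 2 or ones == 3 else 0

# Every single-bit error on either logical state is corrected.
for bit in (0, 1):
    sent = encode(bit)
    for pos in range(3):
        received = tuple((b + (1 if i == pos else 0)) % 2
                         for i, b in enumerate(sent))
        assert decode(received) == bit
```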
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the total number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because their added structure makes it possible to identify errors, and the associated correct codewords, efficiently.  This is discussed in the following sections. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix whose columns form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the column vectors span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; corresponding to a message vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique: if columns are switched, or a column is replaced by its sum with another column, the resulting matrix still generates the same code, since the columns remain linearly independent.&lt;br /&gt;
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates any code word.  To see this, recall any code word is written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  Also, due to the rank of &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; it can be shown that &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe,\,\!&amp;lt;/math&amp;gt; is called the '''error syndrome''', and the measurement that identifies &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  Notably, the result depends only on the error and not on the original code word.  If the error can be determined from this result, then it can be corrected independent of the code word.  However, for &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; to identify the error uniquely, two different errors &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt; must not give &amp;lt;math&amp;gt;Pe_1 = Pe_2\,\!&amp;lt;/math&amp;gt;.  This is possible if a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
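A sketch of this syndrome computation, using a parity check matrix of the [7,4,3] Hamming code (illustrative code, not from the references):&lt;br /&gt;

```python
# Sketch: the syndrome P w' = P e of Eq. (F.4), for a [7,4,3] Hamming
# parity check matrix.  The syndrome depends only on the error.

P = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def syndrome(word):
    return tuple(sum(p * w for p, w in zip(row, word)) % 2 for row in P)

codeword = (1, 1, 1, 0, 1, 0, 0)          # a code word of this code
assert syndrome(codeword) == (0, 0, 0)    # code words are annihilated

error = (0, 0, 0, 1, 0, 0, 0)             # a flip of the fourth bit
corrupted = tuple((c + e) % 2 for c, e in zip(codeword, error))
# The syndrome of the corrupted word equals that of the error alone.
assert syndrome(corrupted) == syndrome(error)
```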
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is known, the parity check matrix &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix, can be determined from it.  The method, which can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]], goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for any two distinct encoded states &amp;lt;math&amp;gt;w_1, w_2 \,\!&amp;lt;/math&amp;gt; and correctable errors &amp;lt;math&amp;gt;e_1, e_2 \,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This is called the '''disjointness condition''': an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state, and so could not be corrected to its original state after the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;  single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is called the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  It also does it in such a way that one error can be detected and corrected since it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]]: put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix; the parity check matrix is then &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one arrives at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix, since only the codewords are annihilated by it.&lt;br /&gt;
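The recipe above can be checked mechanically. Below is a minimal sketch (the variable names are ours, not from the text) that reads off the block A from the standard form of G^T for the [7,4,3] Hamming code, forms P = (A^T|I_{n-k}), and verifies that P annihilates the generator over GF(2):

```python
# G^T for the [7,4,3] Hamming code, already in the standard form (I_k | A).
GT = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 1, 1),
    (0, 0, 1, 0, 1, 0, 1),
    (0, 0, 0, 1, 0, 1, 1),
]
n, k = 7, 4

# Read off A (the last n-k columns of G^T) and form P = (A^T | I_{n-k}).
A = [row[k:] for row in GT]
P = [tuple(A[j][i] for j in range(k)) + tuple(1 if j == i else 0 for j in range(n - k))
     for i in range(n - k)]
for row in P:
    print(row)
# -> (1, 1, 1, 0, 1, 0, 0)
#    (1, 1, 0, 1, 0, 1, 0)
#    (0, 1, 1, 1, 0, 0, 1)

# Verify PG = 0 (mod 2): every row of P annihilates every basis codeword.
assert all(sum(p * g for p, g in zip(prow, grow)) % 2 == 0
           for prow in P for grow in GT)
```

The printed rows match the parity check matrix of Eq. (F.7).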
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each error vector should be associated with a particular vector in the code space when the error is added to the original code word.  This partitions the set of all &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words into disjoint subsets, each containing exactly one code vector.  A message is decoded correctly if the received vector (the one containing the error) is in the subset associated with the original vector (the one with no error).  For example, if the vector &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt; is sent and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e,\,\!&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt; v_2 \,\!&amp;lt;/math&amp;gt; must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array of the possible code words, the possible errors, and the combinations of those errors and code words.  The array has the code word vectors as its top row and the errors in its leftmost column, with the entry in the first row and first column being the zero vector and all subsequent entries in that column being errors.  The element at the top of a column (say the jth column) is then added to the error in the corresponding row (say the kth row) to get the (j,k) entry of the array.  Each column of this array is then a subset that is disjoint from the others.  Identifying the column in which a corrupted word lies associates it with a code word and thus corrects the error.&lt;br /&gt;
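The array described above can be sketched for the three-bit repetition code, taking the correctable errors to be the single-bit flips (the helper names are ours, not from the text):

```python
code_words = [(0, 0, 0), (1, 1, 1)]
# Leftmost column of the array: the zero vector first, then the errors.
errors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def add(v, w):
    """Component-wise addition modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(v, w))

# Column for code word w: the set of w plus each possible error.
columns = {w: {add(w, e) for e in errors} for w in code_words}

# The columns are disjoint subsets of the space:
assert columns[(0, 0, 0)].isdisjoint(columns[(1, 1, 1)])

def decode(received):
    """Correct the error by finding which column the received word lies in."""
    for w, subset in columns.items():
        if received in subset:
            return w

print(decode((0, 1, 0)))  # (0, 0, 0)
print(decode((1, 0, 1)))  # (1, 1, 1)
```

Each received word is associated with the unique code word heading its column, which corrects any single-bit error.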
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors.  Suppose a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt;-bit vectors is used for encoding &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors, including errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the case of no error, the &amp;lt;math&amp;gt; i=0 \,\!&amp;lt;/math&amp;gt; term, is also included in the set of error vectors.  The objective is to design a code that can correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use Stirling's formula to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log_2 x -(1-x)\log_2 (1-x) \,\!&amp;lt;/math&amp;gt; is the binary entropy function and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
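The bound is easy to evaluate numerically. The sketch below checks Eqs. (F.9) and (F.10) for a [7,4] code with t = 1; in this case the bound is saturated, which is why the [7,4,3] Hamming code is called a perfect code:

```python
from math import comb, log2

n, k, t = 7, 4, 1

# F.9: 2^k * sum_{i=0}^{t} C(n, i) must not exceed 2^n.
lhs = 2**k * sum(comb(n, i) for i in range(t + 1))
print(lhs, 2**n)   # 128 128 -- the bound is saturated: the code is "perfect"

# F.10, written as n - log2(sum_i C(n, i)) - k being non-negative:
slack = n - log2(sum(comb(n, i) for i in range(t + 1))) - k
print(slack)       # 0.0
```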
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 11: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code and let &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; be a vector in the code space.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with all &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
For binary vectors, a vector can be orthogonal to itself; for example, &amp;lt;math&amp;gt;(1,1)\cdot (1,1) = 1 + 1 = 0\,\!&amp;lt;/math&amp;gt;.  Note that this is different from ordinary real vectors in 3-d space, where only the zero vector is orthogonal to itself.  &lt;br /&gt;
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not tell us whether codes that saturate it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection and correction are all difficult problems to solve in general.  One of the advantages of linear codes is that they provide a systematic method for identifying errors through the use of the parity check operation.  Without this structure, checking whether or not a bit string (vector) is in the code space would require a look-up table.  This would be much more time-consuming than using the parity check matrix; matrix multiplication is quite efficient relative to a look-up table.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues, and many quantum error correcting codes are constructed from classical ones.  In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information.  Such discussions will be taken up in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2340</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2340"/>
		<updated>2013-03-03T18:53:16Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Parity Check Matrix */ punctuation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a '''field''' (a Galois Field), which is denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product).  For example, the addition of the vector &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
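These operations are straightforward to sketch in code; the following reproduces the worked example above (the function names are ours, not from any standard library):

```python
def gf2_add(v, w):
    """Component-wise addition modulo 2 (bitwise XOR)."""
    return tuple((a + b) % 2 for a, b in zip(v, w))

def gf2_dot(v, w):
    """Inner product modulo 2."""
    return sum(a * b for a, b in zip(v, w)) % 2

# The worked example from the text:
print(gf2_add((0, 0, 1), (0, 1, 1)))  # (0, 1, 0)
print(gf2_dot((0, 0, 1), (0, 1, 1)))  # 1
```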
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''', since it indicates whether the first vector has an even or odd number of 1's at the positions specified by the 1's of the second vector.  If the inner product is zero, we say that the first vector satisfies the parity check of the other vector, or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
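A short sketch of these two definitions, confirming that the Hamming distance equals wt(v + w) over GF(2) (the helper names are ours):

```python
def weight(v):
    """Hamming weight: the number of non-zero components."""
    return sum(1 for a in v if a != 0)

def hamming_distance(v, w):
    """The number of places where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

v, w = (1, 0, 1, 1), (1, 1, 0, 1)
print(weight(v))               # 3
print(hamming_distance(v, w))  # 2

# d_H(v, w) = wt(v + w), with addition taken modulo 2:
vw = tuple((a + b) % 2 for a, b in zip(v, w))
print(weight(vw) == hamming_distance(v, w))  # True
```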
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two non-equal vectors in a code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When that code has a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we can detect single bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states, so an error must have occurred.  However, we don't know whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We do know that an error has occurred, provided that no more than one error has occurred.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
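A sketch of how this encoding detects and corrects a single error; the decoder below is a simple majority vote, which is one way (not prescribed by the text) to implement the correction:

```python
def encode(bit):
    """[3,1] repetition encoding: 0 -> 000, 1 -> 111."""
    return (bit, bit, bit)

def decode(word):
    """Majority vote corrects any single bit-flip error."""
    return 1 if sum(word) >= 2 else 0

sent = encode(0)          # (0, 0, 0)
received = (0, 1, 0)      # a bit-flip error on the middle bit
print(decode(received))   # 0 -- the single error is corrected

# Two errors, however, are decoded incorrectly:
print(decode((0, 1, 1)))  # 1
```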
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because errors, and the associated correct codewords, can be identified efficiently.  This ability is due to the added structure these codes have, and will be discussed in the following sections. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix with columns that form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the columns span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; corresponding to a message vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique.  If columns are switched, or added together to produce a new vector that replaces a column, the resulting matrix still generates the same code.  This is because the columns remain linearly independent, and continue to span the code space, after these operations are performed.&lt;br /&gt;
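Encoding with a generator matrix can be sketched as follows; here G is taken to be the 3x1 generator of the three-bit repetition code from Example 2, and the helper names are ours:

```python
def mat_vec_mod2(M, v):
    """Multiply matrix M (list of rows) by vector v over GF(2)."""
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

# Generator matrix of the [3,1] repetition code: a single column (1,1,1)^T.
G = [(1,), (1,), (1,)]

# w = Gv reproduces the redundant encoding:
print(mat_vec_mod2(G, (0,)))  # (0, 0, 0)  = 0_L
print(mat_vec_mod2(G, (1,)))  # (1, 1, 1)  = 1_L
```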
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n- k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  The rank of &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;n- k,\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; has the property that it annihilates any code word.  To see this, recall that any code word can be written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0.\,\!&amp;lt;/math&amp;gt;  Also, because &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; has full rank, it can be shown that &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe,\,\!&amp;lt;/math&amp;gt; is called the '''error syndrome''' and the measurement to identify &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  Notice that the result depends only on the error and not on the original code word.  If the error can be determined from this result, then it can be corrected independent of the code word.  However, in order for the syndrome to identify the error uniquely, two different correctable errors &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt; must have different syndromes, &amp;lt;math&amp;gt;Pe_1 \neq Pe_2\,\!&amp;lt;/math&amp;gt;.  This is guaranteed for a code of distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; if every set of &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix is linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
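A sketch of the syndrome computation for the three-bit repetition code; the parity check matrix used here is one valid choice (our own, not one given in the text), and the helper names are also ours:

```python
def mat_vec_mod2(M, v):
    """Multiply matrix M (list of rows) by vector v over GF(2)."""
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

# One valid parity check matrix for the [3,1] repetition code.
P = [(1, 1, 0), (0, 1, 1)]

w = (1, 1, 1)                       # a code word
e = (0, 1, 0)                       # a single bit-flip error
w_err = tuple((a + b) % 2 for a, b in zip(w, e))

# The syndrome of the corrupted word equals the syndrome of the error alone:
print(mat_vec_mod2(P, w_err))       # (1, 1)
print(mat_vec_mod2(P, e))           # (1, 1)

# Each single-bit error has a distinct syndrome, so it can be identified:
for err in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    print(err, mat_vec_mod2(P, err))
```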
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is known, the parity check matrix, &amp;lt;math&amp;gt;P,\,\!&amp;lt;/math&amp;gt; can be determined: its rows are a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  A systematic method for determining the parity check matrix from the generator matrix can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]] and goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that the error may be corrected, the following condition must be satisfied for two states with errors occurring:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This condition is called the '''disjointness condition'''.  It means that an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state, and the state could not be restored to its original state, i.e., the state before the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt; single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is called the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  It also does it in such a way that one error can be detected and corrected since it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; linearly independent vectors that are orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]]: put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix; the parity check matrix is then &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one arrives at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix, since only the codewords are annihilated by it.&lt;br /&gt;
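For this parity check matrix, the syndrome of a single bit-flip error in position i is exactly the i-th column of P, so a single error can be located and corrected by matching the syndrome against the columns. A sketch of this decoding (the helper names are ours, not from the text):

```python
# The parity check matrix of Eq. (F.7) for the [7,4,3] Hamming code.
P = [
    (1, 1, 1, 0, 1, 0, 0),
    (1, 1, 0, 1, 0, 1, 0),
    (0, 1, 1, 1, 0, 0, 1),
]

def syndrome(word):
    """Apply the parity check matrix over GF(2)."""
    return tuple(sum(p * w for p, w in zip(row, word)) % 2 for row in P)

def correct(word):
    """Match the syndrome to a column of P to locate and flip the bad bit."""
    s = syndrome(word)
    if s == (0, 0, 0):
        return word                   # already a code word
    i = [tuple(row[j] for row in P) for j in range(7)].index(s)
    return tuple(b ^ (1 if j == i else 0) for j, b in enumerate(word))

w = (0, 0, 0, 0, 0, 0, 0)             # the zero code word
received = (0, 0, 1, 0, 0, 0, 0)      # bit-flip error on the third bit
print(syndrome(received))             # (1, 0, 1) -- the third column of P
print(correct(received))              # (0, 0, 0, 0, 0, 0, 0)
```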
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each error vector should be associated with a particular vector in the code space when the error is added to the original code word.  This partitions the set of all &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words into disjoint subsets, each containing exactly one code vector.  A message is decoded correctly if the received vector (the one containing the error) is in the subset associated with the original vector (the one with no error).  For example, if the vector &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt; is sent and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e,\,\!&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt; v_2 \,\!&amp;lt;/math&amp;gt; must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array of the possible code words, the possible errors, and the combinations of those errors and code words.  The array has the code word vectors as its top row and the errors in its leftmost column, with the entry in the first row and first column being the zero vector and all subsequent entries in that column being errors.  The element at the top of a column (say the jth column) is then added to the error in the corresponding row (say the kth row) to get the (j,k) entry of the array.  Each column of this array is then a subset that is disjoint from the others.  Identifying the column in which a corrupted word lies associates it with a code word and thus corrects the error.&lt;br /&gt;
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors.  Suppose a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt;-bit vectors is used for encoding &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  The set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors, including errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the case of no error, the &amp;lt;math&amp;gt; i=0 \,\!&amp;lt;/math&amp;gt; term, is also included in the set of error vectors.  The objective is to design a code that can correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use [[Bibliography#LoPopescueSpiller|Stirling's formula]] to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log x -(1-x)\log (1-x) \,\!&amp;lt;/math&amp;gt; and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescueSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 11: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code and let &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; be a vector in the code space.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with all &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
For binary vectors, a vector can be orthogonal to itself.  Note that this is different from ordinary vectors in 3-d space.  &lt;br /&gt;
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not tell us whether codes that saturate it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection and correction are all difficult problems to solve in general.  One of the advantages of the linear codes is that they provide a systematic method for identifying errors on a code through the use of the parity check operation.  Without this structure, checking whether or not a bit string (vector) is in the code space would require a look-up table, which would be much more time-consuming; matrix multiplication is quite efficient relative to the look-up table.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues; indeed, many quantum error correcting codes are constructed from classical linear codes.  In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information.  Such discussions will be taken up in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2339</id>
		<title>Appendix F - Classical Error Correcting Codes</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_F_-_Classical_Error_Correcting_Codes&amp;diff=2339"/>
		<updated>2013-03-03T18:45:12Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Parity Check Matrix */ punctuation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Classical error correcting codes are in use in a wide variety of digital electronics and other classical information systems.  It is a good idea to learn some of the basic definitions, ideas, methods, and simple examples of classical error correcting codes in order to understand the (slightly) more complicated quantum error correcting codes.  There are many good introductions to classical error correction.  Here we follow a few sources which also discuss quantum error correcting codes: the book by [[Bibliography#LoeppWootters|Loepp and Wootters [25]]], an article in [[Bibliography#LoPopescueSpiller|Lo, Popescu, and Spiller [26]]] by Steane, [[Bibliography#GottDiss|Gottesman's Thesis [27]]], and [[Bibliography#Gaitan:book|Gaitan's Book [3]]] on quantum error correction, which also discusses classical error correction.&lt;br /&gt;
&lt;br /&gt;
===Binary Operations===&lt;br /&gt;
&lt;br /&gt;
The set &amp;lt;math&amp;gt; \{0,1\} \,\!&amp;lt;/math&amp;gt; is a group under addition.  (See [[Appendix D - Group Theory#Example 3|Section D.2.8]] of [[Appendix D - Group Theory|Appendix D]].)  The way this is achieved is by deciding that we will only use these two numbers in our language and using addition modulo 2, meaning &amp;lt;math&amp;gt; 0+0=0, 1+0 = 0+1 = 1, \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1+1 =0\,\!&amp;lt;/math&amp;gt;.   If we also include the operation of multiplication and these two operations follow the distributive law, the set becomes a '''field''' (a Galois Field), which is denoted GF&amp;lt;math&amp;gt;(2)\,\!&amp;lt;/math&amp;gt;.  Since one often works with strings of bits, it is very useful to consider the string of bits to be a vector and to use vector addition (which is component-wise addition) and vector multiplication (which is the inner product).  For example, the addition of the vector &amp;lt;math&amp;gt;(0,0,1)\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(0,1,1)\,\!&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;(0,0,1) + (0,1,1) = (0,1,0)\,\!&amp;lt;/math&amp;gt;.  The inner product between these two vectors is  &amp;lt;math&amp;gt;(0,0,1) \cdot (0,1,1) = 0\cdot 0 + 0\cdot 1 + 1\cdot 1 = 0 +0 +1=1\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
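These GF(2) operations are easy to experiment with.  A minimal Python sketch (the helper names add and inner are ours, not part of the text) reproduces the two examples above:

```python
# GF(2) arithmetic on bit vectors: addition is component-wise mod 2,
# and the inner product is the mod-2 dot product.

def add(v, w):
    """Component-wise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(v, w))

def inner(v, w):
    """Inner product mod 2 (the parity check)."""
    return sum(a * b for a, b in zip(v, w)) % 2

print(add((0, 0, 1), (0, 1, 1)))    # (0, 1, 0)
print(inner((0, 0, 1), (0, 1, 1)))  # 1
```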
&lt;br /&gt;
===Definitions and Basics===&lt;br /&gt;
&lt;br /&gt;
====Definition 1====&lt;br /&gt;
The inner product is also called a '''checksum''' or '''parity check''' since it indicates whether one vector has an even number of 1's at the positions specified by the 1's of the other vector.  We may say that the first vector satisfies the parity check of the other vector (when the inner product is zero), or vice versa.&lt;br /&gt;
&lt;br /&gt;
====Definition 2====&lt;br /&gt;
The '''weight''' or '''Hamming weight''' is the number of non-zero components of a vector or string.  The weight of a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; is denoted wt(&amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt;).  &lt;br /&gt;
&lt;br /&gt;
====Definition 3====&lt;br /&gt;
The '''Hamming distance''' is the number of places where two vectors differ.  Let the two vectors be &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt;.  Then the Hamming distance is also equal to wt(&amp;lt;math&amp;gt;v+w\,\!&amp;lt;/math&amp;gt;).  The Hamming distance between &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; will be denoted &amp;lt;math&amp;gt;d_H(v,w)\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
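Definitions 2 and 3 can be sketched directly in Python; the identity that the Hamming distance equals wt(v+w) follows because a mod-2 sum has a 1 exactly where the two vectors differ:

```python
# Hamming weight (Definition 2) and Hamming distance (Definition 3);
# the distance equals the weight of the mod-2 sum of the two vectors.

def wt(v):
    """Number of non-zero components."""
    return sum(v)

def d_H(v, w):
    """Number of positions where v and w differ."""
    return sum(a != b for a, b in zip(v, w))

v, w = (1, 0, 1, 1), (1, 1, 0, 1)
v_plus_w = tuple((a + b) % 2 for a, b in zip(v, w))
assert d_H(v, w) == wt(v_plus_w)   # the identity from Definition 3
print(d_H(v, w))  # 2
```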
&lt;br /&gt;
====Definition 4====&lt;br /&gt;
We use &amp;lt;math&amp;gt;\{0,1\}^n\,\!&amp;lt;/math&amp;gt; to denote the set of all binary vectors of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;.  A '''code''' &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; of length &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; is any subset of that set.  The set of all elements of &amp;lt;math&amp;gt;C\,\!&amp;lt;/math&amp;gt; is called the set of '''codewords'''.  We also say there are &amp;lt;math&amp;gt;2^n\,\!&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-bit words in the space.  &lt;br /&gt;
&lt;br /&gt;
Suppose &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt; bits are used to encode &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt; logical bits.  We use the notation &amp;lt;math&amp;gt;[n,k] \,\!&amp;lt;/math&amp;gt; to denote such a code.&lt;br /&gt;
&lt;br /&gt;
====Definition 5====&lt;br /&gt;
The '''minimum distance''' of a code is the smallest Hamming distance between any two distinct codewords in the code.  This can be written &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
d_{Hmin}(C) = \underset{v,w\in C,v\neq w}{\mbox{min}}d_H(v,w).&lt;br /&gt;
 \,\!&amp;lt;/math&amp;gt;|F.1}}&lt;br /&gt;
For shorthand, we also use &amp;lt;math&amp;gt; d(C)\,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt; d\,\!&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt; C\,\!&amp;lt;/math&amp;gt; is understood.&lt;br /&gt;
&lt;br /&gt;
When that code has a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt;, the notation &amp;lt;math&amp;gt;[n,k,d] \,\!&amp;lt;/math&amp;gt; is used.&lt;br /&gt;
&lt;br /&gt;
====Example 1====&lt;br /&gt;
It is interesting to note that if we encode redundantly using &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt; as our logical zero and logical one respectively, then we could detect single bit errors but not correct them.  For example, if we receive &amp;lt;math&amp;gt; 01\,\!&amp;lt;/math&amp;gt;, we know this cannot be one of our encoded states.  So an error must have occurred.  However, we don't know whether the sender sent &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;.  We do know that an error has occurred though, as long as we know only one error has occurred.  Such an encoding can be used as an '''error detecting code'''.  In this case there are two code words, &amp;lt;math&amp;gt; 0_L=00 \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;1_L=11\,\!&amp;lt;/math&amp;gt;, but four words in the space.  The minimum distance is 2, which is the distance between the two code words.&lt;br /&gt;
&lt;br /&gt;
====Example 2====&lt;br /&gt;
The three-bit redundant encoding was already given in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].  One takes logical zero and logical one states to be&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
0_L =  000 \;\;\; \mbox{ and } \;\;\; 1_L = 111,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.2}}&lt;br /&gt;
where the subscript &amp;lt;math&amp;gt;L \,\!&amp;lt;/math&amp;gt; is used to denote a &amp;quot;logical&amp;quot; state; that is, one that is encoded.  Recall that this code is able to detect and correct one error.  In this case there are two code words out of eight possible words, and the minimal distance is 3.&lt;br /&gt;
&lt;br /&gt;
====Definition 6====&lt;br /&gt;
The '''rate''' of a code is given by the ratio of the number of logical bits to the total number of bits, &amp;lt;math&amp;gt;k/n\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
====Definition 7====&lt;br /&gt;
A '''linear code''' &amp;lt;math&amp;gt;C_l\,\!&amp;lt;/math&amp;gt; is a code that is closed under addition.&lt;br /&gt;
&lt;br /&gt;
===Linear Codes===&lt;br /&gt;
&lt;br /&gt;
Linear codes are particularly useful because they allow errors, and the associated correct codewords, to be identified efficiently.  This ability is due to the added structure these codes possess, which is discussed in the following sections. &lt;br /&gt;
&lt;br /&gt;
====Generator Matrix====&lt;br /&gt;
&lt;br /&gt;
For linear codes, any linear combination of codewords is a codeword.  One key feature of a linear code is that it can be specified by a &amp;lt;nowiki&amp;gt;''generator matrix,''&amp;lt;/nowiki&amp;gt; &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;&amp;lt;ref&amp;gt;Recall that we are working with binary codes.  Thus the entries of the matrix will also be binary numbers, i.e., 0's and 1's.&amp;lt;/ref&amp;gt;. For an &amp;lt;math&amp;gt; [n,k]\,\!&amp;lt;/math&amp;gt; code, the '''generator matrix''' is an &amp;lt;math&amp;gt; n\times k\,\!&amp;lt;/math&amp;gt; matrix whose columns form a basis for the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;-dimensional coding sub-space of the &amp;lt;math&amp;gt;n\,\!&amp;lt;/math&amp;gt;-dimensional binary vector space.  In other words, the columns are vectors that span the code space.  (Note that one may also use the transpose of this matrix as the definition for &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt;.)  Any code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; described by a vector &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; can be written in terms of the generator matrix as &amp;lt;math&amp;gt;w = Gv\,\!&amp;lt;/math&amp;gt;.  Note that &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is independent of the input and output vectors.  In addition, &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is not unique: if columns are interchanged, or added together to produce a new vector that replaces a column, the resulting matrix still generates the same code, since the columns remain linearly independent under these operations.&lt;br /&gt;
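The encoding w = Gv can be illustrated with the three-bit repetition code of Example 2, whose generator matrix is the 3x1 matrix (1 1 1)^T (our choice of example; the helper name mat_vec_mod2 is ours):

```python
# Encoding with a generator matrix over GF(2): w = G v (mod 2).

def mat_vec_mod2(M, v):
    """Multiply matrix M (list of row tuples) by vector v over GF(2)."""
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

G = [(1,), (1,), (1,)]                       # n x k = 3 x 1 repetition code
assert mat_vec_mod2(G, (0,)) == (0, 0, 0)    # logical 0 -> 000
assert mat_vec_mod2(G, (1,)) == (1, 1, 1)    # logical 1 -> 111
```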
&lt;br /&gt;
====Parity Check Matrix====&lt;br /&gt;
Once &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is obtained, one can calculate another useful matrix, &amp;lt;math&amp;gt;P.\,\!&amp;lt;/math&amp;gt;  &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;(n-k)\times n\,\!&amp;lt;/math&amp;gt; matrix which has the property that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
PG = 0.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.3}}&lt;br /&gt;
The matrix &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; is called the '''parity check matrix''' or '''dual matrix'''.  It has rank &amp;lt;math&amp;gt;n- k\,\!&amp;lt;/math&amp;gt; and annihilates every code word.  To see this, recall any code word is written as &amp;lt;math&amp;gt;Gv\,\!&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;PGv =0\,\!&amp;lt;/math&amp;gt; since &amp;lt;math&amp;gt;PG =0\,\!&amp;lt;/math&amp;gt;.  Also, due to the rank of &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;, it can be shown that &amp;lt;math&amp;gt;Pw =0\,\!&amp;lt;/math&amp;gt; only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  That is to say, &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; is a code word.  This means that &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt; can be used to test whether or not a word is in the code. &lt;br /&gt;
&lt;br /&gt;
Suppose an error occurs on a code word &amp;lt;math&amp;gt;w\,\!&amp;lt;/math&amp;gt; to produce &amp;lt;math&amp;gt;w^\prime = w + e\,\!&amp;lt;/math&amp;gt;.  It follows that&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
Pw^\prime = P(w+e) = Pe,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.4}}&lt;br /&gt;
since &amp;lt;math&amp;gt;Pw=0\,\!&amp;lt;/math&amp;gt;.  This result, &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt;, is called the '''error syndrome''' and the measurement to identify &amp;lt;math&amp;gt;Pe\,\!&amp;lt;/math&amp;gt; is the '''syndrome measurement'''.  The result depends only on the error and not on the original code word.  If the error can be determined from this result, then it can be corrected independent of the code word.  However, for the syndrome to identify the error uniquely, two distinct correctable errors &amp;lt;math&amp;gt;e_1\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;e_2\,\!&amp;lt;/math&amp;gt; must give different results, &amp;lt;math&amp;gt;Pe_1 \neq Pe_2\,\!&amp;lt;/math&amp;gt;.  This is guaranteed if a distance &amp;lt;math&amp;gt;d\,\!&amp;lt;/math&amp;gt; code is constructed such that any &amp;lt;math&amp;gt;d-1=2t\,\!&amp;lt;/math&amp;gt; columns of the parity check matrix are linearly independent.  This enables the errors to be identified and corrected.&lt;br /&gt;
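That the syndrome depends only on the error can be checked numerically.  The sketch below uses a parity check matrix for the three-bit repetition code (an assumed example consistent with the construction given below):

```python
# The syndrome P(w + e) = Pe is the same for every codeword w.

def mat_vec_mod2(M, v):
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

P = [(1, 1, 0), (1, 0, 1)]        # parity check matrix, 3-bit repetition code
e = (0, 1, 0)                     # bit-flip error on the second bit
for w in [(0, 0, 0), (1, 1, 1)]:  # both codewords
    w_err = tuple((a + b) % 2 for a, b in zip(w, e))
    assert mat_vec_mod2(P, w_err) == mat_vec_mod2(P, e)  # syndrome only sees e
```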
&lt;br /&gt;
It is important to emphasize that these two matrices define the code as well as the check and necessary recovery operations.  The matrix &amp;lt;math&amp;gt;G\,\!&amp;lt;/math&amp;gt; is determined by the code.  Once this matrix is determined, there is a method for determining the parity check matrix, &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;, which is a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix.  The method, which can be found in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]], goes as follows.  One first puts &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form of an augmented matrix &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
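Steane's construction can be sketched as follows, again using the three-bit repetition code (our choice of example), whose G^T = (1 1 1) is already in the form (I_k|A):

```python
# Construct P = (A^T | I_{n-k}) from G^T = (I_k | A), all over GF(2).

GT = [(1, 1, 1)]                  # k x n = 1 x 3, already in (I_k | A) form
k, n = len(GT), len(GT[0])
A = [row[k:] for row in GT]       # A is k x (n-k); here A = (1 1)

# Row j of P is (column j of A) followed by row j of the identity I_{n-k}.
P = [tuple(A[i][j] for i in range(k))
     + tuple(1 if m == j else 0 for m in range(n - k))
     for j in range(n - k)]
print(P)   # [(1, 1, 0), (1, 0, 1)]
```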
&lt;br /&gt;
===Errors===&lt;br /&gt;
&lt;br /&gt;
For any classical error correcting code, there are general conditions that must be satisfied in order for the code to be able to detect and correct errors.  The two examples above show how the error can be detected; here, the objective is to give some general conditions.  &lt;br /&gt;
&lt;br /&gt;
Note that any state containing an error may be written as the sum of the original (logical or encoded) state  &amp;lt;math&amp;gt;w \,\!&amp;lt;/math&amp;gt; and another vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt;.  The error vector &amp;lt;math&amp;gt;e \,\!&amp;lt;/math&amp;gt; has ones in the places where errors are present and zeroes everywhere else.  To ensure that errors may be corrected, the following condition must be satisfied for any two distinct code words &amp;lt;math&amp;gt;w_1 \neq w_2 \,\!&amp;lt;/math&amp;gt; and any two correctable errors &amp;lt;math&amp;gt;e_1, e_2 \,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
w_1 + e_1 \neq w_2 + e_2.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.5}}&lt;br /&gt;
This condition is called the '''disjointness condition'''.  This condition means that an error on one state cannot be confused with an error on another state.  If it could, then the state including the error could not be uniquely identified with an encoded state and the state could not be corrected to its original state before the error occurred.  More specifically, for a code to correct &amp;lt;math&amp;gt;t\,\!&amp;lt;/math&amp;gt;  single-bit errors, it must have distance at least &amp;lt;math&amp;gt;2t + 1 \,\!&amp;lt;/math&amp;gt; between any two codewords; i.e., it must be true that &amp;lt;math&amp;gt;d(C) \geq 2t + 1 \,\!&amp;lt;/math&amp;gt;.  An &amp;lt;math&amp;gt;[n,k]\,\!&amp;lt;/math&amp;gt; code with minimal distance &amp;lt;math&amp;gt;d \,\!&amp;lt;/math&amp;gt; is denoted &amp;lt;math&amp;gt;[n,k,d]\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
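For instance, the three-bit repetition code of Example 2 has minimum distance 3 and therefore corrects t = 1 error; a brute-force check:

```python
# Verify d(C) >= 2t + 1 for the 3-bit repetition code: d = 3, so t = 1.
from itertools import combinations

code = [(0, 0, 0), (1, 1, 1)]
d = min(sum(a != b for a, b in zip(v, w)) for v, w in combinations(code, 2))
t = (d - 1) // 2   # number of correctable single-bit errors
assert d == 3 and t == 1
```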
&lt;br /&gt;
&lt;br /&gt;
====Example 3====&lt;br /&gt;
An important example of an error correcting code is called the &amp;lt;math&amp;gt;[7,4,3]&amp;lt;/math&amp;gt; Hamming code.  This code, as the notation indicates, encodes &amp;lt;math&amp;gt;k=4&amp;lt;/math&amp;gt; bits of information into &amp;lt;math&amp;gt;n=7&amp;lt;/math&amp;gt; bits.  It does so in such a way that one error can be detected and corrected, since it has a distance of &amp;lt;math&amp;gt;3&amp;lt;/math&amp;gt;.  The generator matrix for this code can be taken to be &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
G^T = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 \\&lt;br /&gt;
          0 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.6}}&lt;br /&gt;
(See for example [[Bibliography#LoeppWootters|Loepp and Wootters [25]]].)  From this the parity check matrix, &amp;lt;math&amp;gt;P\,\!&amp;lt;/math&amp;gt;, can be calculated (as stated above) by finding a set of &amp;lt;math&amp;gt;n-k\,\!&amp;lt;/math&amp;gt; mutually orthogonal vectors that are also orthogonal to the code space defined by the generator matrix.  Alternatively, one could use the method in Steane's article in [[Bibliography#LoPopescuSpiller|Lo, Popescu, and Spiller [26]]].  Put &amp;lt;math&amp;gt;G^T\,\!&amp;lt;/math&amp;gt; in the form &amp;lt;math&amp;gt;(I_k|A),\,\!&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;I_k\,\!&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;k\times k\,\!&amp;lt;/math&amp;gt; identity matrix.  Then the parity check matrix is &amp;lt;math&amp;gt;P = (A^T|I_{n-k}).\,\!&amp;lt;/math&amp;gt;  In either case, one can arrive at the following parity check matrix for this code:&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;&lt;br /&gt;
P = \left(\begin{array}{ccccccc}&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
          1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
          0 &amp;amp; 1 &amp;amp; 1 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1 &lt;br /&gt;
    \end{array}\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.7}}&lt;br /&gt;
It is useful to note that the code can also be defined by the parity check matrix.  Only the codewords are annihilated by the parity check matrix.&lt;br /&gt;
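Both properties can be verified directly from the matrices in Eqs. (F.6) and (F.7): P annihilates every column of G (PG = 0 mod 2), and the seven single-bit-error syndromes, which are the columns of P, are distinct and nonzero, so any single error can be located:

```python
# Check PG = 0 (mod 2) and uniqueness of single-error syndromes
# for the [7,4,3] Hamming code of Eqs. (F.6) and (F.7).

GT = [(1, 0, 0, 0, 1, 1, 0),
      (0, 1, 0, 0, 1, 1, 1),
      (0, 0, 1, 0, 1, 0, 1),
      (0, 0, 0, 1, 0, 1, 1)]   # rows of G^T = columns of G
P  = [(1, 1, 1, 0, 1, 0, 0),
      (1, 1, 0, 1, 0, 1, 0),
      (0, 1, 1, 1, 0, 0, 1)]

# Every row of P is orthogonal (mod 2) to every column of G.
assert all(sum(p[i] * g[i] for i in range(7)) % 2 == 0 for p in P for g in GT)

# The syndrome of a weight-1 error on bit i is column i of P.
syndromes = [tuple(row[i] for row in P) for i in range(7)]
assert len(set(syndromes)) == 7 and (0, 0, 0) not in syndromes
```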
&lt;br /&gt;
===The Disjointness Condition and Correcting Errors===&lt;br /&gt;
&lt;br /&gt;
The motivation for the disjointness condition, [[#eqF.5|Eq.(F.5)]], is to associate each vector in the space with a particular code word.  That is, assuming that only certain errors occur, each error vector should be associated with a particular vector in the code space when the error is added to the original code word.  This partitions the set into disjoint subsets, each containing only one code vector.  A message is decoded correctly if the vector (the one containing the error) is in the subset that is associated with the original vector (the one with no error).  For example, if one vector is sent, say &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;, and an error occurs during transmission to produce &amp;lt;math&amp;gt; v_2 = v_1 +e\,\!&amp;lt;/math&amp;gt;, then this vector must be in the subset containing &amp;lt;math&amp;gt; v_1 \,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
A way to decode is to record an array of possible code words, possible errors, and the combinations of those errors and code words.  The array has the code word vectors as its top row and the errors as its leftmost column, with the element in the first row and first column being the zero vector and all subsequent entries in the column being errors.  The entry in the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;th row and &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;th column is then the sum of the code word at the top of the &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;th column and the error in the &amp;lt;math&amp;gt;k\,\!&amp;lt;/math&amp;gt;th row.  Each column of this array is one of the disjoint subsets described above.  Finding the corrupted word in a column associates it with the code word at the top of that column, and thus corrects the error.&lt;br /&gt;
&lt;br /&gt;
===The Hamming Bound===&lt;br /&gt;
&lt;br /&gt;
The Hamming bound restricts the rate of a code.  Due to the disjointness condition, a certain number of bits are required to ensure our ability to detect and correct errors.  Suppose there is a set of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt;-bit vectors for encoding &amp;lt;math&amp;gt; k\,\!&amp;lt;/math&amp;gt; bits of information.  There is a set of error vectors of weight &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; that has &amp;lt;math&amp;gt; C(n,t)\,\!&amp;lt;/math&amp;gt; elements&amp;lt;ref&amp;gt;That is, &amp;lt;math&amp;gt; n \,\!&amp;lt;/math&amp;gt; choose &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt; vectors. The notation is &amp;lt;math&amp;gt; C(n,t) = {n\choose t} = \frac{n!}{(n-t)!t!}.\,\!&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt;.  So the number of error vectors, including errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, is &lt;br /&gt;
&amp;lt;math&amp;gt; \sum_{i=0}^t C(n,i). \,\!&amp;lt;/math&amp;gt;  (Note that the case of no error, represented by the zero vector, is also part of the set of error vectors.  The objective is to design a code that can correct all errors of weight up to &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, and this includes no error at all.)  Since there are &amp;lt;math&amp;gt; 2^n\,\!&amp;lt;/math&amp;gt; vectors in the whole space of &amp;lt;math&amp;gt; n\,\!&amp;lt;/math&amp;gt; bits, and assuming &amp;lt;math&amp;gt; m\,\!&amp;lt;/math&amp;gt; vectors are used for the encoding, the Hamming bound is&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
m\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.8}}&lt;br /&gt;
For linear codes, &amp;lt;math&amp;gt; m=2^k,\,\!&amp;lt;/math&amp;gt; so &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
2^k\sum_{i=0}^t C(n,i) \leq 2^n.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.9}}&lt;br /&gt;
Taking the logarithm, &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
k \leq n - \log_2\left(\sum_{i=0}^t C(n,i)\right).&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.10}}&lt;br /&gt;
For large &amp;lt;math&amp;gt; n, k \,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; t \,\!&amp;lt;/math&amp;gt;, we can use [[Bibliography#LoPopescueSpiller|Stirling's formula]] to show that &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt; &lt;br /&gt;
\frac{k}{n} \leq 1 - H\left(\frac{t}{n}\right),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;|F.11}}&lt;br /&gt;
where &amp;lt;math&amp;gt; H(x) = -x\log x -(1-x)\log (1-x) \,\!&amp;lt;/math&amp;gt; and we have neglected an overall multiplicative constant that goes to 1 as  &amp;lt;math&amp;gt; n\rightarrow \infty. \,\!&amp;lt;/math&amp;gt;  (Again, see the article in [[Bibliography#LoPopescueSpiller|Lo, Popescu, and Spiller [26]]] by Steane.)&lt;br /&gt;
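As a numerical check of Eq. (F.9): the [7,4,3] Hamming code with t = 1 satisfies the bound with equality (codes that do so are called perfect codes):

```python
# Hamming bound, Eq. (F.9): 2^k * sum_{i=0}^{t} C(n,i) <= 2^n.
# For n = 7, k = 4, t = 1:  16 * (1 + 7) = 128 = 2^7.
from math import comb

n, k, t = 7, 4, 1
lhs = 2**k * sum(comb(n, i) for i in range(t + 1))
assert lhs == 2**n   # the bound holds with equality
```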
&lt;br /&gt;
===More Definitions===&lt;br /&gt;
&lt;br /&gt;
====Definition 11: Dual Code====&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\mathcal{C}\,\!&amp;lt;/math&amp;gt; be a code and let &amp;lt;math&amp;gt;v\,\!&amp;lt;/math&amp;gt; be a vector in the code space.  The '''dual code''', denoted &amp;lt;math&amp;gt;\mathcal{C}^\perp\,\!&amp;lt;/math&amp;gt;, is the set of all vectors that have zero inner product with all &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  In other words, it is the set of all vectors &amp;lt;math&amp;gt;u\,\!&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;u\cdot v = 0\,\!&amp;lt;/math&amp;gt; for all  &amp;lt;math&amp;gt;v\in \mathcal{C}\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
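Definition 11 can be checked by brute force for a small code.  For the three-bit repetition code (our choice of example) the dual consists of the even-weight vectors:

```python
# Dual code: all u with u . v = 0 (mod 2) for every codeword v.
from itertools import product

code = [(0, 0, 0), (1, 1, 1)]
dual = [u for u in product((0, 1), repeat=3)
        if all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for v in code)]
# Orthogonality to (1,1,1) forces even weight:
assert sorted(dual) == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```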
&lt;br /&gt;
For binary vectors, a vector can be orthogonal to itself.  Note that this is different from ordinary vectors in 3-d space.  &lt;br /&gt;
&lt;br /&gt;
The dual code is a useful entity in classical error correction and will be used in the construction of the quantum error correcting codes known as [[Chapter 7 - Quantum Error Correcting Codes#CSS codes|CSS codes]].&lt;br /&gt;
&lt;br /&gt;
===Final Comments===&lt;br /&gt;
&lt;br /&gt;
As can be seen from the Hamming bound, there is a limit to the rate of an error correcting code.  The bound does not tell us whether codes that saturate it exist, but it does tell us that no code can violate it.  Encoding, decoding, error detection and correction are all difficult problems to solve in general.  One of the advantages of the linear codes is that they provide a systematic method for identifying errors on a code through the use of the parity check operation.  Without this structure, checking whether or not a bit string (vector) is in the code space would require a look-up table, which would be much more time-consuming; matrix multiplication is quite efficient relative to the look-up table.  &lt;br /&gt;
&lt;br /&gt;
Many of these ideas and definitions will be utilized in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]] on quantum error correction.  Some linear codes, including the Hamming code above, have quantum analogues; indeed, many quantum error correcting codes are constructed from classical linear codes.  In quantum computers, as will be discussed, error correction is necessary due to the delicacy of quantum information.  Such discussions will be taken up in [[Chapter 7 - Quantum Error Correcting Codes|Chapter 7]].&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2338</id>
		<title>Appendix A - Basic Probability Concepts</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2338"/>
		<updated>2013-03-03T18:36:41Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this appendix definitions and some example calculations are&lt;br /&gt;
presented which will aid in our discussions.  This is not meant to be&lt;br /&gt;
a comprehensive introduction to the topic.  It is primarily meant to&lt;br /&gt;
serve as a means for introducing notation and terminology for the&lt;br /&gt;
course.&lt;br /&gt;
&lt;br /&gt;
By definition, probability is the chance of a certain event occurring from a set of events that could possibly occur. Let us start with the most primitive example of a probability, flipping a coin.  Now we know the set of possible outcomes is heads or tails, &amp;lt;math&amp;gt;S=\left\{H,T\right\}.\,\!&amp;lt;/math&amp;gt; Now since there are only two events that can occur and we know that there is an equal chance for them both to occur, we say that the probability for each occurring is &amp;lt;math&amp;gt;1/2,\,\!&amp;lt;/math&amp;gt; i.e. &amp;lt;math&amp;gt;P(H)=1/2\,\!&amp;lt;/math&amp;gt;  and  &amp;lt;math&amp;gt;P(T)=1/2,\,\!&amp;lt;/math&amp;gt; because the probabilities for every possible outcome of an event must equal &amp;lt;math&amp;gt;1,\,\!&amp;lt;/math&amp;gt; i.e. &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In probability, the Boolean operator '''''and''''' can be somewhat counter intuitive at first.  For instance, if someone were to tell you that he/she has '''5''' apples '''''and''''' just received '''3''' more, the operation that takes place in your head is he/she has &amp;lt;math&amp;gt;5 + 3 = 8\,\!&amp;lt;/math&amp;gt; apples.  But, when working with probabilities, the Boolean '''''and''''' corresponds with multiplication.  For example, say the probability that Bob stays and works through his lunch hour is &amp;lt;math&amp;gt;1/6\,\!&amp;lt;/math&amp;gt; and the probability that Kathy stays and works through lunch is &amp;lt;math&amp;gt;5/6.\,\!&amp;lt;/math&amp;gt; Now if I were to ask, &amp;quot;What is the probability that Bob '''''and''''' Kathy stay and work through lunch?&amp;quot;, you would not want add the probabilities because &amp;lt;math&amp;gt;P(B)+P(K)=1.\,\!&amp;lt;/math&amp;gt;  This would imply that both will work through lunch, which doesn't make sense because we cannot guarantee, from the knowledge that we have, both will work through lunch.  Instead, let us multiply their respective probabilities, &amp;lt;math&amp;gt;P(B)*P(K)=5/36.\,\!&amp;lt;/math&amp;gt;  Since the answer is lower than the probability for each individual, it makes much more sense because, intuitively, the more uncertainty (i.e. more probabilities &amp;lt; 1) in a system, the more uncertain we are of success.&lt;br /&gt;
&lt;br /&gt;
Now that we have examined the Boolean '''''and''''', let's take a look at '''''or'''''.  For mutually exclusive outcomes, '''''or''''' corresponds to addition, which follows directly from the condition that the probabilities of all possible outcomes of an event must add up to &amp;lt;math&amp;gt;1.\,\!&amp;lt;/math&amp;gt;  Revisiting the example of flipping a coin, the two possible outcomes are that you obtain heads '''''or''''' you obtain tails: &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
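Since these two rules are just arithmetic, they can be illustrated with a short sketch (a minimal Python illustration using the numbers from the text; it assumes that Bob's and Kathy's decisions are independent events):

```python
# "and" for independent events corresponds to multiplication.
p_bob = 1 / 6     # probability that Bob works through lunch
p_kathy = 5 / 6   # probability that Kathy works through lunch
p_both = p_bob * p_kathy       # P(B and K) = 5/36, lower than either alone

# "or" for mutually exclusive outcomes corresponds to addition.
p_heads = 1 / 2
p_tails = 1 / 2
p_any = p_heads + p_tails      # P(H or T) = 1: some outcome must occur

print(p_both, p_any)
```

Multiplying is only valid when the events are independent; correlated events require the joint probability instead.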
(This example is a variation of one given by David Griffiths in ''Introduction to Quantum Mechanics'' ([[Bibliography#Griffiths:qmbook|David J. Griffiths’ book [4]]]))&lt;br /&gt;
&lt;br /&gt;
''Example'':  Suppose that in some room, there are four people with the following heights:  &lt;br /&gt;
#1 person is '''1.5''' meters tall&lt;br /&gt;
#1 person is '''1.6''' meters tall&lt;br /&gt;
#2 people are '''1.8''' meters tall&lt;br /&gt;
Let &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; stand for the total number of people.  We might write the number of people with certain heights as &lt;br /&gt;
&amp;lt;math&amp;gt;N(1.5) = 1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.6)=1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.8)=2\,\!&amp;lt;/math&amp;gt;. &lt;br /&gt;
&amp;lt;center&amp;gt;The total number of people is&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
N = \sum_{j=0}^\infty N(j),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;where &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; runs over all values. It is easily seen that &amp;lt;math&amp;gt;N=4\,\!&amp;lt;/math&amp;gt;.&amp;lt;/center&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
Now if I draw a name out of a hat that contains each person's name&lt;br /&gt;
once, I will get the name of a person who is 1.6 meters tall with&lt;br /&gt;
probability &amp;lt;math&amp;gt;1/4\,\!&amp;lt;/math&amp;gt;.  (We assume that each person has a unique name and&lt;br /&gt;
that it appears once and only once in the hat.)  We write this as&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(1.6) = 1/4&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
and we would generally write for any value&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(j) = \frac{N(j)}{N}. &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
Now since we are going to get someone's name when we draw, we must&lt;br /&gt;
have &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_j P(j) = 1,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
which is easy enough to check.  &lt;br /&gt;
&lt;br /&gt;
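This counting recipe is easy to mirror in a short sketch (a minimal Python illustration; the dictionary of counts encodes the values N(j) from the example):

```python
# Build P(j) = N(j)/N from the height counts and verify normalization.
counts = {1.5: 1, 1.6: 1, 1.8: 2}    # N(j) for the four-person example
N = sum(counts.values())              # total number of people, N = 4
P = {j: n / N for j, n in counts.items()}

print(P[1.6])           # 0.25, i.e. P(1.6) = 1/4
print(sum(P.values()))  # 1.0: the probabilities are normalized
```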
There are several aspects of this probability distribution that we might like to know.  Here are some that are particularly useful: &amp;lt;!-- \index{median}\index{mean} \index{average}--&amp;gt;&lt;br /&gt;
#The ''most probable'' value (or ''mode'') for the height is 1.8 meters.&lt;br /&gt;
#The ''median'' is 1.7 meters (two people above and two below).&lt;br /&gt;
#The ''average'' (or ''mean'') is given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\left\langle height\right\rangle &amp;amp;= \frac{1(1.5)+1(1.6)+2(1.8)}{4} \\ &amp;amp;=  \frac{6.7}{4} = 1.675. \end{align}&amp;lt;/math&amp;gt;|A.1}}&lt;br /&gt;
Note that the mean and the median do not have to be the same.  If there is an odd number of values, the median is the middle number in the list; if even, it is the mean of the two middle values. Indeed, here the mean (1.675) and the median (1.7) differ.  &lt;br /&gt;
The bracket, &amp;lt;math&amp;gt;\left\langle\cdot\right\rangle\,\!&amp;lt;/math&amp;gt;, is the standard notation for finding  the ''average value''&amp;lt;!-- \index{average}--&amp;gt; &lt;br /&gt;
of a function.  This is done by calculating &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;\left\langle f(j)\right\rangle = \sum_{j=0}^\infty f(j)P(j).\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
For the average this is just &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\left\langle j\right\rangle = \sum_{j=0}^\infty jP(j)= \sum_{j=0}^\infty j\frac{N(j)}{N}.  \,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
'''Note:'''  The ''average value'' is called the ''expectation value'' &amp;lt;!-- \index{expectation value} --&amp;gt; in quantum mechanics.  This can be&lt;br /&gt;
misleading because it is ''not'' the most probable, nor is it &amp;lt;nowiki&amp;gt;''what to expect.''&amp;lt;/nowiki&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Describing a particular probability distribution takes some effort.  It is not enough to know the average, median, and most probable values; if these are all we are given, many details of the distribution remain unknown.  What else would one like to know?  Without describing it entirely, one may like to know more about the &amp;lt;nowiki&amp;gt;''shape''&amp;lt;/nowiki&amp;gt; of the distribution.  For example, how spread out is it?&lt;br /&gt;
&lt;br /&gt;
The most important measure of this is the ''variance'',&amp;lt;!-- \index{variance}--&amp;gt; which is the ''standard deviation'' &amp;lt;!-- \index{standard deviation} --&amp;gt; squared ( &amp;lt;math&amp;gt;\sigma^2\!&amp;lt;/math&amp;gt; ).  The variance is defined as (in terms of our variable &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;) &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle(\Delta j)^2\rangle, \,\!&amp;lt;/math&amp;gt;|A.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\Delta j = j -\langle j \rangle\,\!&amp;lt;/math&amp;gt;.  This can also be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle j^2\rangle - \langle j \rangle^2.\,\!&amp;lt;/math&amp;gt;|A.3}}&lt;br /&gt;
&lt;br /&gt;
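These formulas can be checked numerically against the height example (a minimal sketch computing the mean and both forms of the variance directly from the counts; the two variance expressions are algebraically identical, so they must agree up to floating-point rounding):

```python
# Mean and variance of the height distribution; forms A.2 and A.3 agree.
counts = {1.5: 1, 1.6: 1, 1.8: 2}
N = sum(counts.values())
P = {j: n / N for j, n in counts.items()}

mean = sum(j * P[j] for j in P)                      # 6.7/4 = 1.675
var_a2 = sum((j - mean) ** 2 * P[j] for j in P)      # via (Delta j)^2
var_a3 = sum(j ** 2 * P[j] for j in P) - mean ** 2   # via the moments

print(round(mean, 6), round(var_a2, 6), round(var_a3, 6))
```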
===Stirling's Formula===&lt;br /&gt;
&lt;br /&gt;
For large &amp;lt;math&amp;gt;n \,\!&amp;lt;/math&amp;gt;, the following approximation is quite useful:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
n! \approx \sqrt{2\pi n} \; n^n e^{-n}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2337</id>
		<title>Appendix A - Basic Probability Concepts</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2337"/>
		<updated>2013-03-03T17:59:59Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this appendix definitions and some example calculations are&lt;br /&gt;
presented which will aid in our discussions.  This is not meant to be&lt;br /&gt;
a comprehensive introduction to the topic.  It is primarily meant to&lt;br /&gt;
serve as a means for introducing notation and terminology for the&lt;br /&gt;
course.&lt;br /&gt;
&lt;br /&gt;
By definition, probability is the chance of a particular event occurring from the set of events that could possibly occur. Let us start with the most primitive example of probability: flipping a coin.  The set of possible outcomes is heads or tails, &amp;lt;math&amp;gt;S=\left\{H,T\right\}.\,\!&amp;lt;/math&amp;gt; Since there are only two possible events and each has an equal chance of occurring, the probability of each is '''1/2''', i.e. &amp;lt;math&amp;gt;P(H)=1/2\,\!&amp;lt;/math&amp;gt;  and  &amp;lt;math&amp;gt;P(T)=1/2\,\!&amp;lt;/math&amp;gt;, because the probabilities of all possible outcomes of an event must sum to '''1''', i.e. &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In probability, the Boolean operator '''''and''''' can be somewhat counterintuitive at first.  For instance, if someone tells you that he/she has 5 apples '''''and''''' just received 3 more, the operation that takes place in your head is addition: he/she has &amp;lt;math&amp;gt;5 + 3 = 8\,\!&amp;lt;/math&amp;gt; apples.  When working with probabilities, however, the Boolean '''''and''''' corresponds to multiplication.  For example, say the probability that Bob stays and works through his lunch hour is '''1/6''' and the probability that Kathy stays and works through lunch is '''5/6'''. If I were to ask, &amp;quot;What is the probability that Bob '''''and''''' Kathy stay and work through lunch?&amp;quot;, you would not want to add the probabilities, because &amp;lt;math&amp;gt;P(B)+P(K)=1.\,\!&amp;lt;/math&amp;gt;  That would imply that both are certain to work through lunch, which cannot be right, since nothing in our knowledge guarantees it.  Instead, since the two events are independent, we multiply their respective probabilities: &amp;lt;math&amp;gt;P(B)*P(K)=5/36.\,\!&amp;lt;/math&amp;gt;  The answer is lower than either individual probability, which makes sense intuitively: the more uncertainty (i.e. more probabilities &amp;lt; 1) in a system, the less certain we are of success.&lt;br /&gt;
&lt;br /&gt;
Now that we have examined the Boolean '''''and''''', let's take a look at '''''or'''''.  For mutually exclusive outcomes, '''''or''''' corresponds to addition, which follows directly from the condition that the probabilities of all possible outcomes of an event must add up to '''1'''.  Revisiting the example of flipping a coin, the two possible outcomes are that you obtain heads '''''or''''' you obtain tails: &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(This example is a variation of one given by David Griffiths in ''Introduction to Quantum Mechanics'' ([[Bibliography#Griffiths:qmbook|David J. Griffiths’ book [4]]]))&lt;br /&gt;
&lt;br /&gt;
''Example'':  Suppose that in some room, there are four people with the following heights:  &lt;br /&gt;
#1 person is 1.5 meters tall&lt;br /&gt;
#1 person is 1.6 meters tall&lt;br /&gt;
#2 people are 1.8 meters tall&lt;br /&gt;
Let &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; stand for the total number of people.  We might write the number of people with certain heights as &lt;br /&gt;
&amp;lt;math&amp;gt;N(1.5) = 1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.6)=1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.8)=2\,\!&amp;lt;/math&amp;gt;. &lt;br /&gt;
&amp;lt;center&amp;gt;The total number of people is&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
N = \sum_{j=0}^\infty N(j),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;where &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; runs over all values. It is easily seen that &amp;lt;math&amp;gt;N=4\,\!&amp;lt;/math&amp;gt;.&amp;lt;/center&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
Now if I draw a name out of a hat that contains each person's name&lt;br /&gt;
once, I will get the name of a person who is 1.6 meters tall with&lt;br /&gt;
probability &amp;lt;math&amp;gt;1/4\,\!&amp;lt;/math&amp;gt;.  (We assume that each person has a unique name and&lt;br /&gt;
that it appears once and only once in the hat.)  We write this as&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(1.6) = 1/4&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
and we would generally write for any value&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(j) = \frac{N(j)}{N}. &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
Now since we are going to get someone's name when we draw, we must&lt;br /&gt;
have &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_j P(j) = 1,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
which is easy enough to check.  &lt;br /&gt;
&lt;br /&gt;
There are several aspects of this probability distribution that we might like to know.  Here are some that are particularly useful: &amp;lt;!-- \index{median}\index{mean} \index{average}--&amp;gt;&lt;br /&gt;
#The ''most probable'' value (or ''mode'') for the height is 1.8 meters.&lt;br /&gt;
#The ''median'' is 1.7 meters (two people above and two below).&lt;br /&gt;
#The ''average'' (or ''mean'') is given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\left\langle height\right\rangle &amp;amp;= \frac{1(1.5)+1(1.6)+2(1.8)}{4} \\ &amp;amp;=  \frac{6.7}{4} = 1.675. \end{align}&amp;lt;/math&amp;gt;|A.1}}&lt;br /&gt;
Note that the mean and the median do not have to be the same.  If there is an odd number of values, the median is the middle number in the list; if even, it is the mean of the two middle values. Indeed, here the mean (1.675) and the median (1.7) differ.  &lt;br /&gt;
The bracket, &amp;lt;math&amp;gt;\left\langle\cdot\right\rangle\,\!&amp;lt;/math&amp;gt;, is the standard notation for finding  the ''average value''&amp;lt;!-- \index{average}--&amp;gt; &lt;br /&gt;
of a function.  This is done by calculating &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;\left\langle f(j)\right\rangle = \sum_{j=0}^\infty f(j)P(j).\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
For the average this is just &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\left\langle j\right\rangle = \sum_{j=0}^\infty jP(j)= \sum_{j=0}^\infty j\frac{N(j)}{N}.  \,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
'''Note:'''  The ''average value'' is called the ''expectation value'' &amp;lt;!-- \index{expectation value} --&amp;gt; in quantum mechanics.  This can be&lt;br /&gt;
misleading because it is ''not'' the most probable, nor is it &amp;lt;nowiki&amp;gt;''what to expect.''&amp;lt;/nowiki&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Describing a particular probability distribution takes some effort.  It is not enough to know the average, median, and most probable values; if these are all we are given, many details of the distribution remain unknown.  What else would one like to know?  Without describing it entirely, one may like to know more about the &amp;lt;nowiki&amp;gt;''shape''&amp;lt;/nowiki&amp;gt; of the distribution.  For example, how spread out is it?&lt;br /&gt;
&lt;br /&gt;
The most important measure of this is the ''variance'',&amp;lt;!-- \index{variance}--&amp;gt; which is the ''standard deviation'' &amp;lt;!-- \index{standard deviation} --&amp;gt; squared ( &amp;lt;math&amp;gt;\sigma^2\!&amp;lt;/math&amp;gt; ).  The variance is defined as (in terms of our variable &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;) &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle(\Delta j)^2\rangle, \,\!&amp;lt;/math&amp;gt;|A.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\Delta j = j -\langle j \rangle\,\!&amp;lt;/math&amp;gt;.  This can also be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle j^2\rangle - \langle j \rangle^2.\,\!&amp;lt;/math&amp;gt;|A.3}}&lt;br /&gt;
&lt;br /&gt;
===Stirling's Formula===&lt;br /&gt;
&lt;br /&gt;
For large &amp;lt;math&amp;gt;n \,\!&amp;lt;/math&amp;gt;, the following approximation is quite useful:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
n! \approx \sqrt{2\pi n} \; n^n e^{-n}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2336</id>
		<title>Appendix A - Basic Probability Concepts</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2336"/>
		<updated>2013-03-03T17:55:01Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this appendix definitions and some example calculations are&lt;br /&gt;
presented which will aid in our discussions.  This is not meant to be&lt;br /&gt;
a comprehensive introduction to the topic.  It is primarily meant to&lt;br /&gt;
serve as a means for introducing notation and terminology for the&lt;br /&gt;
course.&lt;br /&gt;
&lt;br /&gt;
By definition, probability is the chance of a particular event occurring from the set of events that could possibly occur. Let us start with the most primitive example of probability: flipping a coin.  The set of possible outcomes is heads or tails, &amp;lt;math&amp;gt;S=\left\{H,T\right\}.\,\!&amp;lt;/math&amp;gt; Since there are only two possible events and each has an equal chance of occurring, the probability of each is '''1/2''', i.e. &amp;lt;math&amp;gt;P(H)=1/2\,\!&amp;lt;/math&amp;gt;  and  &amp;lt;math&amp;gt;P(T)=1/2\,\!&amp;lt;/math&amp;gt;, because the probabilities of all possible outcomes of an event must sum to '''1''', i.e. &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In probability, the Boolean operator '''''and''''' can be somewhat counterintuitive at first.  For instance, if someone tells you that he/she has 5 apples '''''and''''' just received 3 more, the operation that takes place in your head is addition: he/she has &amp;lt;math&amp;gt;5 + 3 = 8\,\!&amp;lt;/math&amp;gt; apples.  When working with probabilities, however, the Boolean '''''and''''' corresponds to multiplication.  For example, say the probability that Bob stays and works through his lunch hour is '''1/6''' and the probability that Kathy stays and works through lunch is '''5/6'''. If I were to ask, &amp;quot;What is the probability that Bob '''''and''''' Kathy stay and work through lunch?&amp;quot;, you would not want to add the probabilities, because &amp;lt;math&amp;gt;P(B)+P(K)=1.\,\!&amp;lt;/math&amp;gt;  That would imply that both are certain to work through lunch, which cannot be right, since nothing in our knowledge guarantees it.  Instead, since the two events are independent, we multiply their respective probabilities: &amp;lt;math&amp;gt;P(B)*P(K)=5/36.\,\!&amp;lt;/math&amp;gt;  The answer is lower than either individual probability, which makes sense intuitively: the more uncertainty (i.e. more probabilities &amp;lt; 1) in a system, the less certain we are of success.&lt;br /&gt;
&lt;br /&gt;
Now that we have examined the Boolean '''''and''''', let's take a look at '''''or'''''.  For mutually exclusive outcomes, '''''or''''' corresponds to addition, which follows directly from the condition that the probabilities of all possible outcomes of an event must add up to '''1'''.  Revisiting the example of flipping a coin, the two possible outcomes are that you obtain heads '''''or''''' you obtain tails: &amp;lt;math&amp;gt;P(H)+P(T)=1.\,\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(This example is a variation of one given by David Griffiths in ''Introduction to Quantum Mechanics'' ([[Bibliography#Griffiths:qmbook|David J. Griffiths’ book [4]]]))&lt;br /&gt;
&lt;br /&gt;
''Example'':  Suppose that in some room, there are four people with the following heights:  &lt;br /&gt;
#1 person is 1.5 meters tall&lt;br /&gt;
#1 person is 1.6 meters tall&lt;br /&gt;
#2 people are 1.8 meters tall&lt;br /&gt;
Let &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; stand for the total number of people.  We might write the number of people with certain heights as &lt;br /&gt;
&amp;lt;math&amp;gt;N(1.5) = 1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.6)=1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.8)=2\,\!&amp;lt;/math&amp;gt;. &lt;br /&gt;
&amp;lt;center&amp;gt;The total number of people is&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
N = \sum_{j=0}^\infty N(j),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;where &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; runs over all values. It is easily seen that &amp;lt;math&amp;gt;N=4\,\!&amp;lt;/math&amp;gt;.&amp;lt;/center&amp;gt;  &lt;br /&gt;
&lt;br /&gt;
Now if I draw a name out of a hat that contains each person's name&lt;br /&gt;
once, I will get the name of a person who is 1.6 meters tall with&lt;br /&gt;
probability &amp;lt;math&amp;gt;1/4\,\!&amp;lt;/math&amp;gt;.  (We assume that each person has a unique name and&lt;br /&gt;
that it appears once and only once in the hat.)  We write this as&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(1.6) = 1/4&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
and we would generally write for any value&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(j) = \frac{N(j)}{N}. &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
Now since we are going to get someone's name when we draw, we must&lt;br /&gt;
have &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_j P(j) = 1,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
which is easy enough to check.  &lt;br /&gt;
&lt;br /&gt;
There are several aspects of this probability distribution that we might like to know.  Here are some that are particularly useful: &amp;lt;!-- \index{median}\index{mean} \index{average}--&amp;gt;&lt;br /&gt;
#The ''most probable'' value for the height is 1.8 meters.&lt;br /&gt;
#The ''median'' is 1.7 meters (two people above and two below).&lt;br /&gt;
#The ''average'' (or ''mean'') is given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\left\langle height\right\rangle &amp;amp;= \frac{1(1.5)+1(1.6)+2(1.8)}{4} \\ &amp;amp;=  \frac{6.7}{4} = 1.675. \end{align}&amp;lt;/math&amp;gt;|A.1}}&lt;br /&gt;
Note that the mean and the median do not have to be the same.  If there is an odd number of values, the median is the middle number in the list; if even, it is the mean of the two middle values. Indeed, here the mean (1.675) and the median (1.7) differ.  &lt;br /&gt;
The bracket, &amp;lt;math&amp;gt;\left\langle\cdot\right\rangle\,\!&amp;lt;/math&amp;gt;, is the standard notation for finding  the ''average value''&amp;lt;!-- \index{average}--&amp;gt; &lt;br /&gt;
of a function.  This is done by calculating &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;\left\langle f(j)\right\rangle = \sum_{j=0}^\infty f(j)P(j).\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
For the average this is just &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\left\langle j\right\rangle = \sum_{j=0}^\infty jP(j)= \sum_{j=0}^\infty j\frac{N(j)}{N}.  \,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
'''Note:'''  The ''average value'' is called the ''expectation value'' &amp;lt;!-- \index{expectation value} --&amp;gt; in quantum mechanics.  This can be&lt;br /&gt;
misleading because it is ''not'' the most probable, nor is it &amp;lt;nowiki&amp;gt;''what to expect.''&amp;lt;/nowiki&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Describing a particular probability distribution takes some effort.  It is not enough to know the average, median, and most probable values; if these are all we are given, many details of the distribution remain unknown.  What else would one like to know?  Without describing it entirely, one may like to know more about the &amp;lt;nowiki&amp;gt;''shape''&amp;lt;/nowiki&amp;gt; of the distribution.  For example, how spread out is it?&lt;br /&gt;
&lt;br /&gt;
The most important measure of this is the ''variance'',&amp;lt;!-- \index{variance}--&amp;gt; which is the ''standard deviation'' &amp;lt;!-- \index{standard deviation} --&amp;gt; squared ( &amp;lt;math&amp;gt;\sigma^2\!&amp;lt;/math&amp;gt; ).  The variance is defined as (in terms of our variable &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;) &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle(\Delta j)^2\rangle, \,\!&amp;lt;/math&amp;gt;|A.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\Delta j = j -\langle j \rangle\,\!&amp;lt;/math&amp;gt;.  This can also be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle j^2\rangle - \langle j \rangle^2.\,\!&amp;lt;/math&amp;gt;|A.3}}&lt;br /&gt;
&lt;br /&gt;
===Stirling's Formula===&lt;br /&gt;
&lt;br /&gt;
For large &amp;lt;math&amp;gt;n \,\!&amp;lt;/math&amp;gt;, the following approximation is quite useful:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
n! \approx \sqrt{2\pi n} \; n^n e^{-n}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2335</id>
		<title>Appendix A - Basic Probability Concepts</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Appendix_A_-_Basic_Probability_Concepts&amp;diff=2335"/>
		<updated>2013-03-03T16:13:08Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In this appendix definitions and some example calculations are&lt;br /&gt;
presented which will aid in our discussions.  This is not meant to be&lt;br /&gt;
a comprehensive introduction to the topic.  It is primarily meant to&lt;br /&gt;
serve as a means for introducing notation and terminology for the&lt;br /&gt;
course.  This example is a variation of one given by David Griffiths&lt;br /&gt;
in ''Introduction to Quantum Mechanics'' ([[Bibliography#Griffiths:qmbook|David J. Griffiths’ book [4]]]).&lt;br /&gt;
&lt;br /&gt;
''Example'':  Suppose that in some room, there are four people with the following heights:  &lt;br /&gt;
#1 person is 1.5 meters tall&lt;br /&gt;
#1 person is 1.6 meters tall&lt;br /&gt;
#2 people are 1.8 meters tall&lt;br /&gt;
Let &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; stand for the total number of people.  We might write the number of people with certain heights as &lt;br /&gt;
&amp;lt;math&amp;gt;N(1.5) = 1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.6)=1\,\!&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;N(1.8)=2\,\!&amp;lt;/math&amp;gt;. The total number of people is &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
N = \sum_{j=0}^\infty N(j),&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt; runs over all values and (in this case) we are rounding to the&lt;br /&gt;
nearest tenth of a meter.  It is easily seen that &amp;lt;math&amp;gt;N=4\,\!&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
Now if I draw a name out of a hat that contains each person's name&lt;br /&gt;
once, I will get the name of a person who is 1.6 meters tall with&lt;br /&gt;
probability &amp;lt;math&amp;gt;1/4\,\!&amp;lt;/math&amp;gt;.  (We assume that each person has a unique name and&lt;br /&gt;
that it appears once and only once in the hat.)  We write this as&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(1.6) = 1/4&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
and we would generally write for any value&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
P(j) = \frac{N(j)}{N}. &lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
Now since we are going to get someone's name when we draw, we must&lt;br /&gt;
have &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\sum_j P(j) = 1,&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
which is easy enough to check.  &lt;br /&gt;
&lt;br /&gt;
There are several aspects of this probability distribution that we might like to know.  Here are some that are particularly useful: &amp;lt;!-- \index{median}\index{mean} \index{average}--&amp;gt;&lt;br /&gt;
#The ''most probable'' value for the height is 1.8 meters.&lt;br /&gt;
#The ''median'' is 1.7 meters (two people above and two below).&lt;br /&gt;
#The ''average'' (or ''mean'') is given by&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\left\langle height\right\rangle &amp;amp;= \frac{1(1.5)+1(1.6)+2(1.8)}{4} \\ &amp;amp;=  \frac{6.7}{4} = 1.675. \end{align}&amp;lt;/math&amp;gt;|A.1}}&lt;br /&gt;
Note that the mean and the median do not have to be the same.  If there is an odd number of values, the median is the middle number in the list; if even, it is the mean of the two middle values. Indeed, here the mean (1.675) and the median (1.7) differ.  &lt;br /&gt;
The bracket, &amp;lt;math&amp;gt;\left\langle\cdot\right\rangle\,\!&amp;lt;/math&amp;gt;, is the standard notation for finding  the ''average value''&amp;lt;!-- \index{average}--&amp;gt; &lt;br /&gt;
of a function.  This is done by calculating &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;\left\langle f(j)\right\rangle = \sum_{j=0}^\infty f(j)P(j).\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
For the average this is just &lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
\left\langle j\right\rangle = \sum_{j=0}^\infty jP(j)= \sum_{j=0}^\infty j\frac{N(j)}{N}.  \,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
'''Note:'''  The ''average value'' is called the ''expectation value'' &amp;lt;!-- \index{expectation value} --&amp;gt; in quantum mechanics.  This can be&lt;br /&gt;
misleading because it is ''not'' the most probable, nor is it &amp;lt;nowiki&amp;gt;''what to expect.''&amp;lt;/nowiki&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Describing a particular probability distribution takes some effort.  It is not enough to know the average, median, and most probable values; if these are all we are given, many details of the distribution remain unknown.  What else would one like to know?  Without describing it entirely, one may like to know more about the &amp;lt;nowiki&amp;gt;''shape''&amp;lt;/nowiki&amp;gt; of the distribution.  For example, how spread out is it?&lt;br /&gt;
&lt;br /&gt;
The most important measure of this is the ''variance'',&amp;lt;!-- \index{variance}--&amp;gt; which is the ''standard deviation'' &amp;lt;!-- \index{standard deviation} --&amp;gt; squared ( &amp;lt;math&amp;gt;\sigma^2\!&amp;lt;/math&amp;gt; ).  The variance is defined as (in terms of our variable &amp;lt;math&amp;gt;j\,\!&amp;lt;/math&amp;gt;) &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle(\Delta j)^2\rangle, \,\!&amp;lt;/math&amp;gt;|A.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\Delta j = j -\langle j \rangle\,\!&amp;lt;/math&amp;gt;.  This can also be written as &lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\sigma^2 = \langle j^2\rangle - \langle j \rangle^2.\,\!&amp;lt;/math&amp;gt;|A.3}}&lt;br /&gt;
&lt;br /&gt;
===Stirling's Formula===&lt;br /&gt;
&lt;br /&gt;
For large &amp;lt;math&amp;gt;n \,\!&amp;lt;/math&amp;gt;, the following approximation is quite useful:&lt;br /&gt;
&amp;lt;center&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt;
n! \approx \sqrt{2\pi n} \; n^n e^{-n}.&lt;br /&gt;
\,\!&amp;lt;/math&amp;gt;&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_1_-_Introduction&amp;diff=2333</id>
		<title>Chapter 1 - Introduction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_1_-_Introduction&amp;diff=2333"/>
		<updated>2012-12-24T19:24:55Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Obstacles to Building a Reliable Quantum Computer */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
''In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.''&lt;br /&gt;
&lt;br /&gt;
-Paul Dirac&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===An Introduction to Quantum Computation===&lt;br /&gt;
&lt;br /&gt;
This introductory chapter is a survey of, and introduction to, topics in quantum information&lt;br /&gt;
processing. All of these topics (and more) will be revisited in later sections. Therefore,&lt;br /&gt;
it is neither necessary nor expected that the reader will feel the subjects have been completely explained in this&lt;br /&gt;
introductory material.  Furthermore, if one has some background in quantum computing, this chapter may be skipped.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Quantum Mechanics====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So what is quantum mechanics? We should think of it as a set of rules, in some ways&lt;br /&gt;
similar to Newton’s laws, which describe the way the world works. These are the rules&lt;br /&gt;
to which we must carefully attend in order to build what we will describe as a quantum&lt;br /&gt;
computing device. We will return to this topic briefly again later. However, as is done in&lt;br /&gt;
many places, this question is never quite answered directly. Most often we simply learn the&lt;br /&gt;
rules and how to use them. The question itself is perhaps a little vague because there are many&lt;br /&gt;
physical systems that don’t quite fit into an either/or categorization of quantum vs. classical (since, as stated already, classical mechanics is an approximation to quantum mechanics).&lt;br /&gt;
Also, it should be noted that throughout these notes the terms will be somewhat misused, in&lt;br /&gt;
the sense that certain systems will be called quantum mechanical or classical, and from now&lt;br /&gt;
on, with few exceptions, no care will be taken to discuss subtleties.  &lt;br /&gt;
&lt;br /&gt;
====Quantum Computing and Quantum Information Processing====&lt;br /&gt;
&lt;br /&gt;
A quantum computer would be a computer that would take advantage of quantum mechanical &lt;br /&gt;
evolutions according to which physical systems behave. We often think of quantum mechanics&lt;br /&gt;
as being the set of mechanical laws or principles that only very small particles obey. While this&lt;br /&gt;
is not really true, it is a somewhat reasonable way of explaining things to the layman since the world of the &amp;quot;small&amp;quot; is the world where these laws are most often used and were discovered. For our&lt;br /&gt;
purposes, we should note that everything obeys the laws of quantum mechanics and that&lt;br /&gt;
Newtonian mechanics are rules that we use to approximate quantum mechanics. However,&lt;br /&gt;
quantum mechanical control and natural quantum mechanical evolution, which cannot be approximated by Newton's laws, &lt;br /&gt;
are what we are talking about when we talk about quantum systems. We must have very particular quantum mechanical&lt;br /&gt;
evolution, which cannot be reasonably approximated with classical mechanics, and use it in a&lt;br /&gt;
particular way to really perform a quantum computation or to really do quantum information&lt;br /&gt;
processing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We have not yet built a quantum computing device. However, there are many reasons&lt;br /&gt;
to study quantum information processing other than building a fully functional quantum&lt;br /&gt;
computer. One main reason we haven’t built one is that we have to figure out how. The&lt;br /&gt;
experiments to perform quantum computation in physical devices take an enormous amount&lt;br /&gt;
of effort due to noises which corrupt the information. We are going to need to fix the&lt;br /&gt;
corrupted information, avoid the noises, or do away with them by some other means. A &lt;br /&gt;
reason to study quantum computing, and quantum information processing more&lt;br /&gt;
generally, is that there are really many quantum information processing tasks, or tasks&lt;br /&gt;
which can be thought of in this way, which concern quantum control. Precise control of&lt;br /&gt;
a quantum system is important for a variety of reasons, not the least of which is that our&lt;br /&gt;
world is quantum mechanical! When we get right down to the very basic elements of the&lt;br /&gt;
universe, they behave quantum mechanically. If there is one thing that the study of quantum&lt;br /&gt;
information processing has already taught us, it's that we need to pay attention to quantum&lt;br /&gt;
mechanics because it can be very useful to be able to manipulate quantum systems and take&lt;br /&gt;
advantage of uniquely quantum properties. Quantum technologies are going to be extremely&lt;br /&gt;
important in the future, even if we never build a quantum computer. (Oh, but we will!) As&lt;br /&gt;
Feynman said, “There is plenty of room at the bottom.” We have a lot to discover about&lt;br /&gt;
the world of the small.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Since noise has been, and is still, such a problem for quantum information, we need to deal with it. People quickly recognized this problem, and Peter Shor, and others, really made remarkable statements with their work on quantum error correcting codes. Their work showed that errors could, in principle, be corrected, leading the way for future research since it was then plausible that a quantum computer could be built – there are no fundamental obstacles. However, quantum error correcting codes are, in some sense, a software solution to a hardware problem. More physical treatments include codes which avoid errors, and control methods which are designed to average noises away. However, an all-out attack will include several different methods of error prevention used together. Error prevention methods are the subject of the last part of this course/book.&lt;br /&gt;
&lt;br /&gt;
====Motivation====&lt;br /&gt;
Why do we want to build a quantum computing device?&lt;br /&gt;
&lt;br /&gt;
#To make computers faster and more compact, we have been making them smaller (this has obeyed Moore’s law; see the [[Bibliography#Moore'sLaw:article|Moore's Law article [16]]]).  However, there is a limit to how much smaller we can make them and still have them function as they do now. This is due to quantum mechanics. In other words, the limit to small-scale computational technology is governed by quantum mechanics, since, at a certain scale, the current computational systems will no longer be well approximated by Newtonian mechanics. So, to make things smaller, we need to use quantum mechanics! More than this, though, the fact that Moore’s law cannot continue indefinitely means that we will need to look elsewhere for advances in computing power. One way to increase computing power is to use parallel computations. However, there are processes which cannot be parallelized. So where do we turn? A quantum computer would help with this.&lt;br /&gt;
#We now know of several different quantum algorithms which are faster than any known classical algorithm for performing the same task. Some are actually provably faster. These are listed and discussed further in the next section.&lt;br /&gt;
#Quantum information can be used in a variety of ways beyond computing, such as quantum cryptography, quantum games, and quantum communication of all sorts. &lt;br /&gt;
&lt;br /&gt;
An important point to take away from this section is that information is stored and&lt;br /&gt;
manipulated by physical devices. The way in which they behave is important for the tasks&lt;br /&gt;
that are to be performed.&lt;br /&gt;
&lt;br /&gt;
====Specific Uses====&lt;br /&gt;
There are at least three advantages of quantum computing devices which are often quoted:&lt;br /&gt;
&lt;br /&gt;
#Factor large integers more efficiently than a classical machine (known as Shor’s algorithm).&lt;br /&gt;
#Find an object in an unsorted database more efficiently than a classical machine (known as Grover’s algorithm).&lt;br /&gt;
#Simulate quantum mechanical systems more efficiently than any classical system (due to Feynman and others).&lt;br /&gt;
&lt;br /&gt;
====Comments====&lt;br /&gt;
Shor’s algorithm would render RSA encryption useless. It is more efficient than any&lt;br /&gt;
known classical algorithm. (There is, however, a quantum answer to this problem: quantum&lt;br /&gt;
cryptography through quantum key distribution, QKD.)&lt;br /&gt;
&lt;br /&gt;
Grover’s algorithm is better than any classical algorithm – phone book example: the classical&lt;br /&gt;
algorithm grows as &amp;lt;math&amp;gt;N/2\,\!&amp;lt;/math&amp;gt; on average, while Grover’s grows as &amp;lt;math&amp;gt;\sqrt{N}\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
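To make the phone-book comparison concrete, here is a back-of-the-envelope sketch in Python (the values of &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; are hypothetical, chosen only to illustrate the scaling quoted above):&lt;br /&gt;

```python
import math

# Expected number of lookups to find one marked item among N unsorted entries:
# classically about N/2 on average, versus about sqrt(N) queries for Grover's
# algorithm.  The sample sizes below are arbitrary illustrative choices.
for N in (100, 10_000, 1_000_000):
    classical = N / 2
    grover = math.sqrt(N)
    print(f"N={N:>9,}: classical ~ {classical:,.0f} lookups, Grover ~ {grover:,.0f}")
```

The gap widens as &amp;lt;math&amp;gt;N\,\!&amp;lt;/math&amp;gt; grows: for a million entries the classical search needs on the order of half a million lookups, while Grover’s algorithm needs on the order of a thousand queries.&lt;br /&gt;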
&lt;br /&gt;
Simulating quantum mechanical systems is quite difficult classically. For physical scientists&lt;br /&gt;
this could be the most important application of quantum computers. This could enable&lt;br /&gt;
the simulation of nuclear systems, solid-state devices, biological molecules and molecular&lt;br /&gt;
interactions, etc. much more efficiently than classical simulation.  This would enable calculations which are practically impossible now.&lt;br /&gt;
&lt;br /&gt;
====How do quantum computers provide an advantage?====&lt;br /&gt;
The claim is that quantum computers could solve some problems more efficiently than any &lt;br /&gt;
classical one. So viewing our information systems as quantum systems, we may note that&lt;br /&gt;
quantum mechanics is more than a description of the physical world (which is how physicists have&lt;br /&gt;
treated it for years) and is instead a set of rules governing the behaviour of information when stored&lt;br /&gt;
and manipulated quantum mechanically.  &lt;br /&gt;
&lt;br /&gt;
So the natural question is, “How does it do this?” We may also ask, “Where is the&lt;br /&gt;
advantage?” In other words, “What exactly about quantum mechanics enables us to achieve&lt;br /&gt;
speed-ups and other information processing tasks more efficiently than classical systems?”&lt;br /&gt;
Many people, as of the time of this writing, would likely say they don’t know. For example,&lt;br /&gt;
it is not known if there is a classical algorithm which could factor efficiently (By efficiently&lt;br /&gt;
here, let us just say that we mean “fewer resources,” and be more specific later). Or perhaps they would say &amp;lt;nowiki&amp;gt;&amp;quot;entanglement&amp;quot;&amp;lt;/nowiki&amp;gt; is responsible for the apparent speedups.  But this is a subject yet to be discussed.  One can present an intuitively plausible argument for why we believe a quantum computer can accomplish things a classical one cannot. However, there is no claim of a proof of anything at this point.  &lt;br /&gt;
&lt;br /&gt;
The argument concerns the fact that when a given machine has a different set of rules for operating, we expect&lt;br /&gt;
it to be able to do different things. The rules by which classical computing machines function are, in &lt;br /&gt;
some real sense, different from the ones governing the behavior of quantum machines. This is &lt;br /&gt;
quite vague, especially given the earlier comments about how everything really is quantum &lt;br /&gt;
mechanical. One may think about it (but someone is sure to argue) as a “classical object” &lt;br /&gt;
transforms according to a “classical equation of motion” and the result is determined by &lt;br /&gt;
its initial state, which is “classical.” A quantum mechanical state transforms according to a&lt;br /&gt;
“quantum equation of motion” and the result of the evolution is determined by some initial &lt;br /&gt;
conditions, which describe a “quantum system.” Perhaps this sounds like a circular argument,&lt;br /&gt;
primarily involving semantics.  However, the motivation for this as a definition comes from vector and tensor analysis: an object is a tensor if it transforms like a tensor. So we might say, an object is classical if it obeys classical equations.  In practice, this is often the way things are done.  If the physical system can be well-approximated using classical mechanics, we call it classical.  &lt;br /&gt;
&lt;br /&gt;
One can further argue that there are states which are uniquely quantum mechanical. &lt;br /&gt;
These are states which would have been mysterious to Newton, and indeed they were mysterious&lt;br /&gt;
to Einstein.  Furthermore, they are still mysterious today! The important point is&lt;br /&gt;
that there is no known classical analogue for some quantum states and one can effectively argue that there can be no classical analogue.&lt;br /&gt;
They are unique to quantum mechanics.  Some of these states are called entangled states.  However, &lt;br /&gt;
there are things one can do with quantum systems using un-entangled states that we have no idea how to do using classical systems.    &lt;br /&gt;
&lt;br /&gt;
So the answer to the question, &amp;quot;How do quantum computers provide an advantage?&amp;quot; is that we don't really know.  &lt;br /&gt;
&lt;br /&gt;
Let us first discuss&lt;br /&gt;
bits and qubits. We will then discuss quantum states of many particles which correspond &lt;br /&gt;
to entangled states. Finally, we will revisit this notion of intuition behind the quantum &lt;br /&gt;
mechanical speed-ups.&lt;br /&gt;
&lt;br /&gt;
===Bits and Qubits: An Introduction===&lt;br /&gt;
A ''classical bit'' is represented by two different states of a classical system. In classical computers&lt;br /&gt;
it is represented by two different values of an electrical potential difference. The two&lt;br /&gt;
different states of the system are represented by 0 and 1.&lt;br /&gt;
&lt;br /&gt;
A ''quantum bit'' or ''qubit'' (better, but less often used, is Qbit; see [[Bibliography#Mermin:qcbook|N. David Mermin's book [1]]]) is represented by two&lt;br /&gt;
states of a quantum mechanical system. The two different states are represented by &amp;lt;math&amp;gt;\left\vert{0}\right\rangle&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left\vert{1}\right\rangle&amp;lt;/math&amp;gt;. This notation is common and is explained in some detail in Appendix C.2.2, [[Appendix C - Vectors and Linear Algebra#Complex Vectors|Complex Vectors]].&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[File:Doublewell.jpeg]]&lt;br /&gt;
&lt;br /&gt;
Figure 1.1: This is a double well with a ball in one of the two wells.&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
Let us discuss a way in which to think about the differences between classical and quantum&lt;br /&gt;
systems. We will consider two wells, or valleys, with a hill in between as in Fig. 1.1.&lt;br /&gt;
First we will consider a classical system and we will suppose there are no frictional forces.&lt;br /&gt;
If we start the ball rolling where it is in the figure, then it will roll back and forth in Well&lt;br /&gt;
0. (Well 0, or “Well zero” is our name for the well on the left-hand side.) It will never leave&lt;br /&gt;
Well 0 if we leave it alone. If we wanted it to go into Well 1 (the well on the right-hand side)&lt;br /&gt;
we would need to nudge it or push it a little to get it over the hill. Or we could just pick it&lt;br /&gt;
up and move it from one well to the other.&lt;br /&gt;
&lt;br /&gt;
Now suppose the system is quantum mechanical.&amp;lt;ref&amp;gt;For those with a little background in physics, these are potential wells. An example is a ball in between two hills for the classical case. For the quantum case, we can think of a quantum particle in a potential well&lt;br /&gt;
with this shape and solve Schrödinger’s equation.&amp;lt;/ref&amp;gt; In this case, if we set up the system&lt;br /&gt;
so that the particle initially has some kinetic energy (imagine a moving “quantum ball”),&lt;br /&gt;
and let it go, there is some probability, after some amount of time, that the particle will be&lt;br /&gt;
found in Well 1. This is true when the energy of the ball was not great enough to travel over&lt;br /&gt;
the hill in the classical analogy. The probability of it being found in the other well depends&lt;br /&gt;
on several things; the initial energy of the particle, the width of the hill, and the height of&lt;br /&gt;
the hill (equivalently the depth(s) of the wells, which could be different). However, it won’t&lt;br /&gt;
happen with a classical bit! So this is a difference between classical and quantum mechanics.&amp;lt;ref&amp;gt; Now, if it is admitted that every particle is described by quantum mechanics, then the classically forbidden zone is forbidden because the probability of finding the ball there is extremely small (essentially zero).&amp;lt;/ref&amp;gt; In&lt;br /&gt;
quantum mechanics, the particle is in some sense in both wells at the same time. This has&lt;br /&gt;
to do with the “wave” nature of quantum mechanics. We then say that the particle is in a&lt;br /&gt;
superposition of Well 0 and Well 1 at the same time. Mathematically, we describe these&lt;br /&gt;
different physical “states” or conditions of the system in the following way.&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\mbox{Particle is in Well } 0&amp;amp;=\left\vert{0}\right\rangle, \\  \mbox{Particle is in Well } 1&amp;amp;=\left\vert{1}\right\rangle\end{align}&amp;lt;/math&amp;gt;|1.1}}&lt;br /&gt;
In other words, the state of the system “the particle is in Well 0” is written mathematically as &amp;lt;math&amp;gt;\left\vert{0}\right\rangle&amp;lt;/math&amp;gt;, and similarly for &amp;lt;math&amp;gt;\left\vert{1}\right\rangle\,\!&amp;lt;/math&amp;gt;. If the particle is in a superposition of the two, which will mean some probability for finding the particle in each well, we would write this as&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{0}\right\rangle+\beta\left\vert{1}\right\rangle\,\!&amp;lt;/math&amp;gt;|1.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\alpha\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\beta\,\!&amp;lt;/math&amp;gt; are complex numbers (see [[Appendix B]]) and the probability of the particle&lt;br /&gt;
being found in Well 0 is &amp;lt;math&amp;gt;|\alpha|^2\,\!&amp;lt;/math&amp;gt; and the probability of it being found in Well 1 is &amp;lt;math&amp;gt;|\beta|^2\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
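The measurement probabilities in Eq. (1.2) can be illustrated with a short simulation (a hypothetical sketch; the amplitudes below are arbitrary choices, not values from the text):&lt;br /&gt;

```python
import random

# |psi> = alpha|0> + beta|1>; the amplitudes below are arbitrary illustrative
# choices (not from the text) satisfying |alpha|^2 + |beta|^2 = 1.
alpha = complex(0.6, 0.0)
beta = complex(0.0, 0.8)
assert abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1.0) < 1e-12  # normalization

def measure():
    """Return 0 (found in Well 0) with probability |alpha|^2, else 1 (Well 1)."""
    return 0 if random.random() < abs(alpha) ** 2 else 1

shots = 100_000
counts = [0, 0]
for _ in range(shots):
    counts[measure()] += 1
print(counts[0] / shots, counts[1] / shots)  # roughly 0.36 and 0.64
```

Note that each individual measurement yields a definite well; only the ''statistics'' over many repeated preparations reveal &amp;lt;math&amp;gt;|\alpha|^2\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;|\beta|^2\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;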
&lt;br /&gt;
Now, some (physicists no less) have asked how to make a deterministic transformation&lt;br /&gt;
in a quantum system. After all, this seems to be probabilistic. The way to do that is the&lt;br /&gt;
following. We make the hill very wide and tall and we put the particle right down in the&lt;br /&gt;
bottom of one well and give it as little initial energy as possible. Then if we want it moved&lt;br /&gt;
to the other well, we pick it up and move it&amp;lt;ref&amp;gt;Again a note for physicists. If we cool it to its ground state and make sure we don’t have stray kicks that will knock it out, we achieve this. Then we put the right amount of energy to get it to transition to the first excited state.&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we measure the system, i.e. look to see if it is in Well 0 or Well 1, we will “project it into&lt;br /&gt;
one state or the other.” In other words, suppose the system is in the state &amp;lt;math&amp;gt;\left|\psi\right\rangle\,\!&amp;lt;/math&amp;gt; above. If we&lt;br /&gt;
look to see where the particle is and find it in Well 1, then the probability is clearly zero &lt;br /&gt;
that it is in the other well. This is called the projection postulate in quantum mechanics and&lt;br /&gt;
we will see how to represent this mathematically later.  &lt;br /&gt;
&lt;br /&gt;
Throughout the notes, when trying to think about a physical qubit, this simple picture&lt;br /&gt;
is often helpful. Therefore, we will refer back to it from time to time.&lt;br /&gt;
&lt;br /&gt;
===Obstacles to Building a Reliable Quantum Computer===&lt;br /&gt;
&lt;br /&gt;
Noise is the greatest obstacle to building a quantum computer. This was also the case with&lt;br /&gt;
early electronic classical computing devices. In the quantum case there is an intuitive explanation.  In a quantum computation, a quantum system becomes entangled.  Without going into detail, let us just say highly correlated is synonymous with entangled.  (Entangled states are discussed in [[Chapter 4 - Entanglement|Chapter 4]].)  Affecting one part of the system can affect another since the two parts are highly correlated.  Entanglement is believed by some to be responsible for the power of some quantum information processing tasks and there is evidence for this.  However, the fact that these entangled systems are being used during the computation means that if a noise affects one part of the system, then other parts of the system are also affected.  In this sense, quantum systems are very delicate and must be handled with care.  &lt;br /&gt;
&lt;br /&gt;
For our purposes, we will need to discuss ''open-system evolution'' and ''closed-system evolution''. A closed system is one which does not have any interaction with external objects. We may also refer to such a system as isolated. For example, one knows that if a jar has a very good lid on it, no liquid can leak out, or into, the jar. So if we put a certain amount of liquid in it now, we can expect it will all be there later. This is a closed system and the liquid is isolated from masses external to the jar. In other words, no other mass can get in or out.&lt;br /&gt;
&lt;br /&gt;
A better example is what we call thermally isolated or a thermally closed system, meaning no heat energy is exchanged &lt;br /&gt;
with any other system. An open system is one which &lt;br /&gt;
can interact with its environment in some way. In these examples, a lid that is not sealed &lt;br /&gt;
can allow liquid vapor to escape, and one that is not thermally isolated, or thermally closed, &lt;br /&gt;
can heat up or cool down.&lt;br /&gt;
&lt;br /&gt;
For the quantum information processing tasks we have in mind, we will consider quantum information which is isolated from its environment; what we usually mean is that the quantum system is isolated and cannot be affected by an outside source.  It is important to note that isolated, or closed, systems are idealizations. While they may often be good approximations to a system, they are never really completely isolated or closed.  One may consider larger and larger systems to try to obtain a closed system, but this is most often impractical, although it can be useful for modeling.  The fact that systems are never completely closed means that errors ''will'' creep into our quantum information processing, and we must find a way in which to deal with these errors in order to build reliable quantum information processing devices.&lt;br /&gt;
&lt;br /&gt;
==Further Reading==&lt;br /&gt;
&lt;br /&gt;
[[Bibliography#Mermin:qcbook|N. David Mermin's book [1]]] is a recent and excellent introductory text.  [[Bibliography#NielsenChuang:book|Nielsen and Chuang's book [2]]] is also very good and has become somewhat of a standard reference.  [[Bibliography#Preskill:notes|John Preskill's notes [5]]] are free to read and were part of the motivation for writing this book.  They are quite thorough and even include exercises on his course page.  One may also want to consult Quantiki's (http://www.quantiki.org/) excellent encyclopedia of quantum information http://www.quantiki.org/wiki/Main_Page.  There is an article concerning introductory material, entitled [http://www.quantiki.org/wiki/Basic_concepts_in_quantum_computation Basic Concepts in Quantum Computation], and also tutorials on various topics.&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Chapter 2 - Qubits and Collections of Qubits#Introduction|Continue to '''Chapter 2 - Qubits and Collections of Qubits''']]&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_1_-_Introduction&amp;diff=2332</id>
		<title>Chapter 1 - Introduction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_1_-_Introduction&amp;diff=2332"/>
		<updated>2012-12-24T19:24:11Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* How do quantum computers provide an advantage? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
''In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.''&lt;br /&gt;
&lt;br /&gt;
-Paul Dirac&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===An Introduction to Quantum Computation===&lt;br /&gt;
&lt;br /&gt;
This introductory chapter is a survey of, and introduction to, topics in quantum information&lt;br /&gt;
processing. All of these topics (and more) will be revisited in later sections. Therefore,&lt;br /&gt;
it is neither necessary nor expected that the reader will feel the subjects have been completely explained in this&lt;br /&gt;
introductory material.  Furthermore, if one has some background in quantum computing, this chapter may be skipped.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Quantum Mechanics====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So what is quantum mechanics? We should think of it as a set of rules, in some ways&lt;br /&gt;
similar to Newton’s laws, which describe the way the world works. These are the rules&lt;br /&gt;
to which we must carefully attend in order to build what we will describe as a quantum&lt;br /&gt;
computing device. We will return to this topic briefly again later. However, as is done in&lt;br /&gt;
many places, this question is never quite answered directly. Most often we simply learn the&lt;br /&gt;
rules and how to use them. The question itself is perhaps a little vague because there are many&lt;br /&gt;
physical systems that don’t quite fit into an either/or categorization of quantum vs. classical (since, as stated already, classical mechanics is an approximation to quantum mechanics).&lt;br /&gt;
Also, it should be noted that throughout these notes the terms will be somewhat misused, in&lt;br /&gt;
the sense that certain systems will be called quantum mechanical or classical, and from now&lt;br /&gt;
on, with few exceptions, no care will be taken to discuss subtleties.  &lt;br /&gt;
&lt;br /&gt;
====Quantum Computing and Quantum Information Processing====&lt;br /&gt;
&lt;br /&gt;
A quantum computer would be a computer that would take advantage of quantum mechanical &lt;br /&gt;
evolutions according to which physical systems behave. We often think of quantum mechanics&lt;br /&gt;
as being the set of mechanical laws or principles that only very small particles obey. While this&lt;br /&gt;
is not really true, it is a somewhat reasonable way of explaining things to the layman since the world of the &amp;quot;small&amp;quot; is the world where these laws are most often used and were discovered. For our&lt;br /&gt;
purposes, we should note that everything obeys the laws of quantum mechanics and that&lt;br /&gt;
Newtonian mechanics are rules that we use to approximate quantum mechanics. However,&lt;br /&gt;
quantum mechanical control and natural quantum mechanical evolution, which cannot be approximated by Newton's laws, &lt;br /&gt;
are what we are talking about when we talk about quantum systems. We must have very particular quantum mechanical&lt;br /&gt;
evolution, which cannot be reasonably approximated with classical mechanics, and use it in a&lt;br /&gt;
particular way to really perform a quantum computation or to really do quantum information&lt;br /&gt;
processing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We have not yet built a quantum computing device. However, there are many reasons&lt;br /&gt;
to study quantum information processing other than building a fully functional quantum&lt;br /&gt;
computer. One main reason we haven’t built one is that we have to figure out how. The&lt;br /&gt;
experiments to perform quantum computation in physical devices take an enormous amount&lt;br /&gt;
of effort due to noises which corrupt the information. We are going to need to fix the&lt;br /&gt;
corrupted information, avoid the noises, or do away with them by some other means. A &lt;br /&gt;
reason to study quantum computing, and quantum information processing more&lt;br /&gt;
generally, is that there are really many quantum information processing tasks, or tasks&lt;br /&gt;
which can be thought of in this way, which concern quantum control. Precise control of&lt;br /&gt;
a quantum system is important for a variety of reasons, not the least of which is that our&lt;br /&gt;
world is quantum mechanical! When we get right down to the very basic elements of the&lt;br /&gt;
universe, they behave quantum mechanically. If there is one thing that the study of quantum&lt;br /&gt;
information processing has already taught us, it's that we need to pay attention to quantum&lt;br /&gt;
mechanics because it can be very useful to be able to manipulate quantum systems and take&lt;br /&gt;
advantage of uniquely quantum properties. Quantum technologies are going to be extremely&lt;br /&gt;
important in the future, even if we never build a quantum computer. (Oh, but we will!) As&lt;br /&gt;
Feynman said, “There is plenty of room at the bottom.” We have a lot to discover about&lt;br /&gt;
the world of the small.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Since noise has been, and is still, such a problem for quantum information, we need to deal with it. People quickly recognized this problem, and Peter Shor, and others, really made remarkable statements with their work on quantum error correcting codes. Their work showed that errors could, in principle, be corrected, leading the way for future research since it was then plausible that a quantum computer could be built – there are no fundamental obstacles. However, quantum error correcting codes are, in some sense, a software solution to a hardware problem. More physical treatments include codes which avoid errors, and control methods which are designed to average noises away. However, an all-out attack will include several different methods of error prevention used together. Error prevention methods are the subject of the last part of this course/book.&lt;br /&gt;
&lt;br /&gt;
====Motivation====&lt;br /&gt;
Why do we want to build a quantum computing device?&lt;br /&gt;
&lt;br /&gt;
#To make computers faster and more compact, we have been making them smaller (this has obeyed Moore’s law; see the [[Bibliography#Moore'sLaw:article|Moore's Law article [16]]]).  However, there is a limit to how much smaller we can make them and still have them function as they do now. This is due to quantum mechanics. In other words, the limit to small-scale computational technology is governed by quantum mechanics, since, at a certain scale, the current computational systems will no longer be well approximated by Newtonian mechanics. So, to make things smaller, we need to use quantum mechanics! More than this, though, the fact that Moore’s law cannot continue indefinitely means that we will need to look elsewhere for advances in computing power. One way to increase computing power is to use parallel computations. However, there are processes which cannot be parallelized. So where do we turn? A quantum computer would help with this.&lt;br /&gt;
#We now know of several different quantum algorithms which are faster than any known classical algorithm for performing the same task. Some are actually provably faster. These are listed and discussed further in the next section.&lt;br /&gt;
#Quantum information can be used in a variety of ways beyond computing, such as quantum cryptography, quantum games, and quantum communication of all sorts. &lt;br /&gt;
&lt;br /&gt;
An important point to take away from this section is that information is stored and&lt;br /&gt;
manipulated by physical devices. The way in which they behave is important for the tasks&lt;br /&gt;
that are to be performed.&lt;br /&gt;
&lt;br /&gt;
====Specific Uses====&lt;br /&gt;
There are at least three advantages of quantum computing devices which are often quoted:&lt;br /&gt;
&lt;br /&gt;
#Factor large integers more efficiently than a classical machine (known as Shor’s algorithm).&lt;br /&gt;
#Find an object in an unsorted database more efficiently than a classical machine (known as Grover’s algorithm).&lt;br /&gt;
#Simulate quantum mechanical systems more efficiently than any classical system (due to Feynman and others).&lt;br /&gt;
&lt;br /&gt;
====COMMENTS====&lt;br /&gt;
Shor’s algorithm would render RSA encryption useless. It is more efficient than any&lt;br /&gt;
known classical algorithm. (There is a quantum answer to this problem, however: quantum&lt;br /&gt;
cryptography through quantum key distribution, QKD.)&lt;br /&gt;
&lt;br /&gt;
Grover’s algorithm is better than any classical algorithm. Consider the phone-book example: the classical&lt;br /&gt;
algorithm grows as &amp;lt;math&amp;gt;N/2\,\!&amp;lt;/math&amp;gt; while Grover’s grows as &amp;lt;math&amp;gt;\sqrt{N}\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
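To get a feel for these growth rates, here is a small illustrative Python sketch (the factor of pi/4 in the Grover query count is the standard constant; the function names are made up for this example):&lt;br /&gt;

```python
import math

def classical_queries(n):
    # expected number of lookups to find one marked item among n
    return n / 2

def grover_queries(n):
    # Grover's algorithm uses about (pi/4) * sqrt(n) oracle queries
    return (math.pi / 4) * math.sqrt(n)

for n in (10**3, 10**6, 10**9):
    print(n, classical_queries(n), round(grover_queries(n)))
```

For a million entries the quantum count is a few hundred queries versus half a million classically.&lt;br /&gt;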
&lt;br /&gt;
Simulating quantum mechanical systems is quite difficult classically. For physical scientists&lt;br /&gt;
this could be the most important application of quantum computers. This could enable&lt;br /&gt;
the simulation of nuclear systems, solid-state devices, biological molecules and molecular&lt;br /&gt;
interactions, etc. much more efficiently than classical simulation.  This would enable calculations which are practically impossible now.&lt;br /&gt;
&lt;br /&gt;
====How do quantum computers provide an advantage?====&lt;br /&gt;
The claim is that quantum computers could solve some problems more efficiently than any &lt;br /&gt;
classical one. Viewing our information systems as quantum systems, we may note that&lt;br /&gt;
quantum mechanics is not merely a description of the physical world (which is how physicists have&lt;br /&gt;
treated it for years) but also a set of rules governing the behavior of information when stored&lt;br /&gt;
and manipulated quantum mechanically.  &lt;br /&gt;
&lt;br /&gt;
So the natural question is, “How does it do this?” We may also ask, “Where is the&lt;br /&gt;
advantage?” In other words, “What exactly about quantum mechanics enables us to achieve&lt;br /&gt;
speed-ups and other information processing tasks more efficiently than classical systems?”&lt;br /&gt;
Many people, as of the time of this writing, would likely say they don’t know. For example,&lt;br /&gt;
it is not known whether there is a classical algorithm which could factor efficiently (by efficiently&lt;br /&gt;
here, let us just say that we mean “fewer resources,” and be more specific later). Or perhaps they would say &amp;lt;nowiki&amp;gt;&amp;quot;entanglement&amp;quot;&amp;lt;/nowiki&amp;gt; is responsible for the apparent speed-ups.  But this is a subject yet to be discussed.  One can present an intuitive, plausible argument for why we believe a quantum computer can accomplish things a classical one cannot. However, there is no claim of a proof of anything at this point.  &lt;br /&gt;
&lt;br /&gt;
The argument concerns the fact that when a given machine has a different set of rules for operating, we expect&lt;br /&gt;
it to be able to do different things. The rules by which classical computing machines function are, in &lt;br /&gt;
some real sense, different from the ones governing the behavior of quantum machines. This is &lt;br /&gt;
quite vague, especially given the earlier comments about how everything really is quantum &lt;br /&gt;
mechanical. One may think about it as follows (though someone is sure to argue): a “classical object” &lt;br /&gt;
transforms according to a “classical equation of motion” and the result is determined by &lt;br /&gt;
its initial state, which is “classical.” A quantum mechanical state transforms according to a&lt;br /&gt;
“quantum equation of motion” and the result of the evolution is determined by some initial &lt;br /&gt;
conditions, which describe a “quantum system.” Perhaps this sounds like a circular argument,&lt;br /&gt;
primarily involving semantics.  However, the motivation for this as a definition comes from vector and tensor analysis: an object is a tensor if it transforms like a tensor. So we might say an object is classical if it obeys classical equations.  In practice, this is often the way things are done.  If the physical system can be well approximated using classical mechanics, we call it classical.  &lt;br /&gt;
&lt;br /&gt;
One can further argue that there are states which are uniquely quantum mechanical. &lt;br /&gt;
These are states which would have been mysterious to Newton, and indeed they were mysterious&lt;br /&gt;
to Einstein.  Furthermore, they are still mysterious today! The important point is&lt;br /&gt;
that there is no known classical analogue for some quantum states and one can effectively argue that there can be no classical analogue.&lt;br /&gt;
They are unique to quantum mechanics.  Some of these states are called entangled states.  However, &lt;br /&gt;
there are things one can do with quantum systems in un-entangled states that we have no idea how to do with classical systems.    &lt;br /&gt;
&lt;br /&gt;
So the answer to the question, &amp;quot;How do quantum computers provide an advantage?&amp;quot; is that we don't really know.  &lt;br /&gt;
&lt;br /&gt;
Let us first discuss&lt;br /&gt;
bits and qubits. We will then discuss quantum states of many particles which correspond &lt;br /&gt;
to entangled states. Finally, we will revisit this notion of intuition behind the quantum &lt;br /&gt;
mechanical speed-ups.&lt;br /&gt;
&lt;br /&gt;
===Bits and Qubits: An Introduction===&lt;br /&gt;
A ''classical bit'' is represented by two different states of a classical system. In classical computers&lt;br /&gt;
it is represented by two different values of an electrical potential difference. The two&lt;br /&gt;
different states of the system are represented by 0 and 1.&lt;br /&gt;
&lt;br /&gt;
A ''quantum bit'' or ''qubit'' (better, but less often used is Qbit, see [[Bibliography#Mermin:qcbook|N. David Mermin's book [1]]]) is represented by two&lt;br /&gt;
states of a quantum mechanical system. The two different states are represented by &amp;lt;math&amp;gt;\left\vert{0}\right\rangle&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\left\vert{1}\right\rangle&amp;lt;/math&amp;gt;. This notation is common and is explained in some detail in Appendix C.2.2, [[Appendix C - Vectors and Linear Algebra#Complex Vectors|Complex Vectors]].&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[File:Doublewell.jpeg]]&lt;br /&gt;
&lt;br /&gt;
Figure 1.1: This is a double well with a ball in one of the two wells.&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;br /&amp;gt;&lt;br /&gt;
Let us discuss a way in which to think about the differences between classical and quantum&lt;br /&gt;
systems. We will consider two wells, or valleys, with a hill in between as in Fig. 1.1.&lt;br /&gt;
First we will consider a classical system and we will suppose there are no frictional forces.&lt;br /&gt;
If we start the ball rolling where it is in the figure, then it will roll back and forth in Well&lt;br /&gt;
0. (Well 0, or “Well zero” is our name for the well on the left-hand side.) It will never leave&lt;br /&gt;
Well 0 if we leave it alone. If we wanted it to go into Well 1 (the well on the right-hand side)&lt;br /&gt;
we would need to nudge it or push it a little to get it over the hill. Or we could just pick it&lt;br /&gt;
up and move it from one well to the other.&lt;br /&gt;
&lt;br /&gt;
Now suppose the system is quantum mechanical.&amp;lt;ref&amp;gt;For those with a little background in physics, these are potential wells. An example is a ball in between two hills for the classical case. For the quantum case, we can think of a quantum particle in a potential well&lt;br /&gt;
with this shape and solve Schrödinger’s equation.&amp;lt;/ref&amp;gt; In this case, if we set up the system&lt;br /&gt;
so that the particle initially has some kinetic energy (imagine a moving “quantum ball”),&lt;br /&gt;
and let it go, there is some probability, after some amount of time, that the particle will be&lt;br /&gt;
found in Well 1. This is true when the energy of the ball was not great enough to travel over&lt;br /&gt;
the hill in the classical analogy. The probability of it being found in the other well depends&lt;br /&gt;
on several things: the initial energy of the particle, the width of the hill, and the height of&lt;br /&gt;
the hill (equivalently the depth(s) of the wells, which could be different). However, it won’t&lt;br /&gt;
happen with a classical bit! So this is a difference between classical and quantum mechanics.&amp;lt;ref&amp;gt; Now, if it is admitted that every particle is described by quantum mechanics, then the classically forbidden zone is forbidden because the probability of finding the ball there is extremely small (essentially zero).&amp;lt;/ref&amp;gt; In&lt;br /&gt;
quantum mechanics, the particle is in some sense in both wells at the same time. This has&lt;br /&gt;
to do with the “wave” nature of quantum mechanics. We then say that the particle is in a&lt;br /&gt;
superposition of Well 0 and Well 1 at the same time. Mathematically, we describe these&lt;br /&gt;
different physical “states” or conditions of the system in the following way.&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\begin{align}\mbox{Particle is in Well } 0&amp;amp;=\left\vert{0}\right\rangle, \\  \mbox{Particle is in Well } 1&amp;amp;=\left\vert{1}\right\rangle\end{align}&amp;lt;/math&amp;gt;|1.1}}&lt;br /&gt;
In other words, the state of the system in which the particle is in Well 0 is written mathematically as &amp;lt;math&amp;gt;\left\vert{0}\right\rangle&amp;lt;/math&amp;gt;, and similarly for &amp;lt;math&amp;gt;\left\vert{1}\right\rangle\,\!&amp;lt;/math&amp;gt;. If the particle is in a superposition of the two, which will mean some probability for finding the particle in each well, we would write this as&lt;br /&gt;
{{Equation|&amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{0}\right\rangle+\beta\left\vert{1}\right\rangle\,\!&amp;lt;/math&amp;gt;|1.2}}&lt;br /&gt;
where &amp;lt;math&amp;gt;\alpha\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\beta\,\!&amp;lt;/math&amp;gt; are complex numbers (see [[Appendix B]]) and the probability of the particle&lt;br /&gt;
being found in Well 0 is &amp;lt;math&amp;gt;|\alpha|^2\,\!&amp;lt;/math&amp;gt; and the probability of it being found in Well 1 is &amp;lt;math&amp;gt;|\beta|^2\,\!&amp;lt;/math&amp;gt;.&lt;br /&gt;
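As an illustrative sketch (not from the original text; the amplitudes below are an arbitrary choice), one can simulate repeated measurements of such a superposition with NumPy and check that the outcome frequencies approach &amp;lt;math&amp;gt;|\alpha|^2\,\!&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;|\beta|^2\,\!&amp;lt;/math&amp;gt;:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# amplitudes for |psi> = alpha|0> + beta|1>; any complex pair works
alpha = 1 / np.sqrt(3)
beta = np.sqrt(2 / 3) * 1j

norm = abs(alpha) ** 2 + abs(beta) ** 2  # 1 for a properly normalized state
p0 = abs(alpha) ** 2 / norm
p1 = abs(beta) ** 2 / norm

# simulate many measurements: outcome 0 means Well 0, outcome 1 means Well 1
samples = rng.choice([0, 1], size=100_000, p=[p0, p1])
print(p0, samples.mean())  # the fraction of 1s approaches |beta|^2 = 2/3
```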
&lt;br /&gt;
Now, some (physicists no less) have asked how to make a deterministic transformation&lt;br /&gt;
in a quantum system. After all, this seems to be probabilistic. The way to do that is the&lt;br /&gt;
following. We make the hill very wide and tall and we put the particle right down in the&lt;br /&gt;
bottom of one well and give it as little initial energy as possible. Then if we want it moved&lt;br /&gt;
to the other well, we pick it up and move it&amp;lt;ref&amp;gt;Again a note for physicists. If we cool it to its ground state and make sure we don’t have stray kicks that will knock it out, we achieve this. Then we put in the right amount of energy to get it to transition to the first excited state.&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we measure the system, i.e. look to see whether the particle is in Well 0 or Well 1, we will “project it into&lt;br /&gt;
one state or the other.” In other words, suppose the system is in the state &amp;lt;math&amp;gt;\left|\psi\right\rangle\,\!&amp;lt;/math&amp;gt; above. If we&lt;br /&gt;
look to see where the particle is and find it in Well 1, then the probability is clearly zero &lt;br /&gt;
that it is in the other well. This is called the projection postulate in quantum mechanics and&lt;br /&gt;
we will see how to represent this mathematically later.  &lt;br /&gt;
&lt;br /&gt;
Throughout the notes, when trying to think about a physical qubit, this simple picture&lt;br /&gt;
is often helpful. Therefore, we will refer back to it from time to time.&lt;br /&gt;
&lt;br /&gt;
===Obstacles to Building a Reliable Quantum Computer===&lt;br /&gt;
&lt;br /&gt;
Noise is the greatest obstacle to building a quantum computer. This was also the case with&lt;br /&gt;
early electronic classical computing devices. In this case there is an intuitive explanation.  In a quantum computation, a quantum system becomes entangled.  Without going into detail, let us just say that highly correlated is synonymous with entangled.  (Entangled states are discussed in [[Chapter 4 - Entanglement|Chapter 4]].)  Affecting one part of the system can affect another since the two parts are highly correlated.  Entanglement is believed by some to be responsible for the power of some quantum information processing tasks, and there is evidence for this.  However, the fact that these entangled systems are being used during the computation means that if noise affects one part of the system, then other parts of the system are also affected.  In this sense, quantum systems are very delicate and must be handled with care.  &lt;br /&gt;
&lt;br /&gt;
For our purposes, we will need to discuss ''open-system evolution'' and ''closed-system evolution''. A closed system is one which does not have any interaction with external objects. We may also refer to such a system as isolated. For example, one knows that if a jar has a very good lid on it, no liquid can leak out of, or into, the jar. So if we put a certain amount of liquid in it now, we can expect it will all be there later. This is a closed system and the liquid is isolated from masses external to the jar. In other words, no other mass can get in or out.&lt;br /&gt;
&lt;br /&gt;
A better example is what we call thermally isolated or a thermally closed system, meaning no heat energy is exchanged &lt;br /&gt;
with any other system. An open system is one which &lt;br /&gt;
can interact with its environment in some way. In these examples, a lid that is not sealed &lt;br /&gt;
can allow liquid vapor to escape and one that is not thermally isolated, or thermally closed &lt;br /&gt;
can heat up or cool down.&lt;br /&gt;
&lt;br /&gt;
For the quantum information processing tasks we have in mind, we will consider quantum information which is isolated from its environment; by this we usually mean that the quantum system cannot be affected by an outside source.  It is important to note that isolated, or closed, systems are an idealization: while the approximation is often good, systems are essentially never completely isolated or closed.  One may consider larger and larger systems to try to obtain a closed system, but this is most often impractical, although it can be useful for modeling.  The fact that systems are never completely closed means that errors ''will'' creep into our quantum information processing, and we must find a way to deal with these errors in order to build reliable quantum information processing devices.&lt;br /&gt;
&lt;br /&gt;
==Further Reading==&lt;br /&gt;
&lt;br /&gt;
[[Bibliography#Mermin:qcbook|N. David Mermin's book [1]]] is a recent and excellent introductory text.  [[Bibliography#NielsenChuang:book|Nielsen and Chuang's book [2]]] is also very good and has become somewhat of a standard reference.  [[Bibliography#Preskill:notes|John Preskill's notes [5]]] are free to read and were part of the motivation for writing this book.  They are quite thorough and even include exercises on his course page.  One may also want to consult Quantiki's (http://www.quantiki.org/) excellent encyclopedia of quantum information http://www.quantiki.org/wiki/Main_Page.  There is an article concerning introductory material, entitled [http://www.quantiki.org/wiki/Basic_concepts_in_quantum_computation Basic Concepts in Quantum Computation], and also tutorials on various topics.&lt;br /&gt;
&lt;br /&gt;
==Footnotes==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Chapter 2 - Qubits and Collections of Qubits#Introduction|Continue to '''Chapter 2 - Qubits and Collections of Qubits''']]&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=2008</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=2008"/>
		<updated>2012-11-19T23:14:59Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Surface Code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits as being arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since in many systems it is substantially less difficult to couple qubits that are close to each other than ones that are farther apart. We can think of physical qubits as being arranged on the edges of a lattice as shown in Figure 1. Examples of surface codes are the toric code and the planar code; the main difference between them is the boundary conditions. In the toric code, the boundaries are periodic, whereas in the planar code they are not. In the toric code, the qubits are arranged on a lattice which can be thought of as spread over the surface of a torus, while in the planar code case we think of the data qubits as living on a simple 2-D plane and the ancilla qubits on the faces and the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;br&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles, measurement (ancilla) qubits are filled circles.&amp;lt;br&amp;gt;The yellow regions indicate measure-Z qubits while the green regions indicate measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement &amp;lt;math&amp;gt;M_Z&amp;lt;/math&amp;gt; of the qubit will, however, return only one of two possible measurement outcomes: ''+1'' with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'' with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
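The commutation property above can be checked numerically. The following sketch (a made-up minimal example, not from the text) builds a ''ZZZZ'' stabilizer and an ''XXXX'' stabilizer that overlap on exactly two qubits and verifies that they commute, since each overlapping ''Z''/''X'' pair contributes a factor of -1 and the two factors cancel:&lt;br /&gt;

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def kron(*ops):
    # tensor product of a list of single-qubit operators
    out = np.array([[1]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Z eigenstates: ground |g> = |0>, excited |e> = |1>
g = np.array([1, 0])
e = np.array([0, 1])
assert np.allclose(Z @ g, +g)
assert np.allclose(Z @ e, -e)

# two stabilizers on 6 qubits sharing exactly two qubits (qubits 2 and 3):
# ZZZZ on qubits 0-3, XXXX on qubits 2-5
ZZZZ = kron(Z, Z, Z, Z, I, I)
XXXX = kron(I, I, X, X, X, X)
print(np.allclose(ZZZZ @ XXXX, XXXX @ ZZZZ))  # True: they commute
```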
&lt;br /&gt;
A planar code has four boundaries, two that are called “smooth” and two that are called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). Also see http://arxiv.org/abs/1208.0928, an excellent and comprehensive review of the surface code, written to be accessible even to the absolute beginner.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:boundaries.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 2'''&amp;lt;br&amp;gt;Examples of smooth and rough boundaries. This figure has been copied with permission from the authors of Ref. 2&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Syndrome Extraction and Error Detection===&lt;br /&gt;
Detecting errors involves measuring check operators and observing which ones give a value of ''-1'' (due to anti-commuting with errors). This information helps us guess where errors occurred. In practice, of course, errors do not have to occur on their own, and often one can observe multiple instances next to each other. In these cases, the error operators form error chains throughout the lattice. Since only&lt;br /&gt;
the ends of such error chains anti-commute with the check operators, determining where errors occurred often involves guessing the most likely scenario. In the planar case, the dangerous chains are those that connect opposite boundaries of the same type (either left to right, or top to bottom), and&lt;br /&gt;
in the toric case, those that span all the way across a given dimension of the lattice.  They turn out to change the encoded, logical&lt;br /&gt;
state of the qubit and hence are called logical errors. Two examples are shown in Figures 3 and 4. &amp;lt;math&amp;gt;Z_L&amp;lt;/math&amp;gt; is a chain of ''Z'' operators that connects two rough boundaries, and &amp;lt;math&amp;gt;X_L&amp;lt;/math&amp;gt; is a chain of ''X'' operators that connects two smooth ones.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:defect.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 3'''&amp;lt;br&amp;gt;Examples of error syndromes on the surface code (planar and toric). The state is initialized to the ''+1'' eigenstate of all stabilizers.&amp;lt;br&amp;gt;Shaded qubits indicate locations of ''X'' errors. This figure has been copied with permission from the authors of Ref. 5&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:error.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 4'''&amp;lt;br&amp;gt;A planar surface code in which a logical ''Z (X)'' error is a chain of ''Z (X)''&lt;br /&gt;
operators that spans the whole lattice, and connects rough (smooth) boundaries.&amp;lt;br&amp;gt;This figure has been copied with permission from the authors of Ref. 5&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:error_toric.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 5'''&amp;lt;br&amp;gt;We can encode two qubits in a toric code, since there are no boundaries.&amp;lt;br&amp;gt;This shows how logical operations are done on a) the first qubit, and b) the second.&amp;lt;br&amp;gt;In both cases, the logical operations involve applying a set of operators (either ''X'' or ''Z'') in a chain that goes around one of the dimensions of the torus.&amp;lt;br&amp;gt;This figure has been copied with permission from the authors of Ref. 2&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The ancilla qubits are used for determining the measurement outcomes of the four-term (and, on boundaries, three-term) check operators without actually needing to measure them directly. We call these outcomes syndromes, and use them to determine where errors have occurred. A generic circuit capable of determining the sign of a stabilizer is shown in Figure 6. The approach consists of initializing the ancilla qubits, performing a collection of CNOT gates with neighboring data qubits, and finally reading out (measuring) the ancillas.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:circuit.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 6'''&amp;lt;br&amp;gt;a) General circuit determining the sign of a stabilizer S.&amp;lt;br&amp;gt;b) Circuit determining the sign of a stabilizer ''XXXX''.&amp;lt;br&amp;gt;c) Circuit determining the sign of a stabilizer ''ZZZZ''.&amp;lt;/center&amp;gt;&lt;br /&gt;
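For data qubits in the computational basis, the effect of the ''ZZZZ'' readout circuit in Figure 6c can be sketched classically (a simplified illustration that ignores superpositions; the function name is invented): the CNOTs accumulate the parity of the data qubits in the ancilla, whose readout gives the stabilizer sign.&lt;br /&gt;

```python
def zzzz_syndrome(data_bits):
    """Ancilla readout for a ZZZZ check on computational-basis data:
    each CNOT (data qubit as control) XORs one bit into the ancilla,
    so the ancilla ends up holding the parity of the four data bits."""
    ancilla = 0
    for b in data_bits:
        ancilla ^= b  # CNOT with the data qubit as control
    return +1 if ancilla == 0 else -1  # even parity: +1, odd parity: -1

print(zzzz_syndrome([0, 0, 0, 0]))  # +1, no error
print(zzzz_syndrome([0, 1, 0, 0]))  # -1, a single X error flips the sign
```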
&lt;br /&gt;
The orientation of the CNOT gates is different when dealing with ancilla qubits that are located on the vertices from the ones that sit on the plaquettes. In the former case, the ancilla qubits play the role of control, while the data qubits are targets. The situation is reversed in the latter scenario. An example of the plaquette readout is shown in Figure 7. The syndrome is now the change in eigenvalues measured between sequential time slices. Just as in the error-free case, syndrome extraction then amounts to finding the likely set of errors that is consistent with the observed syndrome, now in three dimensions: two spatial and one temporal.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:cycle.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 7'''&amp;lt;br&amp;gt;A syndrome measurement typically involves six steps:&amp;lt;br&amp;gt;ancilla qubit initialization, CNOTs with the four surrounding data qubits (fewer on boundaries in the planar code case), and finally ancilla qubit readout.&amp;lt;br&amp;gt;This example shows a temporal order of the CNOT gates of north, west, east, and south. This figure has been copied with permission from the authors of Ref. 2&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Error Correction===&lt;br /&gt;
After each ancilla is read, its value is checked against a result from the previous iteration, and if the values differ, the syndrome change&lt;br /&gt;
location (in time and space) is recorded. Next, a matching of all the syndrome changes collected up to this point is used to guess where errors occurred. An example of this is shown in Figure 8, where we can see that in this particular scenario, a collection of ''X'' errors (shown as ''X''s in blue), after six readout cycles, has led to the given space-time locations of syndrome changes (red dots). We stress that one could get the same readout pattern from a different set of errors, hence the best we can do when guessing where the errors occurred is to find the most likely scenario.&lt;br /&gt;
&lt;br /&gt;
To do this, we observe that shorter error chains are more likely than longer ones and therefore use a minimum-weight matching algorithm to match the syndrome change locations and obtain a likely error pattern. Before the matching algorithm can find a minimum-weight solution, however, we need to convert our syndrome change results into something that the matching algorithm can understand. This is done by converting all the syndrome change results into a graph, with the locations of the syndrome changes representing the graph’s nodes, and edges between these nodes having a weight which depends on the distance between them. The edge weight is measured in faces along the spatial dimensions and ancilla qubit readout cycles along the time dimension.&lt;br /&gt;
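The matching step can be sketched as follows. This is a brute-force toy version for illustration only (the coordinates are invented); practical decoders use an efficient minimum-weight perfect matching (blossom) algorithm instead:&lt;br /&gt;

```python
def manhattan(a, b):
    # distance in faces (x, y) plus readout cycles (t)
    return sum(abs(p - q) for p, q in zip(a, b))

def pairings(pts):
    # generate all ways to split an even-sized list of points into pairs
    if not pts:
        yield []
        return
    first, rest = pts[0], pts[1:]
    for i in range(len(rest)):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + tail

def min_weight_matching(points):
    # brute force over all pairings; exponential, so small inputs only
    return min(
        ((sum(manhattan(a, b) for a, b in m), m) for m in pairings(points)),
        key=lambda t: t[0],
    )

# four syndrome-change locations at (x, y, t)
pts = [(0, 0, 0), (0, 2, 0), (5, 5, 1), (5, 6, 1)]
weight, match = min_weight_matching(pts)
print(weight)  # 3: nearby changes pair up
```

Here the two nearby pairs of syndrome changes are matched to each other, giving a total weight of 3.&lt;br /&gt;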
&lt;br /&gt;
Finally, the corrected lattice is then passed on to error detection routines which can determine whether a logical error has occurred. If it has, then the simulation is stopped and the previous cycle step (at which the simulation was “frozen”) is recorded. If no logical error has been detected, the simulation is reverted to the state just before the “perfect readout” cycle began and continues on.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:graph.jpg]]&amp;lt;br&amp;gt;&lt;br /&gt;
a) An example of syndrome change locations (red dots) after six readout cycles. The ''X'' operators represent the actual errors that the lattice suffered,&amp;lt;br&amp;gt;which led to the given syndrome change location pattern. These now have to be matched to obtain a guess as to where the errors happened.&amp;lt;br&amp;gt;b) The matching of syndrome changes gives us information on which errors should be corrected.&amp;lt;br&amp;gt;This figure has been copied with permission from the authors of Ref. 5&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
'''1.''' Austin Fowler, Matteo Mariantoni, John Martinis, and Andrew Cleland, http://arxiv.org/abs/1208.0928, submitted 4 Aug 2012.&amp;lt;br&amp;gt;&lt;br /&gt;
'''2.''' Austin Fowler, Ashley Stephens, and Peter Groszkowski, Phys. Rev. A '''80''', 052312 (2009).&amp;lt;br&amp;gt;&lt;br /&gt;
'''3.''' David Wang, Austin Fowler, and Lloyd Hollenberg, Phys. Rev. A '''83''', 020302 (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
'''4.''' Austin Fowler, David Wang, and Lloyd Hollenberg, Quantum Information &amp;amp; Computation '''11''', 8-18 (2011).&amp;lt;br&amp;gt;&lt;br /&gt;
'''5.''' Peter Groszkowski, Master’s thesis, Waterloo, Ontario, Canada, 2009.&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=2007</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=2007"/>
		<updated>2012-11-19T22:13:26Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Syndrome Extraction and Error Detection */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits as being arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since in many systems it is substantially less difficult to couple qubits that are close to each other than ones that are farther apart. We can think of physical qubits as being arranged on the edges of a lattice as shown in Figure 1. Examples of surface codes are the toric code and the planar code; the main difference between them is the boundary conditions. In the toric code, the boundaries are periodic, whereas in the planar code they are not. In the toric code, the qubits are arranged on a lattice which can be thought of as spread over the surface of a torus, while in the planar code case we think of the data qubits as living on a simple 2-D plane and the ancilla qubits on the faces and the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles, measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow regions indicate measure-Z qubits while the green regions indicate measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement &amp;lt;math&amp;gt;M_Z&amp;lt;/math&amp;gt; of the qubit will, however, return only one of two possible measurement outcomes: ''+1'' with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'' with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two called “smooth” and two called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent reference: a comprehensive review of the surface code, written for the absolute beginner.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:boundaries.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 2'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Examples of smooth and rough boundaries. This figure has been copied with permission from the authors of Ref. 2.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Syndrome Extraction and Error Detection===&lt;br /&gt;
Detecting errors involves measuring check operators, and observing which ones give a value of ''-1'' (due to anti-commuting with errors). This information helps us guess where errors occurred. In practice, of course, errors do not have to occur on their own, and often one can observe multiple instances next to each other. In these cases, the error operators form error chains throughout the lattice. Since only&lt;br /&gt;
the ends of such error chains anti-commute with the check operators, determining where errors occurred often involves guessing the most likely scenario. In the planar case, chains that connect opposite boundaries of the same type (either left to right, or top to bottom), and&lt;br /&gt;
in the toric case, chains that span all the way across a given dimension of the lattice, turn out to change the encoded logical&lt;br /&gt;
state of the qubit and hence are called logical errors. Two examples are shown in Figures 3 and 4. &amp;lt;math&amp;gt;Z_L&amp;lt;/math&amp;gt; is a chain of ''Z'' operators that connects two rough boundaries, and &amp;lt;math&amp;gt;X_L&amp;lt;/math&amp;gt; is a chain of ''X'' operators that connects two smooth ones.&lt;br /&gt;
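A toy calculation can make the claim that only the ends of an error chain anti-commute with the check operators concrete. The sketch below is a hypothetical 1-D strip, not the full 2-D surface code: data qubits sit on a line and each Z-type check acts on a neighbouring pair, so an X-error chain overlaps interior checks on two qubits (even, hence commuting) and the endpoint checks on one qubit (odd, hence anti-commuting).

```python
# Hypothetical 1-D illustration: data qubits 0..5 on a line, and each
# Z-type check has support on the pair of qubits (i, i+1).  An X-error
# chain anti-commutes with a Z check exactly when their supports overlap
# on an odd number of qubits, so only the checks at the chain's two ends
# report a -1 syndrome.

def syndrome(error_chain, checks):
    """Return the indices of checks that anti-commute with the X-error chain."""
    fired = []
    for idx, support in enumerate(checks):
        overlap = len(error_chain.intersection(support))
        if overlap % 2 == 1:  # odd overlap means the operators anti-commute
            fired.append(idx)
    return fired

checks = [{i, i + 1} for i in range(5)]  # Z checks on neighbouring pairs
chain = {1, 2, 3}                        # X errors on qubits 1, 2, 3

print(syndrome(chain, checks))           # only the endpoint checks fire: [0, 3]
```

Interior checks at indices 1 and 2 each share two qubits with the chain and stay silent, which is why the decoder sees only the chain's endpoints and must guess the most likely chain connecting them.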
&amp;lt;center&amp;gt;[[File:defect.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 3'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Examples of error syndromes on the surface code (planar and toric). The state is initialized to the ''+1'' eigenstate of all stabilizers.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Shaded qubits indicate locations of ''X'' errors. This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:error.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 4'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A planar surface code in which a logical ''Z (X)'' error is a chain of ''Z (X)''&lt;br /&gt;
operators that spans the whole lattice and connects rough (smooth) boundaries.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1930</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1930"/>
		<updated>2012-11-10T00:23:24Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Surface Code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems it is substantially less difficult to couple qubits that are close to each other than qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Two examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code we think of the data qubits as living on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to measure-Z qubits, while the green areas correspond to measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement &amp;lt;math&amp;gt;M_Z&amp;lt;/math&amp;gt; of the qubit will, however, return only one of two possible outcomes: ''+1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two called “smooth” and two called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent reference: a comprehensive review of the surface code, written for the absolute beginner.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:boundaries.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 2'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Examples of smooth and rough boundaries. This figure has been copied with permission from the authors of Ref. 2.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Syndrome Extraction and Error Detection===&lt;br /&gt;
Detecting errors involves measuring check operators, and observing which ones give a value of ''-1'' (due to anti-commuting with errors). This information helps us guess where errors occurred. In practice, of course, errors do not have to occur on their own, and often one can observe multiple instances next to each other. In these cases, the error operators form error chains throughout the lattice. Since only&lt;br /&gt;
the ends of such error chains anti-commute with the check operators, determining where errors occurred often involves guessing the most likely scenario. In the planar case, chains that connect opposite boundaries of the same type (either left to right, or top to bottom), and&lt;br /&gt;
in the toric case, chains that span all the way across a given dimension of the lattice, turn out to change the encoded logical&lt;br /&gt;
state of the qubit and hence are called logical errors. Two examples are shown in Figures 3 and 4. &amp;lt;math&amp;gt;Z_L&amp;lt;/math&amp;gt; is a chain of ''Z'' operators that connects two rough boundaries, and &amp;lt;math&amp;gt;X_L&amp;lt;/math&amp;gt; is a chain of ''X'' operators that connects two smooth ones.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:defect.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;Examples of error syndromes on the surface code (planar and toric). The state is initialized to the ''+1'' eigenstate of all stabilizers.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Shaded qubits indicate locations of ''X'' errors. This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:error.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;A planar surface code in which a logical ''Z (X)'' error is a chain of ''Z (X)''&lt;br /&gt;
operators that spans the whole lattice and connects rough (smooth) boundaries.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1929</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1929"/>
		<updated>2012-11-10T00:17:47Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems it is substantially less difficult to couple qubits that are close to each other than qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Two examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code we think of the data qubits as living on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to measure-Z qubits, while the green areas correspond to measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement ''M_Z'' of the qubit will, however, return only one of two possible outcomes: ''+1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two called “smooth” and two called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent reference: a comprehensive review of the surface code, written for the absolute beginner.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:boundaries.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 2'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Examples of smooth and rough boundaries. This figure has been copied with permission from the authors of Ref. 2.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Syndrome Extraction and Error Detection===&lt;br /&gt;
Detecting errors involves measuring check operators, and observing which ones give a value of ''-1'' (due to anti-commuting with errors). This information helps us guess where errors occurred. In practice, of course, errors do not have to occur on their own, and often one can observe multiple instances next to each other. In these cases, the error operators form error chains throughout the lattice. Since only&lt;br /&gt;
the ends of such error chains anti-commute with the check operators, determining where errors occurred often involves guessing the most likely scenario. In the planar case, chains that connect opposite boundaries of the same type (either left to right, or top to bottom), and&lt;br /&gt;
in the toric case, chains that span all the way across a given dimension of the lattice, turn out to change the encoded logical&lt;br /&gt;
state of the qubit and hence are called logical errors. Two examples are shown in Figures 3 and 4. ''Z_L'' is a chain of ''Z'' operators that connects two rough boundaries, and ''X_L'' is a chain of ''X'' operators that connects two smooth ones.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:defect.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;Examples of error syndromes on the surface code (planar and toric). The state is initialized to the ''+1'' eigenstate of all stabilizers.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Shaded qubits indicate locations of ''X'' errors. This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:error.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;A planar surface code in which a logical ''Z (X)'' error is a chain of ''Z (X)''&lt;br /&gt;
operators that spans the whole lattice and connects rough (smooth) boundaries.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;This figure has been copied with permission from the authors of Ref. 5.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Graph.jpg&amp;diff=1928</id>
		<title>File:Graph.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Graph.jpg&amp;diff=1928"/>
		<updated>2012-11-10T00:09:53Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Error_toric.jpg&amp;diff=1927</id>
		<title>File:Error toric.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Error_toric.jpg&amp;diff=1927"/>
		<updated>2012-11-10T00:09:35Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Error.jpg&amp;diff=1926</id>
		<title>File:Error.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Error.jpg&amp;diff=1926"/>
		<updated>2012-11-10T00:09:19Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Cycle.jpg&amp;diff=1925</id>
		<title>File:Cycle.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Cycle.jpg&amp;diff=1925"/>
		<updated>2012-11-10T00:08:53Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Defect.jpg&amp;diff=1924</id>
		<title>File:Defect.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Defect.jpg&amp;diff=1924"/>
		<updated>2012-11-10T00:08:35Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Circuit.jpg&amp;diff=1923</id>
		<title>File:Circuit.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Circuit.jpg&amp;diff=1923"/>
		<updated>2012-11-10T00:08:10Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1922</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1922"/>
		<updated>2012-11-09T05:38:29Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems it is substantially less difficult to couple qubits that are close to each other than qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Two examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code we think of the data qubits as living on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to measure-Z qubits, while the green areas correspond to measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement ''M_Z'' of the qubit will, however, return only one of two possible outcomes: ''+1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two called “smooth” and two called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent reference: a comprehensive review of the surface code, written for the absolute beginner.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:boundaries.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 2'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;Examples of smooth and rough boundaries. This figure has been copied with permission from the authors of Ref. 2.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Boundaries.jpg&amp;diff=1921</id>
		<title>File:Boundaries.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Boundaries.jpg&amp;diff=1921"/>
		<updated>2012-11-09T05:35:26Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1920</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1920"/>
		<updated>2012-11-09T01:17:18Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems it is substantially less difficult to couple qubits that are close to each other than qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Two examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code we think of the data qubits as living on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to measure-Z qubits, while the green areas correspond to measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement ''M_Z'' of the qubit will, however, return only one of two possible outcomes: ''+1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two called “smooth” and two called “rough”. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent reference: a comprehensive review of the surface code, written for the absolute beginner.&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1919</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1919"/>
		<updated>2012-11-09T01:13:46Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of qubits arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems it is substantially less difficult to couple qubits that are close to each other than qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Two examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code we think of the data qubits as living on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to measure-Z qubits, while the green areas correspond to measure-X qubits.&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stabilizer generators for the surface code are the tensor products of ''Z'' on the four data qubits around each face, and the tensor products of ''X'' on&lt;br /&gt;
the four data qubits around each intersection. Neighbouring stabilizers share two data qubits, ensuring that adjacent ''X'' and ''Z'' stabilizers commute. The qubit ''Z'' eigenstates are called the ground state &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt; and the excited state &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. The ground state is the ''+1'' eigenstate of ''Z'', with &amp;lt;math&amp;gt;Z\left\vert{g}\right\rangle=+\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, and the excited state is the ''-1'' eigenstate, with &amp;lt;math&amp;gt;Z\left\vert{e}\right\rangle=-\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;. It is tempting to think of the qubit as a kind of quantum transistor, with the ground state corresponding to &amp;quot;off&amp;quot; and the excited state to &amp;quot;on&amp;quot;. However, in distinct contrast to a classical logic element, a qubit can exist in a superposition of its eigenstates, &amp;lt;math&amp;gt;\left\vert{\psi}\right\rangle=\alpha\left\vert{g}\right\rangle+\beta\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;, so a qubit can be both &amp;quot;off&amp;quot; and &amp;quot;on&amp;quot; at the same time. A measurement ''M_Z'' of the qubit will, however, return only one of two possible outcomes: ''+1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{g}\right\rangle&amp;lt;/math&amp;gt;, or ''-1'', with the qubit state projected to &amp;lt;math&amp;gt;\left\vert{e}\right\rangle&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A planar code has four boundaries, two that are called &amp;quot;smooth&amp;quot; and two that are called &amp;quot;rough&amp;quot;. Smooth boundaries have four-term ''Z'' stabilizer generators and three-term ''X'' stabilizer generators, whereas rough boundaries have four-term ''X'' stabilizer generators and three-term ''Z'' stabilizer generators. A planar code with two rough and two smooth boundaries can encode a single logical qubit (as in Figure 2). See also http://arxiv.org/abs/1208.0928, an excellent and comprehensive review of the surface code, written for the absolute beginner.&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1917</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1917"/>
		<updated>2012-11-08T06:43:19Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== Surface Code ==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of the qubits as arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems coupling qubits that are close together is substantially less difficult than coupling qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code the data qubits live on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
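The periodic-boundary case can be made concrete with a counting argument (an illustrative aside, not part of the original text): on an L-by-L torus there are 2L^2 edges (data qubits), L^2 vertex stabilizers, and L^2 face stabilizers, but the product of all vertex operators and the product of all face operators are each the identity, so only 2L^2 - 2 stabilizers are independent, leaving two logical qubits.

```python
# Counting check for a toric code on an L x L periodic lattice (illustrative).
def toric_code_counts(L):
    data_qubits = 2 * L * L                      # one data qubit per edge
    vertex_stabs = L * L                         # X-type, one per vertex
    face_stabs = L * L                           # Z-type, one per face
    independent = vertex_stabs + face_stabs - 2  # two global product constraints
    logical_qubits = data_qubits - independent
    return data_qubits, logical_qubits

for L in (2, 3, 4):
    n, k = toric_code_counts(L)
    assert k == 2  # the toric code encodes two logical qubits for any L
```

The planar code gives up the periodic boundaries and, as discussed below, encodes a single logical qubit instead.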
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to the measure-Z qubits, while the green areas correspond to the measure-X qubits.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1916</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1916"/>
		<updated>2012-11-08T06:42:54Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of the qubits as arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems coupling qubits that are close together is substantially less difficult than coupling qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code the data qubits live on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;br /&gt;
&amp;lt;center&amp;gt;[[File:lattice.jpg]]&amp;lt;/center&amp;gt;&lt;br /&gt;
&amp;lt;center&amp;gt;'''Figure 1'''&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;A two-dimensional array implementation of the surface code. Data qubits are open circles; measurement (ancilla) qubits are filled circles.&amp;lt;/center&amp;gt;&amp;lt;center&amp;gt;The yellow areas correspond to the measure-Z qubits, while the green areas correspond to the measure-X qubits.&amp;lt;/center&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Lattice.jpg&amp;diff=1915</id>
		<title>File:Lattice.jpg</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=File:Lattice.jpg&amp;diff=1915"/>
		<updated>2012-11-08T06:39:05Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
	<entry>
		<id>https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1914</id>
		<title>Chapter 13 - Topological Quantum Error Correction</title>
		<link rel="alternate" type="text/html" href="https://www2.physics.siu.edu/qunet/wiki/index.php?title=Chapter_13_-_Topological_Quantum_Error_Correction&amp;diff=1914"/>
		<updated>2012-11-08T06:15:06Z</updated>

		<summary type="html">&lt;p&gt;Ddghunter: Created page with '===Introduction===  Surface codes  are topological quantum error-correcting codes in which we can think of qubits being arranged on a $2-D$  lattice of qubits with only nearest n…'&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Introduction===&lt;br /&gt;
&lt;br /&gt;
Surface codes are topological quantum error-correcting codes in which we can think of the qubits as arranged on a 2-D lattice with only nearest-neighbor interactions. In practice this may prove to be a very useful feature, since for many systems coupling qubits that are close together is substantially less difficult than coupling qubits that are farther apart. We can think of the physical qubits as being arranged on the edges of a lattice, as shown in Figure 1. Examples of surface codes are the toric code and the planar code; the main difference between them is the boundary condition. In the toric code the boundaries are periodic, whereas in the planar code they are not. In the toric code the qubits are arranged on a lattice that can be thought of as spread over the surface of a torus, while in the planar code the data qubits live on a simple 2-D plane, with ancilla qubits on the faces and at the intersections.&lt;/div&gt;</summary>
		<author><name>Ddghunter</name></author>
		
	</entry>
</feed>