2.3 The Kraft Inequality
We want to know when it is possible to make an instantaneously decipherable
code. The following result gives us some information:
Theorem (The Kraft Inequality)
Assume that we have a source alphabet of n letters and a code
alphabet with m letters. Then there is an instantaneous code
whose codewords have lengths $l_1, \ldots, l_n$
if and only if
$$m^{-l_1} + m^{-l_2} + \cdots + m^{-l_n} \le 1$$
Since we work most commonly with code alphabets with 2 letters, we most
commonly use this in the form:
$$2^{-l_1} + 2^{-l_2} + \cdots + 2^{-l_n} \le 1$$
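The left-hand side of the inequality is easy to compute mechanically. The following is a minimal sketch; the function names `kraft_sum` and `satisfies_kraft` are illustrative choices, not part of the original notes, and exact fractions are used so that the comparison with 1 is not affected by rounding.

```python
from fractions import Fraction

def kraft_sum(lengths, m=2):
    """Return m^(-l_1) + ... + m^(-l_n) as an exact fraction.

    lengths: iterable of codeword lengths; m: size of the code alphabet.
    """
    return sum(Fraction(1, m ** l) for l in lengths)

def satisfies_kraft(lengths, m=2):
    """True when the given lengths satisfy the Kraft inequality."""
    return kraft_sum(lengths, m) <= 1

print(kraft_sum([1, 2, 2]))        # 1/2 + 1/4 + 1/4, prints 1
print(satisfies_kraft([1, 1, 2]))  # 1/2 + 1/2 + 1/4 exceeds 1, prints False
```

Passing a different value of m gives the general form of the theorem, for example `kraft_sum([1, 1], m=3)` for a three-letter code alphabet.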
The basic idea behind the inequality follows from the fact that no codeword
of an instantaneously decipherable code can be the start (a prefix) of any
other codeword. This
means, for example, that if you have a codeword of length 1, say "0", then
no other codewords can start with a "0". But half of all possible codewords
start with a "0", so by having a codeword of length 1, you "use up" one half,
or $2^{-1}$, of all possible codewords.
Similarly, a codeword of length 2 will use up a quarter ($2^{-2}$) of
all possible codewords; a codeword of length 3 will use up an eighth
($2^{-3}$) of all possible codewords. This is where the terms of the
form $2^{-l}$ come from.
Finally we need to realise that we can't use up more codewords than there
are, so the sum of the proportions that we have used up must be less than
or equal to 1.
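The "using up" idea can be checked by brute force. The sketch below lists every binary word of some fixed length and counts how many begin with a given codeword. The helper name `fraction_used` and the choice of word length 4 are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import product

def fraction_used(prefix, total_length, alphabet="01"):
    """Fraction of all words of length total_length that start with prefix."""
    words = ["".join(w) for w in product(alphabet, repeat=total_length)]
    blocked = [w for w in words if w.startswith(prefix)]
    return Fraction(len(blocked), len(words))

# A codeword of length l blocks a proportion 2^(-l) of the longer words.
print(fraction_used("0", 4))    # 1/2
print(fraction_used("10", 4))   # 1/4
print(fraction_used("110", 4))  # 1/8
```

The proportion does not depend on the word length chosen, as long as it is at least the length of the codeword.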
Example
Consider the code with source alphabet {A,B,C,D,E}, code alphabet {0,1} and
codewords:
| Source | A | B | C | D | E |
| Code | 0 | 1001 | 110 | 101 | 111 |
We saw in previous examples that this code is instantaneously decipherable,
and it is easy to check that it satisfies the Kraft inequality. The lengths
of the codewords are 1, 4, 3, 3 and 3, and
$$2^{-1} + 2^{-4} + 2^{-3} + 2^{-3} + 2^{-3} = 0.9375$$
which is less than or equal to 1.
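Both claims in this example can be verified directly. The following is a small sketch, with `is_prefix_free` as an illustrative helper name: it tests that no codeword starts another and then evaluates the Kraft sum for the table above.

```python
def is_prefix_free(codewords):
    """True when no codeword is the start of a different codeword."""
    words = list(codewords)
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True

# The code from the table above.
code = {"A": "0", "B": "1001", "C": "110", "D": "101", "E": "111"}

print(is_prefix_free(code.values()))             # True
print(sum(2.0 ** -len(w) for w in code.values()))  # 0.9375
```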
The Kraft inequality works in the other direction as well. As long as the
word lengths satisfy the inequality, there is some instantaneously
decipherable code (in fact, there are many) which has those lengths.
Example
Construct an instantaneously decipherable code from the source
alphabet {A, B, C, D, E} to the code alphabet {0, 1} so that
A has codeword length 3, B has codeword length 2, C has codeword
length 3, D has codeword length 3, and E has codeword length 2.
We first check that:
$$2^{-3} + 2^{-2} + 2^{-3} + 2^{-3} + 2^{-2} = 0.875$$
which is less than 1, so according to the Kraft inequality, there
is a code with this property.
To find such a code, we notice that the shortest codeword length is
2, with both B and E having that length. So we choose any two codewords
of length 2 for B and E, say 00 and 01 respectively.
The next shortest codeword length is 3, which is the length of each of the
remaining codewords. We choose codewords of length 3 for A, C and D,
making sure that they do not start with the codewords for B and E. We
might choose 100 for A, 101 for C, and 110 for D, for example.
So we end up with the following code:
| Source | A | B | C | D | E |
| Code | 100 | 00 | 101 | 110 | 01 |
The way that the above example was solved works in general, and so you
should be able to adapt it to any situation where the Kraft inequality
holds.
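One way to mechanise the method of the example is sketched below. It is only one of many valid constructions: letters are handled shortest length first, and each letter receives the next unused word of its length, treating words as numbers written in base m. The names `build_code` and `to_word` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

def to_word(value, length, alphabet):
    """Write value in base len(alphabet) using exactly length symbols."""
    m = len(alphabet)
    digits = []
    for _ in range(length):
        value, r = divmod(value, m)
        digits.append(alphabet[r])
    return "".join(reversed(digits))

def build_code(lengths, alphabet="01"):
    """lengths maps each source letter to its required codeword length."""
    m = len(alphabet)
    if sum(Fraction(1, m ** l) for l in lengths.values()) > 1:
        raise ValueError("lengths do not satisfy the Kraft inequality")
    code = {}
    value, prev_len = 0, 0
    # sorted() is stable, so letters of equal length keep their given order.
    for letter, l in sorted(lengths.items(), key=lambda kv: kv[1]):
        value *= m ** (l - prev_len)   # extend the next free word to length l
        code[letter] = to_word(value, l, alphabet)
        value += 1                     # skip past the word just used
        prev_len = l
    return code

code = build_code({"A": 3, "B": 2, "C": 3, "D": 3, "E": 2})
print(dict(sorted(code.items())))
# {'A': '100', 'B': '00', 'C': '101', 'D': '110', 'E': '01'}
```

For the lengths in the example this happens to reproduce the code found by hand; other choices at each step would give different, equally valid codes.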
For completeness, and to keep the author honest, here is a formal
mathematical proof of the Kraft inequality. For simplicity, we will
prove this for m = 2, since that will be the most common case
in practice, and it makes the proof much clearer.
Proof:
First, assume that we have an instantaneous code with the required
codeword lengths. We draw the decision tree for this code, and let
L be the height of the tree (that is, the longest distance from the
root to a vertex).
If the height is 1, then we have 2 possibilities:
the tree is either a root with a single branch
or a root with two branches.
In the first case, we have one codeword, whose length is one, so we get
that $2^{-1} = 1/2 < 1$. In the second case, we have two
codewords, each of length 1, so $2^{-1} + 2^{-1} = 1$.
Now, assume we can show that the inequality holds for all trees of height
at most h, for some h ≥ 1. Then if we could show that this means that the
inequality must be true for trees of height h + 1, we
would be done, since we know it is true for height 1, which means it is
true for height 1 + 1 = 2, which means it is true for height 2 + 1 = 3, and so
on...
So if we are given a decision tree of height h + 1, then we can
break it up into two sub-trees $T_1$ and $T_2$,
as shown in the following diagram
[Diagram: the root with two branches, leading to the sub-trees T1 and T2]
But then the two trees $T_1$ and $T_2$ are
decision trees of height at most h (a sub-tree that is a single codeword
has sum $2^0 = 1$, and a missing sub-tree has sum 0, so neither causes
trouble), and so if we know that the Kraft
inequality holds for these, then if
$l_1, \ldots, l_k$ are the lengths of the
codewords in the first sub-tree, and if
$l_{k+1}, \ldots, l_n$ are the lengths of the
codewords in the second sub-tree, then the inequalities
$$2^{-(l_1 - 1)} + \cdots + 2^{-(l_k - 1)} \le 1$$
and
$$2^{-(l_{k+1} - 1)} + \cdots + 2^{-(l_n - 1)} \le 1$$
hold. Then a little arithmetic shows us that
$$2^{-l_1} + \cdots + 2^{-l_n} = 2^{-1}\left(2^{-(l_1 - 1)} + \cdots + 2^{-(l_k - 1)}\right) + 2^{-1}\left(2^{-(l_{k+1} - 1)} + \cdots + 2^{-(l_n - 1)}\right) \le 2^{-1} + 2^{-1} = 1$$
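As an illustration of this step, take the code 100, 00, 101, 110, 01 built in the second example, and assume (the choice of labels is ours) that the first sub-tree is the one below the branch labelled 0. Inside that sub-tree the codewords for B and E become 0 and 1, of length 1 each; inside the second sub-tree the codewords for A, C and D become 00, 01 and 10, of length 2 each. The arithmetic then reads:

$$2^{-1}\left(2^{-1} + 2^{-1}\right) + 2^{-1}\left(2^{-2} + 2^{-2} + 2^{-2}\right) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot \tfrac{3}{4} = \tfrac{7}{8} = 0.875$$

which agrees with the sum computed directly from the lengths 3, 2, 3, 3, 2.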
So now we need to consider the opposite situation: given a collection of
lengths $l_1, \ldots, l_n$ which satisfy the
Kraft inequality, can we construct a code? The way we will show this is by
giving a recipe for building the code.
There must be some smallest length, call it j, and it may happen
that there is more than one codeword of this length, so we will assume
that there are N(j) of them. Now draw a full tree of height
j and make N(j) of the branches codewords. Then we choose
the next smallest length, and extend the remaining branches out to that
length, choose codewords, and so on.
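So how do we know that this must always work? Well, we can re-write the
Kraft inequality as
$$N(1)\,2^{-1} + N(2)\,2^{-2} + \cdots + N(L)\,2^{-L} \le 1$$
(where N(k) may be 0 for some k). So let us assume that we
have drawn the tree up to level j, and that k is the next
highest value such that N(k) is not zero. Each codeword already chosen
at a level i ≤ j rules out $2^{k-i}$ of the $2^k$ possible vertices at
level k, so to have room for N(k) new codewords we need
$$N(k) \le 2^k - \left(N(1)\,2^{k-1} + \cdots + N(j)\,2^{k-j}\right)$$
If we multiply both sides by $2^{-k}$ and re-arrange (recalling that
N(i) = 0 for j < i < k), this becomes
$$N(1)\,2^{-1} + N(2)\,2^{-2} + \cdots + N(k)\,2^{-k} \le 1$$
which is true from our re-written form of the Kraft inequality, since
dropping the non-negative terms beyond level k can only make the
left-hand side smaller.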
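The counting condition can also be checked numerically for a given list of lengths. The sketch below is for the binary case only, and the name `free_nodes_by_level` is an illustrative assumption. For each length k in use it reports N(k) together with the number of vertices at level k that are not below an earlier codeword; the recipe succeeds when N(k) never exceeds that number.

```python
from collections import Counter

def free_nodes_by_level(lengths):
    """Map each used length k to (N(k), free vertices at level k)."""
    counts = Counter(lengths)  # counts[k] is N(k)
    report = {}
    for k in sorted(counts):
        # A codeword at level i < k blocks 2^(k - i) vertices at level k.
        blocked = sum(counts[i] * 2 ** (k - i) for i in counts if i < k)
        report[k] = (counts[k], 2 ** k - blocked)
    return report

# Lengths from the second example: enough room at every level.
print(free_nodes_by_level([3, 2, 3, 3, 2]))  # {2: (2, 4), 3: (3, 4)}

# Lengths 1, 1, 2 violate the Kraft inequality: no room left at level 2.
print(free_nodes_by_level([1, 1, 2]))        # {1: (2, 2), 2: (1, 0)}
```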