
Communication, Codes and Cyphers

Codes

2.3 The Kraft Inequality

We want to know when it is possible to make an instantaneously decipherable code. The following result gives us some information:

Theorem (The Kraft Inequality)

Assume that we have a source alphabet of n letters and a code alphabet with m letters. Then there is an instantaneous code whose codewords have lengths $l_1, \dots, l_n$ if and only if

$$m^{-l_1} + m^{-l_2} + \dots + m^{-l_n} \le 1$$

Since code alphabets most commonly have 2 letters, we usually use the inequality in the form:

$$2^{-l_1} + 2^{-l_2} + \dots + 2^{-l_n} \le 1$$

The basic idea behind the inequality follows from the fact that no codeword of an instantaneously decipherable code can start any other codeword. This means, for example, that if you have a codeword of length 1, say "0", then no other codewords can start with a "0". But half of all possible codewords start with a "0", so by having a codeword of length 1, you "use up" one half, or $2^{-1}$, of all possible codewords.

Similarly, a codeword of length 2 will use up a quarter ($2^{-2}$) of all possible codewords; a codeword of length 3 will use up an eighth ($2^{-3}$) of all possible codewords. This is where the terms of the form $2^{-l}$ come from.

Finally we need to realise that we can't use up more codewords than there are, so the sum of the proportions that we have used up must be less than or equal to 1.
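The "using up" idea can be checked by brute force. The following is a minimal sketch, not part of the original notes: the function name `fraction_starting_with` and the choice of total length 4 are illustrative assumptions. It lists every binary string of a fixed length and counts how many begin with a given codeword.

```python
from itertools import product

def fraction_starting_with(prefix: str, total_length: int) -> float:
    """Fraction of all binary strings of length total_length that begin with prefix.

    Hypothetical helper for illustration only.
    """
    all_strings = ["".join(bits) for bits in product("01", repeat=total_length)]
    matching = [s for s in all_strings if s.startswith(prefix)]
    return len(matching) / len(all_strings)

print(fraction_starting_with("0", 4))    # 0.5   = 2**-1
print(fraction_starting_with("10", 4))   # 0.25  = 2**-2
print(fraction_starting_with("110", 4))  # 0.125 = 2**-3
```

The fraction depends only on the length of the prefix, which is why each codeword of length l contributes a term of the form 2 to the power -l.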

Example

Consider the code with source alphabet {A,B,C,D,E}, code alphabet {0,1} and codewords:

| Source | A | B | C | D | E |
|---|---|---|---|---|---|
| Code | 0 | 1001 | 110 | 101 | 111 |

We saw in previous examples that this code is instantaneously decipherable, and it is easy to check that it satisfies the Kraft inequality. The lengths of the codewords are 1, 4, 3, 3 and 3, and

$$2^{-1} + 2^{-4} + 2^{-3} + 2^{-3} + 2^{-3} = 0.9375$$

which is less than or equal to 1.
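The same check can be done mechanically. This is a small sketch under stated assumptions: the helper names `kraft_sum` and `is_prefix_free` are invented here, and the codewords are the ones read off from the table above (A = 0, B = 1001, C = 110, D = 101, E = 111). Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def kraft_sum(lengths, m=2):
    """Sum of m**(-l) over the given codeword lengths, as an exact fraction."""
    return sum(Fraction(1, m ** l) for l in lengths)

def is_prefix_free(codewords):
    """True if no codeword is the start of a different codeword in the list."""
    words = list(codewords)
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True

# Codewords as read from the example table (an assumption of this sketch).
code = {"A": "0", "B": "1001", "C": "110", "D": "101", "E": "111"}

total = kraft_sum(len(w) for w in code.values())
print(total, float(total))            # 15/16 0.9375
print(total <= 1)                     # True
print(is_prefix_free(code.values()))  # True
```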

The Kraft inequality works in the other direction as well. As long as the word lengths satisfy the inequality, there is some instantaneously decipherable code (in fact, there are many) which has those lengths.

Example

Construct an instantaneously decipherable code from the source alphabet {A, B, C, D, E} to the code alphabet {0, 1} so that A has codeword length 3, B has codeword length 2, C has codeword length 3, D has codeword length 3, and E has codeword length 2.

We first check that:

$$2^{-3} + 2^{-2} + 2^{-3} + 2^{-3} + 2^{-2} = 0.875$$

which is less than one, so according to the Kraft inequality, there is a code with this property.

To find such a code, we notice that the shortest codeword length is 2, with both B and E having that length. So we choose any two codewords of length 2 for B and E, say 00 and 01 respectively.

The next shortest codeword length is 3, which is the length of each of the remaining codewords. We choose codewords of length 3 for A, C and D, making sure that they do not start with the codewords for B and E. We might choose 100 for A, 101 for C, and 110 for D, for example.

So we end up with the following code:

| Source | A | B | C | D | E |
|---|---|---|---|---|---|
| Code | 100 | 00 | 101 | 110 | 01 |

The way that the above example was solved works in general, and so you should be able to adapt it to any situation where the Kraft inequality holds.
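The method of the example can be sketched as a short routine for the binary case. This is one possible way to automate it, not the author's own procedure: the name `build_prefix_code` is hypothetical, all lengths are assumed to be positive integers, and the rule "always take the first unused word" is just one of the many valid choices the text allows.

```python
from fractions import Fraction

def build_prefix_code(lengths):
    """Build a binary prefix-free code from a dict {symbol: codeword length}.

    Symbols are handled from shortest to longest length. A counter holds the
    next unused word at the current length; moving to a longer length shifts
    the counter left, which skips every word starting with an earlier codeword.
    """
    if sum(Fraction(1, 2 ** l) for l in lengths.values()) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code = {}
    next_value = 0      # integer value of the next free word
    current_length = 0  # length at which next_value is interpreted
    for symbol, length in sorted(lengths.items(), key=lambda item: item[1]):
        next_value <<= (length - current_length)
        current_length = length
        code[symbol] = format(next_value, "0{}b".format(length))
        next_value += 1
    return code

code = build_prefix_code({"A": 3, "B": 2, "C": 3, "D": 3, "E": 2})
print(code)  # {'B': '00', 'E': '01', 'A': '100', 'C': '101', 'D': '110'}
```

With these lengths the routine happens to reproduce the codewords chosen by hand in the example, because ties between equal lengths are broken in the order the symbols are listed.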

2.3.1 Proof of the Kraft Inequality

For completeness, and to keep the author honest, here is a formal mathematical proof of the Kraft inequality. For simplicity, we will prove this for m = 2, since that will be the most common case in practice, and it makes the proof much clearer.

Proof:

First, assume that we have an instantaneous code with the required codeword lengths. We draw the decision tree for this code, and let L be the height of the tree (that is, the longest distance from the root to a vertex). We argue by induction on L.

If the height is 1, then we have 2 possibilities: the tree has either a single branch from the root, or two branches from the root.

In the first case, we have one codeword, whose length is one, so we get that $2^{-1} = 1/2 < 1$. In the second case, we have two codewords, each of length 1, so $2^{-1} + 2^{-1} = 1$.

Now, assume we can show that the inequality holds for all trees of height at most h. Then if we could show that this means that the inequality must be true for trees of height h + 1, we would be done, since we know it is true for height 1, which means it is true for height 1 + 1 = 2, which means it is true for height 2 + 1 = 3, and so on.

So if we are given a decision tree of height h + 1, then we can break it up into two sub-trees T1 and T2, one hanging from each of the two branches at the root.

But then the two trees T1 and T2 are decision trees of height at most h, and each codeword is one symbol shorter in its sub-tree than in the whole tree. (A sub-tree that is empty contributes 0, and one that is a single leaf holds one codeword of length 0 and contributes exactly 1, so the inequality holds for these trivially.) So if we know that the Kraft inequality holds for the sub-trees, then if $l_1, \dots, l_k$ are the lengths of the codewords in the first sub-tree, and if $l_{k+1}, \dots, l_n$ are the lengths of the codewords in the second sub-tree, both measured in the whole tree, then the inequalities
$$2^{-(l_1 - 1)} + \dots + 2^{-(l_k - 1)} \le 1$$
and
$$2^{-(l_{k+1} - 1)} + \dots + 2^{-(l_n - 1)} \le 1$$
hold. Then a little arithmetic shows us that
$$2^{-l_1} + \dots + 2^{-l_n} = 2^{-1}\left(2^{-(l_1 - 1)} + \dots + 2^{-(l_k - 1)}\right) + 2^{-1}\left(2^{-(l_{k+1} - 1)} + \dots + 2^{-(l_n - 1)}\right) \le 2^{-1} + 2^{-1} = 1$$
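The splitting step of this argument can be mirrored in code. The sketch below is an illustration only: the function name `kraft_sum_by_splitting` is hypothetical, and it assumes the input is a binary prefix-free code. It computes the Kraft sum by repeatedly breaking the code into the words that start with 0 and the words that start with 1, removing that first symbol, and taking half of each sub-tree's sum.

```python
from fractions import Fraction

def kraft_sum_by_splitting(codewords):
    """Kraft sum of a binary prefix-free code, computed sub-tree by sub-tree."""
    words = list(codewords)
    if not words:
        return Fraction(0)  # empty sub-tree: no codewords
    if "" in words:
        if len(words) > 1:
            raise ValueError("code is not prefix-free")
        return Fraction(1)  # single leaf: one codeword of length 0
    # Sub-trees T1 and T2: drop the first symbol of every codeword.
    t1 = [w[1:] for w in words if w[0] == "0"]
    t2 = [w[1:] for w in words if w[0] == "1"]
    return Fraction(1, 2) * (kraft_sum_by_splitting(t1) + kraft_sum_by_splitting(t2))

example = ["0", "1001", "110", "101", "111"]
print(kraft_sum_by_splitting(example))       # 15/16
print(sum(Fraction(1, 2 ** len(w)) for w in example))  # 15/16
```

Each recursive call returns a value that is at most 1, so the half-sum of two such values is again at most 1, which is exactly the arithmetic in the last line of the argument above.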

So now we need to consider the opposite situation: given a collection of lengths l1,...,ln which satisfy the Kraft inequality, can we construct a code? The way we will show this is by giving a recipe for building the code.

There must be some smallest length, call it j, and it may happen that there is more than one codeword of this length, so we will assume that there are N(j) of them. Now draw a full tree of height j and make N(j) of the branches codewords. Then we choose the next smallest length, and extend the remaining branches out to that length, choose codewords, and so on.

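So how do we know that this must always work? Well, we can re-write the Kraft inequality as

$$N(1)\,2^{-1} + N(2)\,2^{-2} + \dots + N(L)\,2^{-L} \le 1$$
where N(k) is the number of codewords of length k (so N(k) may be 0 for some k) and L is the longest length. So let us assume that we have drawn the tree up to level j, and that k is the next highest value such that N(k) is not zero. A full tree has $2^k$ vertices at level k, and each codeword already chosen at a level $i \le j$ rules out $2^{k-i}$ of them, so the construction can continue provided that
$$N(k) \le 2^k - \left(N(1)\,2^{k-1} + \dots + N(j)\,2^{k-j}\right)$$
and if we multiply both sides by $2^{-k}$ and re-arrange, this condition becomes
$$N(1)\,2^{-1} + N(2)\,2^{-2} + \dots + N(k)\,2^{-k} \le 1$$
(the terms with lengths strictly between j and k are zero). This is true from our re-written form of the Kraft inequality, since dropping the non-negative terms for lengths greater than k cannot increase the sum.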
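The counting condition can be tabulated for a concrete list of lengths. This is a rough sketch with an invented function name, `free_nodes_per_level`, restricted to the binary case: for every length k that occurs, it reports how many vertices at level k are still free once the shorter codewords have been placed, next to the number N(k) that is needed.

```python
from collections import Counter

def free_nodes_per_level(lengths):
    """Return a list of (k, free vertices at level k, N(k)) for each length k used."""
    counts = Counter(lengths)  # counts[k] plays the role of N(k)
    report = []
    for k in sorted(counts):
        # Each codeword of a shorter length j blocks 2**(k - j) vertices at level k.
        used = sum(counts[j] * 2 ** (k - j) for j in counts if j < k)
        report.append((k, 2 ** k - used, counts[k]))
    return report

print(free_nodes_per_level([3, 2, 3, 3, 2]))  # [(2, 4, 2), (3, 4, 3)]
print(free_nodes_per_level([1, 4, 3, 3, 3]))  # [(1, 2, 1), (3, 4, 3), (4, 2, 1)]
```

In both examples the number of free vertices is never smaller than N(k), so the recipe never runs out of branches, as the inequality predicts.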
