Communication, Codes and Cyphers

Information Theory

Entropy: Measuring Uncertainty

We want some means of quantifying uncertainty. Before we can do this, we should try to work out what properties uncertainty ought to have. Hopefully the following propositions make some heuristic sense.

  1. The outcome of spinning a roulette wheel is more uncertain than the outcome of tossing a coin.
  2. The outcome of a running race between two evenly matched competitors is more uncertain than a race between an Olympic runner and an average person.
  3. The outcome of two spins of a roulette wheel is twice as uncertain as one spin.
  4. Changing the likelihoods of the outcomes a little bit shouldn't change the uncertainty very much.

We'll eventually come up with a mathematical quantity that satisfies these four conditions, plus a number of others. To justify it, however, we will first go through a heuristic argument.

An interesting, if somewhat unusual, way to think about uncertainty is as the average amount of surprise you feel on learning the result. So what do we mean by surprise? Surprise is clearly related to the probability that an event happens. A coin coming up heads is less surprising than your chosen number winning on a roulette wheel, since the first is more probable than the second. And if your number came up on two consecutive spins of the wheel, you'd be twice as surprised.

Mathematically, then, surprise is a function S(p) of the probability p of an event. It should also be additive: if an event with probability p happens and, independently, an event with probability q happens, then your total surprise should be S(p) + S(q), the sum of the two surprises. Since the probability of both independent events happening is pq, we have:

S(pq) = S(p) + S(q)

Also, if something is certain (i.e. it has probability 1), then there is no surprise, so

S(1) = 0

Mathematically, one can show that any continuous function S with these properties must be a constant multiple of a logarithm. This is reasonable, since logarithms have the same additive property:

$$\log_b(pq) = \log_b p + \log_b q$$
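Why a logarithm? Here is a sketch of the standard argument, under the assumption that S is also continuous. Substituting p = 2^{-x} turns the multiplicative rule for S into an additive rule for a new function f (introduced here only for the derivation):

```latex
\begin{aligned}
f(x) &:= S(2^{-x}), \qquad x \ge 0,\\
f(x + y) &= S\!\left(2^{-x} \cdot 2^{-y}\right) = S(2^{-x}) + S(2^{-y}) = f(x) + f(y),\\
f(x) &= kx \qquad \text{(the only continuous solutions of this equation)},\\
S(p) &= f(-\log_2 p) = -k \log_2 p, \qquad 0 < p \le 1.
\end{aligned}
```

The condition S(1) = 0 comes out automatically, since f(0) = f(0) + f(0) forces f(0) = 0. The constant k must be positive for surprise to be positive, and choosing k = 1 simply fixes the unit of measurement.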

The standard measure of surprise is therefore:

$$S(p) = -\log_2 p$$

Probabilities are numbers between 0 and 1, and their logarithms are negative (or zero, for probability 1). Since we'd like surprise to be a non-negative quantity, we take the negative of the logarithm. We use base 2 because it is convenient: it measures surprise in bits, so that a fair coin toss carries exactly one unit of surprise. Any other base would only multiply S by a constant.
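The definition of surprise can be checked against the two properties we asked for. This is a minimal Python sketch; the helper name `surprise` and its range check are illustrative choices, not part of the text.

```python
import math


def surprise(p: float) -> float:
    """Surprise S(p) = -log2(p), measured in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a probability in (0, 1]")
    return -math.log2(p)


# A certain event carries no surprise: S(1) = 0.
assert surprise(1.0) == 0.0

# Additivity for independent events: S(pq) = S(p) + S(q).
for p, q in [(0.5, 1 / 32), (0.3, 0.7), (0.01, 0.9)]:
    assert math.isclose(surprise(p * q), surprise(p) + surprise(q))

# Rarer events are more surprising.
assert surprise(1 / 32) > surprise(0.5)
print("all surprise checks passed")
```

Probability 0 is rejected because -log2 0 is infinite: an impossible event would be infinitely surprising.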

Example

The surprise at seeing a coin come up heads when tossed is:

$$S(0.5) = -\log_2 0.5 = -(-1) = 1$$

Example

The surprise at correctly picking the outcome of a spin of a roulette wheel, taking the wheel to have 32 equally likely pockets, is:

$$S(1/32) = -\log_2 \tfrac{1}{32} = -(-5) = 5$$
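Both examples are instances of one rule: an outcome with probability 1/n has surprise log2 n. A quick Python check follows; `surprise` is an illustrative helper, and the 37- and 38-pocket values are for real European and American wheels rather than the simplified 32-pocket wheel used here.

```python
import math


def surprise(p: float) -> float:
    """Surprise S(p) = -log2(p), in bits."""
    return -math.log2(p)


coin = surprise(1 / 2)       # heads on a fair coin
roulette = surprise(1 / 32)  # your number, on a wheel with 32 equal pockets
print(coin, roulette)        # 1.0 5.0

# In general, an outcome with probability 1/n has surprise log2(n).
for n in (2, 32, 37, 38):  # 37 and 38: real European and American wheels
    print(n, round(surprise(1 / n), 3))
```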

If uncertainty is the average amount of surprise, and the possible outcomes have probabilities $p_1, p_2, \ldots, p_n$, then the average surprise is:

$$p_1 S(p_1) + p_2 S(p_2) + \cdots + p_n S(p_n)$$

or

$$-p_1 \log_2 p_1 - p_2 \log_2 p_2 - \cdots - p_n \log_2 p_n = -\sum_{i=1}^{n} p_i \log_2 p_i.$$

We call this quantity entropy and regard it as a good measure of uncertainty, and hence of information.
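The average-surprise formula translates directly into code. This is a minimal Python sketch; the function name `entropy` and the validation it performs are illustrative choices. Outcomes with probability 0 are skipped, following the usual convention that p log2 p tends to 0 as p tends to 0.

```python
import math
from collections.abc import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Average surprise, in bits, of a probability distribution."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Each outcome contributes p * S(p), with S(p) = -log2(p) = log2(1/p).
    # Zero-probability outcomes are skipped, since p * log2(p) -> 0 as p -> 0.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


print(entropy([1.0]))       # 0.0: a certain outcome carries no uncertainty
print(entropy([0.25] * 4))  # 2.0: four equally likely outcomes
```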

Example

The entropy of a single spin of the same 32-pocket roulette wheel is:

$$32 \times \left(-\tfrac{1}{32} \log_2 \tfrac{1}{32}\right) = 5$$

Example

The entropy in a coin toss is:

$$-0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1$$

Example

You might expect that an average person would beat an Olympic runner once in every million races. The entropy of such a race would then be:

$$-0.000001 \log_2 0.000001 - 0.999999 \log_2 0.999999 \approx 0.00002137$$
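The worked examples, and the four propositions from the start of this section, can be checked numerically. This is a minimal Python sketch, assuming the same simplified 32-pocket wheel; the `entropy` helper is illustrative, and the race probabilities are taken as 0.000001 and 0.999999 so that they sum to 1.

```python
import math


def entropy(probs):
    """Entropy in bits: the average surprise, sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


coin = entropy([0.5, 0.5])
roulette = entropy([1 / 32] * 32)  # 32 equally likely pockets, as in the text
race = entropy([0.000001, 0.999999])  # the underdog wins once in a million races
print(coin, roulette, f"{race:.4g}")  # 1.0 5.0 2.137e-05

# Proposition 1: a roulette spin is more uncertain than a coin toss.
assert roulette > coin
# Proposition 2: an even race is more uncertain than a mismatched one.
assert entropy([0.5, 0.5]) > race
# Proposition 3: two independent spins (1024 equally likely pairs) are twice as uncertain.
assert math.isclose(entropy([1 / 1024] * 1024), 2 * roulette)
# Proposition 4: nudging the probabilities barely changes the entropy.
assert abs(entropy([0.51, 0.49]) - coin) < 0.001
```

Proposition 3 holds because the 1024 pairs of outcomes each have probability 1/32 × 1/32, so every pair carries 10 bits of surprise, twice the 5 bits of a single spin.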