Compression : News Postings : Kraft Inequality

11-9-95



The Kraft inequality is discussed in :

	"Generalized Kraft Inequality and Arithmetic Coding"
		by Jorma Rissanen

I will explore some of its power and meaning here.



Let the alphabet A = { a0, a1, a2, ... a(N-1) } , with members An , 0 <= n <= N-1
Therefore |A| (the size of A) = N

Let the prefix-free code for each member of the alphabet be Cn ,
i.e. let Cn be the code for An.

Let Ln be the length of code Cn.

Note that Cn need not be a Huffman or Shannon-Fano code.
For example, Cn could be the simple binary code (i.e. a raw byte).

It is clear that not all values of Ln are allowed.

For example : Ln = 1 for all n is impossible if N > 2

However, Ln can be as large as we like - there is only a restriction
on how small the Ls can be.

let us define

	K = Sum( i = 0 to N-1 ) of 2^(-Li)

K is "The Kraft Number" and is a measure of "the size" of L.  We see that
if L is 1, then 2^(-L) is .5 .  We cannot have more than two terms of .5 :
if there are more than two, then K > 1.
Similarly, we know L can be as large as we want.  Thus, 2^(-L) can be
as small as we want, so K can be as small as we want.

Thus we can intuitively see that there must be an upper bound on K,
and no positive lower bound.
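
As a quick sanity check, you can compute K for a few length sets (a
Python sketch; kraft_number is just an illustrative helper name):

```python
# Compute the Kraft number K = Sum 2^(-Li) for a few candidate length
# sets, matching the examples discussed above.

def kraft_number(lengths):
    """Kraft number of a list of code lengths."""
    return sum(2.0 ** (-L) for L in lengths)

print(kraft_number([1, 1]))        # two codes of length 1 -> K = 1.0 (allowed)
print(kraft_number([1, 1, 1]))     # three codes of length 1 -> K = 1.5 (impossible)
print(kraft_number([10, 10, 10]))  # long codes -> K is tiny (allowed)
```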

It turns out that a prefix-free code with lengths Ln exists IF AND ONLY IF:

	K <= 1

This is the Kraft inequality.  Proof of this statement is
beyond the scope of this discussion. (though it is pretty simple
entropy juggling)
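
The constructive half of the proof can be sketched: whenever the lengths
satisfy K <= 1, you can hand out codewords in order of length and never
run out of prefix-free choices (a Python sketch of the canonical-code
construction; make_prefix_code is an illustrative name):

```python
def make_prefix_code(lengths):
    """Assign binary codewords to the given lengths (which must satisfy
    the Kraft inequality): process lengths in sorted order, handing out
    codewords in counting order, so no codeword is a prefix of a later,
    longer one."""
    assert sum(2.0 ** (-L) for L in lengths) <= 1.0, "Kraft inequality violated"
    codes = {}
    code = 0
    prev_len = 0
    for i, L in sorted(enumerate(lengths), key=lambda p: p[1]):
        code <<= (L - prev_len)     # extend the running codeword to length L
        codes[i] = format(code, '0{}b'.format(L))
        code += 1
        prev_len = L
    return codes

print(make_prefix_code([1, 2, 3, 3]))   # {0: '0', 1: '10', 2: '110', 3: '111'}
```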

Note, though, that Li = -log2(Pi) + Ei

where Pi is the probability of symbol ai , and Ei is the "extra length
for a code" - i.e. the length greater than the entropy ideal -log2(Pi).

thus 2^(-Li) = 2^(log2(Pi) - Ei) = 2^(log2(Pi)) * 2^(-Ei)
 = Pi / 2^(Ei)

Thus K = Sum( i = 0 to N-1 ) of Pi * 2^(-Ei)

Imagine the simple case where Ei = E for all i

Thus K = Sum( i = 0 to N-1 ) of Pi / 2^(E) = [ Sum( i = 0 to N-1 ) of Pi ] / 2^E

but [ Sum( i = 0 to N-1 ) of Pi ] = 1

Thus K = 1 / 2^E

So the Kraft inequality says 1/ 2^E <= 1

1 = 2^0

2^(-E) <= 2^(0)

-E <= 0

E >= 0

The result is : The Kraft inequality says that a code is decodable
IF AND ONLY IF the code lengths are greater than or equal to those
determined by the entropy.

This should make good sense.
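
You can check the step above numerically (a Python sketch;
kraft_of_slack is an illustrative name and the distribution P is just
an example):

```python
import math

def kraft_of_slack(P, E):
    """Kraft number when every code length is Li = -log2(Pi) + E,
    i.e. uniform slack E above the entropy ideal."""
    return sum(2.0 ** -(-math.log2(p) + E) for p in P)

P = [0.5, 0.25, 0.125, 0.125]   # any distribution summing to 1
for E in [0.0, 0.5, 1.0]:
    # the two printed columns agree: K = 2^(-E)
    print(kraft_of_slack(P, E), 2.0 ** -E)
```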



Let us now look at some of the power of the Kraft inequality.

let Ln = log2(N) for all n

i.e. each code is a simple binary code.  For example, if N = 256,
then each Cn is just a byte, and Ln = 8 for all n

K = Sum( i = 0 to N-1 ) 2^(-Li) = Sum 2^(-log2(N)) = [ Sum 1 ] / 2^(log2(N))

	= N / N = 1

Thus we see this is a valid code.
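
A quick numeric confirmation (Python):

```python
import math

# For the flat binary code, every one of the N codes has length log2(N),
# and the Kraft number comes out to exactly 1 - the code exactly fills
# the code space.
N = 256
L = math.log2(N)                        # = 8 bits per symbol
K = sum(2.0 ** (-L) for _ in range(N))
print(K)   # 1.0
```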

Now let L0 = log2(N) - D

i.e. the code for symbol a0 is compressed by D bits.

let Li = log2(N) for 2 <= i < N

Let us find the minimum of L1.

K = Sum( i = 0 to N-1 )[ 2^(-Li) ]

  = Sum( i = 0 to N-1 )[ 2^(-log2(N)) ] + 2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N))

  = 1 + 2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N))

K <= 1

2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N)) <= 0

2^(-L1) <= 2 * 2^(-log2(N)) - 2^(-L0)

	2 * 2^(-log2(N)) = 2 / 2^(log2(N)) = 2 / N

	2^(-L0) = 2^(D - log2(N)) = 2^D / 2^(log2(N)) = 2^D / N

2^(-L1) <= 2 / N - 2^D / N

N / 2^(L1) <= 2 - 2^D

let L1 = log2(N) + E

2^(L1) = N * 2^E

N / 2^(L1) = 1 / 2^E

2^(-E) <= 2^1 - 2^D

E >= - log2( 2 * ( 1 - 2^(D-1) ) )

E >= - 1 - log2( 1 - 2^(D-1) )

we see two things :

1. D MUST be <= 1 , otherwise 1 - 2^(D-1) is negative, the log doesn't
		exist, and no finite E works.  Thus we see that if only 2 code lengths
		change and all others stay the same, then D can be at most 1.

2. let D = 1 ( the biggest change )
		then E >= INFINITY !!!

	 let D = 0 ( no change )
		then E >= 0 ( no change )

	 let D = .5
		then E >= .77155
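
The bound can be evaluated directly (a Python sketch; min_expansion is
an illustrative name for the right-hand side derived above):

```python
import math

def min_expansion(D):
    """Minimum expansion E of code C1 when C0 is shortened by D bits
    and all other lengths stay fixed: E >= -1 - log2(1 - 2^(D-1))."""
    return -1.0 - math.log2(1.0 - 2.0 ** (D - 1.0))

print(min_expansion(0.0))   # 0.0  (no change forces no change)
print(min_expansion(0.5))   # ~0.77155
# min_expansion(1.0) makes the log argument 0, so math.log2 raises -
# E is unbounded, matching the "E >= INFINITY" case above.
```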

In general E = c(D) * D
 where c(D) is > 1 for all D > 0

This proves a very valuable and interesting statement :

If you have a set of N possible data strings, and you
wish to compress some of them by a factor F, then you
MUST expand the others by a factor GREATER than F!!!

You can see this in practice by generating some Huffman codes
and looking at the lengths.

This also proves that if you have an equal number of
occurrences of each symbol, then compressing any one
symbol will result in a net expansion - i.e. the best
code for a flat input is a flat output.
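
You can see both effects with a small Huffman sketch (Python;
huffman_lengths is an illustrative helper, not a standard library
function):

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return {symbol index: Huffman code length} for a distribution,
    by repeatedly merging the two least-probable subtrees."""
    tiebreak = count()                  # unique keys so dicts are never compared
    heap = [(p, next(tiebreak), {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # merging two subtrees adds one bit to every symbol inside them
        merged = {i: L + 1 for i, L in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_lengths([0.25] * 4))                 # flat input -> all lengths 2
print(huffman_lengths([0.5, 0.25, 0.125, 0.125]))  # skewed -> lengths 1,2,3,3
```

Shortening a0's code to 1 bit (D = 1 from the flat code) forced two of
the other codes to grow by a full bit each, just as the bound predicts.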



Charles Bloom / cb at my domain