11-9-95

The Kraft inequality is discussed in "Generalized Kraft Inequality and Arithmetic Coding" by Jorma Rissanen. I will explore some of its power and meaning here.

Let the alphabet A = { A0, A1, A2 ... A(N-1) }, i.e. the symbols are An for 0 <= n < N. Therefore |A| (the size of A) = N.

Let Cn be the prefix-free code for symbol An, and let Ln be the length of code Cn. Note that Cn need not be a Huffman or Shannon-Fano code. For example, Cn could be the simple binary code (i.e. a raw byte).

It is clear that not all values of Ln are allowed. For example, Ln = 1 for all n is impossible if N > 2. However, Ln can be as large as we like - there is only a restriction on how small the Ls can be.

Let us define

	K = Sum[ n = 0 to N-1 ] 2^(-Ln)

K is "the Kraft number" and is a measure of "the size" of L. We see that if L is 1, then 2^(-L) is .5 . We know that we cannot have more than two Ls of 1; if there are more than two Ls each contributing .5 , then K > 1. Similarly, we know L can be as large as we want, so 2^(-L) can be as small as we want, and thus K can be as small as we want. Thus we can intuitively see that there must be a strict upper bound on K, and no lower bound.

It turns out that a prefix code with lengths Ln exists IF AND ONLY IF:

	K <= 1

This is the Kraft inequality. Proof of this statement is beyond the scope of this discussion (though it is pretty simple entropy juggling).

Note, though, that

	Ln = -log2(Pn) + En

where En is the "extra length" of a code - i.e. the length greater than the entropy. Thus

	2^(-Ln) = 2^(log2(Pn) - En) = 2^(log2(Pn)) * 2^(-En) = Pn / 2^(En)

Thus

	K = Sum[ n = 0 to N-1 ] Pn * 2^(-En)

Imagine the simple case where En = E for all n. Then

	K = Sum[ n = 0 to N-1 ] Pn / 2^E = [ Sum Pn ] / 2^E

but Sum Pn = 1, thus

	K = 1 / 2^E

So the Kraft inequality says

	1 / 2^E <= 1
	2^(-E) <= 2^0
	-E <= 0
	E >= 0

The result is: the Kraft inequality says that a code is decodable IF AND ONLY IF the code lengths are greater than or equal to those determined by the entropy. This should make good sense.
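The Kraft number is trivial to compute, so the inequality makes a handy sanity check on any proposed set of code lengths. Here is a small Python sketch (the function name kraft_number is mine, not from the paper) checking the cases discussed above:

```python
def kraft_number(lengths):
    """Kraft number K = sum over all codes of 2^(-L)."""
    return sum(2.0 ** -L for L in lengths)

# 256 raw bytes: every code is 8 bits, so K = 256 / 2^8 = 1 exactly.
print(kraft_number([8] * 256))        # 1.0

# Three codes of length 1: K = 1.5 > 1, so no prefix code exists.
print(kraft_number([1, 1, 1]) <= 1)   # False

# A valid Huffman-style length set {1, 2, 3, 3}: K = 1.
print(kraft_number([1, 2, 3, 3]))     # 1.0
```

A set of lengths with K < 1 is also valid, but wasteful: some codeword could be shortened.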

Let us now look at some of the power of the Kraft inequality.

Let Ln = log2(N) for all n, i.e. each code is a simple binary code. For example, if N = 256, then each Cn is just a byte, and Ln = 8 for all n.

	K = Sum[ i = 0 to N-1 ] 2^(-Li)
	  = Sum 2^(-log2(N))
	  = [ Sum 1 ] / 2^(log2(N))
	  = N / N
	  = 1

Thus we see this is a valid code.

Now let L0 = log2(N) - D, i.e. symbol A0 is compressed by D bits. Let Li = log2(N) for 2 <= i < N, and let us find the minimum of L1.

	K = Sum[ i = 0 to N-1 ] 2^(-Li)
	  = Sum[ i = 0 to N-1 ] 2^(-log2(N)) + 2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N))
	  = 1 + 2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N))

	K <= 1
	2^(-L0) + 2^(-L1) - 2^(-log2(N)) - 2^(-log2(N)) <= 0
	2^(-L1) <= 2 * 2^(-log2(N)) - 2^(-L0)

	2 * 2^(-log2(N)) = 2 / 2^(log2(N)) = 2 / N
	2^(-L0) = 2^(D - log2(N)) = 2^D / 2^(log2(N)) = 2^D / N

	2^(-L1) <= 2 / N - 2^D / N
	N / 2^(L1) <= 2 - 2^D

Let L1 = log2(N) + E, so 2^(L1) = N * 2^E and N / 2^(L1) = 2^(-E). Then

	2^(-E) <= 2 - 2^D
	E >= -log2( 2 * ( 1 - 2^(D-1) ) )
	E >= -1 - log2( 1 - 2^(D-1) )

We see two things:

1. D MUST be <= 1, otherwise the right side is negative and E doesn't exist. Thus we see that if only 2 code lengths change and all others stay the same, then any one code can shrink by at most 1 bit.

2.	let D = 1 (the biggest change), then E >= INFINITY !!!
	let D = 0 (no change), then E >= 0 (no change)
	let D = .5, then E >= .77155

In general E = c(D) * D, where c(D) > 1 for all D > 0.

This proves a very valuable and interesting statement: if you have a set of N possible data strings, and you wish to compress some of them by a factor F, then you MUST expand the others by a factor GREATER than F!! You can see this in practice by generating some Huffman codes and looking at the lengths.

This also proves that if you have an equal number of occurrences of each symbol, then compressing any one symbol will result in a net expansion - i.e. the best code for a flat input is a flat output.
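The bound on E is easy to evaluate numerically. This Python sketch (the function name min_extra_bits is mine) computes E >= -1 - log2(1 - 2^(D-1)) from the derivation above and shows that the ratio E/D exceeds 1 for every D in (0,1), i.e. the forced expansion always beats the compression:

```python
from math import log2

def min_extra_bits(D):
    """Minimum extra length E forced on the other changed code when
    one code is shortened by D bits (requires 0 <= D < 1):
    E >= -1 - log2(1 - 2^(D-1))."""
    return -1.0 - log2(1.0 - 2.0 ** (D - 1.0))

for D in (0.25, 0.5, 0.75, 0.99):
    E = min_extra_bits(D)
    print(f"D = {D:4}  ->  E >= {E:.5f}   E/D = {E / D:.3f}")
```

At D = 0.5 this prints E >= 0.77155, matching the figure above; as D approaches 1 the required E blows up toward infinity.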

Charles Bloom / cb at my domain