# Compression : News Postings : Blocked Entropy

Path: geraldo.cc.utexas.edu!cs.utexas.edu!uwm.edu!vixen.cso.uiuc.edu!news.uoregon.edu!waikato!auckland.ac.nz!news
From: cb at my domain (Charles Bloom)
Newsgroups: comp.compression.research
Subject: Re: Q: continuous version of entropy?
Date: 15 Oct 1995 12:26:36 GMT
Organization: The Universe
Lines: 54
Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator)
Approved: compression@cs.aukuni.ac.nz
Message-ID: <45quls$oib@net.auckland.ac.nz> References: <45gbor$8dn@net.auckland.ac.nz> <45o3jv$ffl@net.auckland.ac.nz> Reply-To: cb at my domain (Charles Bloom) NNTP-Posting-Host: cs26.cs.auckland.ac.nz X-Newsreader: NN version 6.5.0 #3 (NOV) Hi, >This is a quite interesting idea on which I personally have some thoughts >too. 'Let me share them with you on the net. > >I think Shannon is right ;) BUT there's a little flaw in the theory. This >is connected with the "continuous"remark from above. I spent some 2 years >now investing several coding techniques and one thing hits me every time >again. If you code a certain message it's *always* possible to do it >better. Futhermore the entropy of a message seems not a real "hard" measure >of information. Let me explain: This is not true. See below. >If you take a simple dataset and calculate the entropy. You might think you >get a "hard" real number of the information in the dataset. BUT..... If you >take a simple operation like taking the deltamodulated set of the dataset >(only the differences are taken in account now) the entropy (tadaa) WILL >change. Mind boggeling (is this english?) isn't it? > >Well any guru's out there who is able to explain this? Actually, this is not mind-boggling at all, and can be very easily explained. The Entropy (as in Shannon's Entropy) is actually a measure of the entropy of a data set WITH RESPECT TO A CERTAIN MODEL. The model usually implied is an order-0 character frequency model. You use the example that sometimes the delta-model gives a relative lower entropy. i.e. H(vs. delta) < H(order-0) that's fine. However : for a finite length data set, there are a finite # of possible non-self-similar models which differ in ways other than constant coefficients. Call these Model0,Model1,Model2...ModelN Now let the entropy vs. each model be: H(vs. ModelI) = HI It is intuitive & simple mathematical analysis to show that whenever you have a finite set of numbers, there must 1 which none is smaller than. Therefore, there exists a J such that HJ <= HK for all K, 0 <= K <= N Get it? Charles Path: geraldo.cc.utexas.edu!cs.utexas.edu!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!lll-winken.llnl.gov!enews.sgi.com!decwrl!waikato!auckland.ac.n From: Rob Reeves Newsgroups: comp.compression.research Subject: Re: Q: continuous version of entropy? Date: 18 Oct 1995 02:06:17 GMT Organization: Queensland University of Technology Lines: 61 Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator) Approved: compression@cs.aukuni.ac.nz Message-ID: <461nep$680@net.auckland.ac.nz>
References: <45gbor$8dn@net.auckland.ac.nz> <45o3jv$ffl@net.auckland.ac.nz>
NNTP-Posting-Host: cs26.cs.auckland.ac.nz
X-Newsreader: NN version 6.5.0 #3 (NOV)

vdkamp@rullf2.MedFac.LeidenUniv.NL (B. van der Kamp) wrote:
>In article <45gbor$8dn@net.auckland.ac.nz>, amyc@well.sf.ca.us (Amy Caplan) > writes: > >>i realized that when one applies the Shannon entropy -\sum p(k) log(p(k)) >>to a discretized distribution, the value depends on the resolution of >>the discretization. This makes sense, though it surprised me at first. >> >>Recently I saw defined a *continuous* equivalent of this entropy, i.e., >>sum replaced by integral. >> >>My question: how is this defined. For example, suppose p(x) is constant >>on the interval 0,1 (so it is p(x)=1 in 0..1, 0 elsewhere). >>I expect this p() should be a maximum entropy distribution, but it >>seems like the entropy is zero? (is it? also, what calculus technique >>does one use to evaluate the 0*inf value the integrand takes on >>outside th?) > >This is a quite interesting idea on which I personally have some thoughts >too. 'Let me share them with you on the net. > >I think Shannon is right ;) BUT there's a little flaw in the theory. This >is connected with the "continuous"remark from above. I spent some 2 years >now investing several coding techniques and one thing hits me every time >again. If you code a certain message it's *always* possible to do it >better. Futhermore the entropy of a message seems not a real "hard" measure >of information. Let me explain: > >If you take a simple dataset and calculate the entropy. You might think you >get a "hard" real number of the information in the dataset. BUT..... If you >take a simple operation like taking the deltamodulated set of the dataset >(only the differences are taken in account now) the entropy (tadaa) WILL >change. Mind boggeling (is this english?) isn't it? > >Well any guru's out there who is able to explain this? Not a guru by any stretch of the imagination! But I believe the clue is that the entropy depends on the number of symbols grouped tpgether. For instance, in a 5 symbol alphabet, if you group symbols in pairs, you have a new 25 symbol alphabet. The probabilities for the new 25 symbols will naturally all be smaller than the probabilities for the original 5 symbols. We can repeat the grouping process as much as we like, at each step getting an alphabet with 5^n new symbols, where n is the size of the groups. As the number of new symbols increases, the probability of each symbol approaches 0, and the terms plogp in the entropy expression approach zero. Does this mean the entropy approaches zero as the group size increases without bound? I think it does, but someone more knowledgable will have to answer that! The implication is that the entropy can be made as low as we like, by grouping symbols together. However since the number of symbols is related to the power of the group size, practical limits are reached fairly quickly. -Rob Reeves rreeves@qut.edu.au Path: geraldo.cc.utexas.edu!cs.utexas.edu!news.sprintlink.net!in2.uu.net!comp.vuw.ac.nz!auckland.ac.nz!news From: cb at my domain (Charles Bloom) Newsgroups: comp.compression.research Subject: Re: Q: continuous version of entropy? Date: 21 Oct 1995 13:03:35 GMT Organization: The Universe Lines: 59 Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator) Approved: compression@cs.aukuni.ac.nz Message-ID: <46ar37$lte@net.auckland.ac.nz>
References: <45gbor$8dn@net.auckland.ac.nz> <45o3jv$ffl@net.auckland.ac.nz> <461nep$680@net.auckland.ac.nz> Reply-To: cb at my domain (Charles Bloom) NNTP-Posting-Host: cs26.cs.auckland.ac.nz X-Newsreader: NN version 6.5.0 #3 (NOV) >Not a guru by any stretch of the imagination! But I believe the clue is that the >entropy depends on the number of symbols grouped tpgether. For instance, >in a 5 symbol alphabet, if you group symbols in pairs, you have a new >25 symbol alphabet. The probabilities for the new 25 symbols will naturally >all be smaller than the probabilities for the original 5 symbols. We can >repeat the grouping process as much as we like, at each step getting an >alphabet with 5^n new symbols, where n is the size of the groups. As the >number of new symbols increases, the probability of each symbol >approaches 0, and the terms plogp in the entropy expression approach zero. > >Does this mean the entropy approaches zero as the group size increases >without bound? I think it does, but someone more knowledgable will >have to answer that! > >The implication is that the entropy can be made as low as we like, by >grouping symbols together. However since the number of symbols is related >to the power of the group size, practical limits are reached fairly quickly. > >-Rob Reeves >rreeves@qut.edu.au Let's take an example of a 2 symbol alphabet, expanded to a 4-symbol alphabet. The input set is symbols A and B, occuring with pA and pB pB = 1 - pA Entropy is then: H0 = ( -pA log pA ) + ( -pB log pB) Now let's group them together into : C = AA D = AB E = BA F = BB pC = pA * pA pD = pA * pB = pA - pA * pA pE = pD = pA - pA * pA pF = pB * pB = 1 - 2 * pA + pA * pA H1 = - [ ( pC log pC ) + ( pD log pD ) + ( pE log pE ) + ( pF log pF ) ] log pC = 2 log pA , etc. H1 = - [ ( 2 * pA * pA * log pA ) + 2 * ( pA * pB * log pA + pA * pB * log pB ) + 2 * ( pB * pB * log pB ) ] H1 = - 2 [ ( pA * pA + pA * pB ) * log pA + ( pB * pB + pA * pB ) * log pB ] = - 2 [ ( pA ( pA + pB ) ) log pA + ( pB * ( pB + pA ) ) * log pB ] = 2 * [ - pA log pA - pB log pB ] = 2 * H0 when you double the block size, you double the entropy. i.e. if H0 = entropy of binary alphabet A,B and H1 = entropy of two-bit-blocks of As and Bs then H1 = 2 * H0 I want to emphasize here that half as many events are coded, so the net output is the same. This will be true for any blocking IF AND ONLY IF the probability of blocked codes is dependent only on the probability of the base codes. For example, this is not true in normal cases. The entropy of the 8-bit blocks is less than the 8 times the entropy of 1-bit blocks. This is because blocking codes is actually a form of implicit modelling (see my post about the dependence of entropy on the model). Thus 8-bit blocking is an implicit model which says that the bits in a byte are dependent on eachother (which we all know is true). For example, byte-wise DMC tends to get higher compression than the original bit-wise DMC. Similarly, the byte-wise PPM beats the bit-wise "Context" of Rissanen. Futhermore, in practice blocks much larger than the "natural block size" of the data tend to result in lower compression, because the statistics become sparse and hard to model. (i.e. two-byte (machine short word) modelling is not effective) Charles Bloom Path: geraldo.cc.utexas.edu!cs.utexas.edu!academ!news.sesqui.net!news-out.internetmci.com!newsfeed.internetmci.com!in1.uu.net!comp.vuw.ac.nz!waikato!auckl From: Martin Tom Brown Newsgroups: comp.compression.research Subject: Re: Q: continuous version of entropy? Date: 2 Nov 1995 03:23:53 GMT Organization: Nezumi Lines: 50 Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator) Approved: compression@cs.aukuni.ac.nz Message-ID: <479dk9$ms2@net.auckland.ac.nz>
References: <45gbor$8dn@net.auckland.ac.nz> <45o3jv$ffl@net.auckland.ac.nz> <461nep$680@net.auckland.ac.nz> <46ar37$lte@net.auckland.ac.nz> <46nb11$bs1@ne Reply-To: Martin Tom Brown NNTP-Posting-Host: cs26.cs.auckland.ac.nz X-Newsreader: NN version 6.5.0 #3 (NOV) In article <46nb11$bs1@net.auckland.ac.nz>
bmt@ix.netcom.com "Bruce Tannenbaum " writes:

> In <46ar37$lte@net.auckland.ac.nz> cb at my domain (Charles > Bloom) writes: > > >Let's take an example of a 2 symbol alphabet, expanded to a 4-symbol alphabet. > > > >The input set is symbols A and B, occuring with pA and pB > > pB = 1 - pA > > > >Entropy is then: > > > >H0 = ( -pA log pA ) + ( -pB log pB) > > > >Now let's group them together into : > > C = AA > > D = AB > > E = BA > > F = BB > ... a bunch of equations...... [snip] - Entropy per symbol remains the same > I have a question. Assume pA >> pB and that pC (=pAA) >> pD or pE or > pF. What happens if you decide on the following type of alphabet? > > A, B, C (=AA) The alphabet you would use in this case pA >> pB would be B, AA, AB - since this will encode your data in least symbols. H = -pB' log pB' - pAA log pAA - pAB log pAB Where pB' is prob of B following another B or an even multiple of A's pB' = pB = p, pAA =(1-p)^2, pAB = p(1-p) if A's and B's are uncorrelated. The number of symbols used is N*( pB' + (pAA + pAB)/2 ) The entropy (assuming no correlation) is quick but tedious back of envelope algebra leading to H' = (1 - p/2)*H0 - I think :-) Regards, -- Martin Brown __ CIS: 71651,470 Scientific Software Consultancy /^,,)__/ Path: geraldo.cc.utexas.edu!cs.utexas.edu!news.sprintlink.net!news.uoregon.edu!waikato!auckland.ac.nz!news From: spateux@irisa.fr (Stephane Pateux) Newsgroups: comp.compression.research Subject: Re: Q: continuous version of entropy? Date: 4 Nov 1995 07:04:21 GMT Organization: IRISA, Campus de Beaulieu, 35042 Rennes Cedex, FRANCE Lines: 46 Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator) Approved: compression@cs.aukuni.ac.nz Message-ID: <47f39l$77t@net.auckland.ac.nz>
References: <479dk9$ms2@net.auckland.ac.nz> Reply-To: spateux@irisa.fr (Stephane Pateux) NNTP-Posting-Host: cs26.cs.auckland.ac.nz X-Newsreader: NN version 6.5.0 #3 (NOV) In article <479dk9$ms2@net.auckland.ac.nz>, Martin Tom Brown  writes:
> > I have a question.  Assume pA >> pB and that pC (=pAA) >> pD or pE or
> > pF.  What happens if you decide on the following type of alphabet?
> >
> >     A, B, C (=AA)
>
> The alphabet you would use in this case pA >> pB would be
>
>         B, AA, AB  - since this will encode your data in least symbols.
>
>         H = -pB' log pB' - pAA log pAA - pAB log pAB
>
> Where pB' is prob of B following another B or an even multiple of A's
> pB' = pB = p, pAA =(1-p)^2, pAB = p(1-p) if A's and B's are uncorrelated.
> The number of symbols used is   N*( pB' + (pAA + pAB)/2 )
> The entropy (assuming no correlation) is quick but tedious back of
> envelope algebra leading to    H' = (1 - p/2)*H0  - I think :-)
>
> Regards,
> --
> Martin Brown       __                CIS: 71651,470
> Scientific Software Consultancy             /^,,)__/
>

I don't agree with the formula of Martin Brown. When we take the alphabet B, AA and AB,
the entropy linked to this alphabet is defined by:
H = -pB' log pB' - pAA log pAA - pAB log pAB, which is equal to (with Martin's notation):
H = (2-p) { -p log p - (1-p) log (1-p)} = (2-p) H0.
To be compared to the initial entropy, we should divide by the mean length of the new
symbol: l = pB' * L(B') + pAA * L(AA) + pAB * L(AB). We easily find that l = (2-p).

Thus the entropy per symbol is still the same!....

In my point of vue, as soon as we make the assumption that the symbols are not
correlated, we will allways find the same entropy per symbol. Grouping symbols together
helps to decorrelate the symbols, but when they are not, it doesn't change anything except
the complexity.

Regards,
------------
Pateux Stephane
PH-D Students at the IRISA, France.

From: cb at my domain (Charles Bloom)
Newsgroups: comp.compression.research
Subject: Re: Q: continuous version of entropy?
Date: 4 Nov 1995 07:05:54 GMT
Organization: University of Texas at Austin
Lines: 27
Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator)
Approved: compression@cs.aukuni.ac.nz
Message-ID: <47f3ci$7a6@net.auckland.ac.nz> References: <45gbor$8dn@net.auckland.ac.nz> <45o3jv$ffl@net.auckland.ac.nz> <461nep$680@net.auckland.ac.nz> <46ar37$lte@net.auckland.ac.nz> <46nb11$bs1@ne
Reply-To: cb at my domain (Charles Bloom)
NNTP-Posting-Host: cs26.cs.auckland.ac.nz
X-Newsreader: NN version 6.5.0 #3 (NOV)

The moral of the story is :

NO BLOCKING WILL CHANGE THE ENTROPY

This is true IF AND ONLY IF the entropy of blocked symbols depends
only on the entropy of non-blocked symbols.

In practice, blocking ALWAYS HELPS, as long as the square of the
resultant alphabet size is on the same order of magnitude as the
data size.

For example, blocking bits into a byte works better than bits,
whenever the data file is 1000 bytes or more.

Blocking into words instead of bytes works well only on inputs
of 100,000 bytes or so..

This is especially true when using an approximate coder, such
as Huffman, which cannot code exactly to the entropy.
In this case, blocking is excellent.

Charles

Charles Bloom, cb at my domain
Independent Compression Algorithm Developer

Charles Bloom / cb at my domain