Here's a collection of notes from an Email conversation with Mark Carrol. These notes are very disjoint and random, but may help anyone trying to implement an integer arithcoder.

Well, let's look at it.

Say 'a' has a 255/256 chance of occuring , and 'b' has a 1/256 chance of
occuring ; we'll pick this (1/256) as the maximum resolution of the symbol
probability.  Let's renormalize our L and H to be integers ( i.e. multiply by
65536 before doing everything ).

Initialiy, L = 0  H = 0x10000
now we get an 'a'       L += (H-L)*(1/256)
L = 0x100  H = 0x10000
now we get an 'a'       L += (H-L)*(1/256)
L = 0x1FF  H = 0x10000
now we get an 'a'       L += (H-L)*(1/256)
L = 0x2FE 


Ok.  This is no problem.  We can just keep doing this until L gets higher than
0x8000 (1/2)

What if L doesn't get that high?  No big deal, then we have some freedom.  All
we have to do is pretend it did.  What are we doing here?  We're extending the
string out to infinity in our mind.  Thus, maybe we inputted "aaaaaaaaa"
and we outputted "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" because we rounded L
up to 1/2
This, however, is just fine because we either wrote an EOF, or we sent the length of the uncompressed stream before everything else.  (for an EOF, we
would reserve an extra symbol 'c' with 1/256 probability and write it at the

Does that solve it?

Well, I know the answer is that the answer is if your machine
word is 16 bits, then the probabilities can only be 14 bits, but
let me see if I can remember how to derive that.

(L,R) are the arithcoder interval
(Sl,Sr,St) are the cumprobs (symlow,symrange,symtot)

arithcoding consists of :

        L += (R*Sl)/St;
        R  = (R*Sr)/St;

The biggest R can be is 16 bits
        in that case, you must have St <= 16 bits or else R*Sr or R*Sl could
        overflow the 32 bit register

The smallest R can be is 14 bits (if it ever gets to 1/4 of 16 bits, then it
  can be renormalized and sent out)
        in that case, Sr = 1 is our worst case.  We then have
                R = (14 bits)/St;
        obviously St cannot be bigger than 14 bits here because if it were,
        then this step would result in R = 0

That should do it!

>I understand the coding process now - btw, I went to having 17-bit bounds 
>and 15-bit probabilities.

Yeah, I was thinking about that too; it occurs to me that there is supposed
to be some problem with that, but I can't remember what it is right now;
if you have trouble and cant figure it out, try 16 & 14 ...

>I'm having a little trouble decoding, though. I didn't see anything about
>it on your webpage, but I was guessing that what I might do is: 

You make an excellent guess, but it's actually a little bit easier than

if L and R are in 16 bits, then read in 16 bits from the file and call
it C, the "code".

To decode a symbol :

compute T = (C - L) * St / R

(where St is against the symbol cumprob total)

Now all you have to do is find the symbol such that

Sl <= T <= Sh

Now, after you've found this symbol, proceed just like you
were coding (i.e. do halve-up & halve-down & halve-middle).
The only difference is that you don't ever output bits,
instead you input a bit into C (shifting left) each time
you would have outputted a bit.  Don't worry about the
halve-middle being wierd, it was handled by the encoder.

Also, take a look at the source code on my page.  The arithcoder
by me is very clear, and shows you two different equivalent
implementations (one is bit & shift-wise and the other is one/half/quarter

>Hmmm - things aren't going wonderfully - should I be keeping track of my
>bit-plus-follow counter while decoding, and letting it cause me to input a
>few bits at a time sometimes at the next halve-lower or halve-upper?

No, just input a bit into C (code) every time you would have
done a bitplusfollow ; more clearly :

input a bit into C every time you make R grow.

i.e. every loop in the renorm.  Thus, your basic decoder renorm 
loop looks like:

while ( R <= Quarter )
        if ( (L+R) < Half )
                /* halve down */
        else if ( L >= Half )
                L -= Half;
                /* halve up */
        else /* L < Half and (L+R) >= Half */
                L -= Half;
                /* halve middle */

        L <<= 1;
        R <<= 1;
        C <<= 1;
        C |= readbit();
        C &= One;

        /* note that it shouldn't be necessary to &= One on L or R, since
                they stay in the right range on their own */
        /* also note that L += L is usually faster than L <<= 1 */


One = 0xFFFF;
Half = One/2;
Quarter = One/4;

>Also, it's some time since I had an Amiga...  it's a pity
>they're not more popular: they're lovely machines. How's the US end of
>Commodore-Amiga nowadays? Is there much left of it? 

I think I'm single handledly keeping the Amiga in use in America :^)
And I only do that by coding on the Amy and then porting to the PC...
damn it's sad..

>Well, it's working at last (I think (-:)! The only '+ 1's and '- 1's I'm
>using are: subtracting 1 from a newly-calculated high bound; and for
>calculating the cumulative probability of the next character in decoding,
>as (tot_cumprob * (input_stream - low_bound + 1)) / (high_bound - low_bound)
>for reaons I'm not quite sure of.

Yeah, I remember all that.  I recall that it is possible to derive all
the plus ones and minus ones, but it's pretty tough.  Here's why you need

1. ensure propper rounding, since we're doing this all in integers instead
        of reals

2. because we're using 0xFFFF = One , instead of 0x10000 , which is ACTUALLY
        one.  Thus, you must add one to the upper bound sometimes, and
        subtract one from computed upper bounds sometimes.

- note that you could implement it all using reals, and an actual 1.0 , 0.5 ,
0.25 values for the renorm, and then you shouldn't need any +- 1

Bye now


Charles Bloom / cb at my domain

Send Me Email

Back to the Index

The free web counter says you are visitor number