Herein lies a great quantity of discussion of a coding scheme wherein I thought that the decoder could guess the encoder's probability model, and thereby gain "extra" information in the transmission. This scheme does indeed work as I described it, but the average extra information is negative! (as it must be). While this may seem to deflate the work, I believe it is still very instructive for the concepts emphasized, and the sophistication brought to bear on the problem.


More than one bit per bit?



It seems that under careful consideration, more than 1
bit of information can be kept in 1 bit of storage.
Note that this does not apply to >input< random sources,
for which there are many general proofs (see my web
page, news section, for example).  This does apply to
a special form of transmission.
 
Let us send bits, 0 or 1, to a receiver.  They are
sent with a probability slightly skewed from 50% :
 
p(0) = p = 1/2 ( 1 + e )
p(1) = q = 1/2 ( 1 - e )
 
where 'e' means epsilon, and is small.
If we send N bits with a coding scheme carefully
chosen to have this distribution (eg. with an
arithmetic coder on the back-end of a scheme which
has a skewed 0/1 output rate) then the bits themselves
will be slightly redundant due to the skewing.
The information sent is then:
 
I = N [ -p L(p) - q L(q) ]
 
where 'L' means 'log base 2'
 
I/N = 1 - ( p L(1+e) + q L(1-e) )
I/N = 1 - 1/2 ( L((1+e)(1-e)) + eL((1+e)/(1-e)) )
I/N = 1 - 1/2 ( -e^2 + e( 2e) + O(e^3) )
I/N = 1 - 1/2 e^2

(to leading order; keeping the 1/ln(2) from the base-2 logs, this is
exactly I/N = 1 - e^2/(2 ln 2).  The constant factors are dropped here
and in the rough formulas below.)

So the information per bit is slightly less than one bit,
as we would have suspected.
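
As a quick numerical check, here is a small Python sketch (the epsilon
values are arbitrary) comparing the exact per-bit entropy of the skewed
source with the rough 1 - e^2/2 and with the version keeping the 1/ln(2):

import math

def exact_bits_per_symbol(e):
    """Entropy (base 2) of a binary source with p(0) = (1+e)/2, p(1) = (1-e)/2."""
    p = 0.5 * (1.0 + e)
    q = 0.5 * (1.0 - e)
    return -p * math.log2(p) - q * math.log2(q)

for e in (0.001, 0.01, 0.1):
    exact = exact_bits_per_symbol(e)
    rough = 1.0 - 0.5 * e * e                     # the 1 - e^2/2 used in the text
    precise = 1.0 - e * e / (2.0 * math.log(2))   # keeping the 1/ln(2) factor
    print(f"e={e}:  exact={exact:.8f}  1-e^2/2={rough:.8f}  1-e^2/(2 ln2)={precise:.8f}")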
 
Now for the finesse part.  The receiver not only gets the
bits sent by the source, he also can estimate the skewing
factor e, by counting the ones and zeros.  

To find the error of this guess we need not assume
a gaussian source or anything of the sort; it is a
simple fractional truncation problem.  If n = pN is the
number of 1 bits, then the measured p (p_m) is :

	p_m = Int[n]/N = Int[pN]/N
 
Where the operation Int[] takes the rounded integer closest
to the fraction pN.  Thus Int[pN] can differ from pN
by at most 1, so that

	abs[p_m - p] <= 1/N

Thus, the receiver can resolve e to one of N values
(eg. the middle of a range of N buckets dividing the range
zero to one).  This means that an overall extra piece
of information is received, with entropy :

I_extra = L(N)

(well, this all is true as N -> infinity anyway; I have
implicitly assumed that n = the number of ones actually
is equal to pN , the probabilistic number of ones.  Of
course, for any finite N it is possible that n differs
from pN dramatically.  Accounting for this with the
proper random walk formulas would make the math
intractable).

Actually, on further consideration, this error due
to fractional truncation is smaller than the error due
to the random-walk deviation of n from pN : the truncation
changes n by at most 1, while the random walk moves it by about
sqrt(N), so the realistic resolution is ~sqrt(N) buckets rather
than N.  However, both are powers of N, so I_extra is O(L(N))
either way; you can choose whichever you prefer.
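
A small simulation (a sketch, with arbitrary N, e, and trial count)
makes the comparison concrete: the spread of the measured p is of
order 1/sqrt(N), which dominates the 1/N quantization:

import math
import random

def rms_estimate_error(N, e, trials, rng):
    """Monte Carlo rms error of the decoder's estimate p_m = (# of 0s)/N
    around the true p = (1+e)/2."""
    p = 0.5 * (1.0 + e)
    total = 0.0
    for _ in range(trials):
        n0 = sum(1 for _ in range(N) if rng.random() < p)
        total += (n0 / N - p) ** 2
    return math.sqrt(total / trials)

rng = random.Random(0)
N, e = 10_000, 0.01
print(f"rms error of p_m              : {rms_estimate_error(N, e, 200, rng):.5f}")
print(f"random-walk scale sqrt(pq/N)  : {math.sqrt(0.25 / N):.5f}")
print(f"quantization scale 1/N        : {1.0 / N:.5f}")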

Michael Schindler has made the good point in email
that just because we can estimate the average error
does not mean that we can confidently expect to
resolve epsilon inside the "error bars".  Of course
he is right; the right way to do this analysis is
to probabilistically count the resolving power :

I_average = Sum[x] P(x) * H_x

where x represents a certain resolvability (deviation
of the epsilon realized from the input parameter
epsilon), and P(x) is an elaborate combinatoric factor
representing how often a sequence of resolvability x
occurs, and H_x is like log2(1/x).

However, all of this is unnecessary if we instead assume
that epsilon is resolvable within the error bars, and
then account for the times it falls outside that region.

Actually, we can fix this as well.  Once the encoder has sent his N bits,
he can compare the actual counts of 0s and 1s with the counts predicted
by epsilon, and hence with what the decoder will predict.  He can then
encode the delta of the two.  This will be small, and hence easy to encode.
Thus rather than accounting for the errors in decoding, we simply need to
count how many bits this correction-code requires.

How big is the correction code?  By simple arguments, it is negligible.
Consider the worst coder we can think of : if e is out of the pre-set
error bars (1/N resolution) then we simply send it explicitly.  Say this
happens 50% of the time; we must send L(N) bits when it does.  Thus
instead of gaining L(N) information, we gain L(N) - (1/2)L(N) = (1/2)L(N) .
This is simply a multiplicative correction, and is irrelevant for N
reasonably large.  There are more important corrections, as noted further
below.

(btw we must also encode one flag bit to indicate whether or not we send
the correction code (ie which half of 50%) but this is small)

Let me re-state what we are doing here.  The encoder knows the actual
probabilities p & q (we'll assume these to be real numbers for now; this
does not affect the results).  All the decoder knows is the number of 1s
and 0s actually encoded : n1 and n0.  Now the decoder guesses values of p and q,
p(g) = n0/N , q(g) = n1/N .  As N->large , these get very close to the actual
values of p and q.  Let us imagine that both parties agreed on a precision of
m out of N bits.  Thus we round n0 and n1 to the nearest multiple of m.  Now, if
Np and Np(g) differ by less than m, the encoder and decoder agree on the
probability - that is, the actually realized effective probability agrees
with our true probability (within the error m).  In this case, we send an
extra zero bit at the end of the transmission.  If this is not the case, then
the encoder knows that the decoder will make an error.  In this case, the
encoder sends a one bit to indicate that the probabilities the decoder will
guess are wrong.  He then transmits the true probability.

We quote the random walk formula and take m = sqrt(N) ; then we know
the guessed probabilities will be wrong 50% of the time.  Thus, we gain
log(sqrt(N)) information by knowing the probabilities; we lose 1 bit by
sending the are-we-wrong flag, and 50% of the time we must explicitly
transmit the probability.  Hence the net gain is

	(1/4) log(N) - 1

The astute reader may balk that we assumed e was small, but
have now counted the information from distinguishing epsilon
in the entire range (0,1).  However, if we did restrict the
range to be small (such as (0,5e)) this would simply be a
multiplicative factor in front of N, which would be
an additive factor in I_extra, which becomes irrelevant as
N becomes large.  (similar to the one bit from the flag above)
 
Thus, we find that the difference (from one bit) in the total
information per encoded digit is :
 
(delta_I/N) = [1/4 L(N) - 1 ]/N - e^2

or, dropping the constant factors:

(delta_I/N) = L(N)/N - e^2
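
To put numbers on this, here is a quick evaluation (a sketch; the N
and e values are arbitrary) of the fuller expression above,
[ (1/4)L(N) - 1 ]/N - e^2 :

import math

def rough_delta_per_bit(N, e):
    """The claimed net gain per encoded bit: ( L(N)/4 - 1 )/N - e^2."""
    return (0.25 * math.log2(N) - 1.0) / N - e * e

for N in (16, 64, 256, 4096):
    for e in (0.001, 0.01, 0.1):
        print(f"N={N:5d}  e={e:<5}  delta_I/N = {rough_delta_per_bit(N, e):+.6f}")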

This is an intriguing result, and begs for a more careful
analysis than the one conducted here.  In particular, as N->infinity,
the first term dies, and we are left only with the small correction
for the redundancy in the skewed bits - nothing is gained.  Of course,
this is not surprising since measuring e is a one-time overall gain
which disappears as N->large.  However, when N is small (eg 2^4) then
the first term will dominate (eg 4/2^4 = 1/4) and we have succeeded
in sending more than one bit per bit.  However, in this region our
assumptions are invalid (in particular, the difference between e
and 1 becomes important; also it must be that N is large enough that
Ne >> 1 so that integer-quantization is not catastrophic).
This feels something like non-equilibrium statistical mechanics :
all our arguments are designed for N -> infinity, but now we're trying 
to explore interesting effects that might occur for N finite.
However, it remains that e can be very small, so it seems there must
be cases where the first term makes the net information gained
positive.

One nice case is N = 1/e , (that is, N just large enough so that
integer-quantization of (Ne) doesn't destroy the effect).  Then:

(delta_I/N) = e L(1/e) - e^2 = L(N)/N - N^-2

Now as N -> large or e -> 0 , we find that the positive term does
indeed dominate the sum, resulting in a gain of information.
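
Plugging in numbers for this special case (a sketch only; the e
values are arbitrary, and see the caveat added just below):

import math

# The N = 1/e case quoted above: delta_I/N = e L(1/e) - e^2
for e in (1/16, 1/64, 1/256, 1/1024):
    N = int(round(1 / e))
    val = e * math.log2(1 / e) - e * e
    print(f"e={e:.6f}  (N={N:5d})  delta_I/N = {val:+.6f}")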

[note added 12-30-97: ] you CANNOT choose e to be 1/N as I have done here;
e must be chosen by the coder based on the data which we desire to code
in e.  Hence, it must have a range of possible choices, and the true
delta_I comes from an average thereof.  See the debunking below.

Possibilities for future work :

1. does this work imply something about qubits? (quantum bits).  The
   qubit is an inherently probabilistic binary source.  If we are sent a
   set of N qubits ( p|0> + q|1> ) with skewed p and q as described here,
   we can measure the skewing, and so extract the Log2(N) information
   exactly as above.  Perhaps this means that qubits do carry slightly
   more classical information than classical bits.
	[despite the debunking, I believe there may be something to this;
	perhaps qubits carry slightly LESS information..]

2. See my note on the general-purpose channel
	(or, "how big is nothing").


Bursting the Bubble


(Bubble Burst 12-28-97)

My "scheme" of making the decoder guess the
encoder's probability skewing does indeed
not work.  (or rather, it does, but the
net space saved happens to be negative!!).

I had completed the calculations to show
the failure in the general case, which
I will present shortly, but let me first
review and give thanks.

I note that Jeff Epler has sent me a
similar experimental analysis in email,
showing that the typical gain is negative.
Andrey Savov and Michael Schindler have
also been of great help.  There are
probably some more that I've forgotten.

Also, Hugo Witters has posted a brilliant
analysis in terms of signal-to-noise in
the transmission (some is reproduced below).
I will just quote some intuitive ideas
from his arguments:

If the skewing (e) is small, then the
numbers of 0s and 1s actually encoded (n0,
and n1) will be very close to eachother,
and the random-walk standard deviation
will be quite large.  Hence, n0 and n1
will contain almost no information about
e, and the only information sent will be
in the flag coded at the end; hence we
will have only sent information about e by
explicitly coding it.

Also we may see that my scheme MUST be
wrong - if it were not we would have a
"perfect" (infinite) compressor, which of
course cannot exist.

In fact, my scheme works like any other
compressor: it expands some inputs, and
compresses others.  In particular, my scheme
will compress strings which happen to have
bits skewed in the same way as e - and will
expand those whose skewing deviates from
the prediction.

Let us now see why I erred.  We start with
the basic (approximate, but good enough
for now) expression:

(delta_I/N) = L(N)/N - e^2

Now, under (all of our) watchful eyes, it
seems that this means an "extra" information
(instead of "extra information" you could
say "compression") which can be small and positive
when 'e' is small.  However, let us note that the
first term is always less than one.  Now,
let us remember what 'e' is for.  'e' CANNOT
be chosen arbitrarily by the encoder.  In
fact, 'e' must be chosen so as to encode some
data.  Thus, for certain types of data to
encode, e will be small, and we will compress.
For other types of data, e will be large, and hence e^2 will be
larger than L(N)/N ! so we will expand.

So, now we can be more quantitative.  We will
assume that the coding model is uniform : that is,
one "symbol" is assigned to each range of e's .
I will also take e to be in the range (0,1/4).
This may not be ideal, but it allows a quantitative
answer.  Hence, the average information gained
(compression) is:

(delta_I/N) =  Sum[e] { L(N)/N - e^2 }
(delta_I/N) = 4 Integral[e from 0 to 1/4] { L(N)/N - e^2 }

But this is actually not right, because the expression
at right is only accurate when e is small (and N is large).
(and overall factors have been dropped).

If you go through my original analysis but make fewer
approximations, you will arrive at the more accurate formula
(for the total gain of the block, rather than per bit):

delta_I = (1/4) [ L(N) - L(1-e^2) - 2 ]
		- (N/2) [ L(1-e^2) + e L((1+e)/(1-e)) ]

We may now integrate this over e, using some tricks
(mostly, L(x) = ln(x)/ln(2) , and then integration
by parts on the ln's); we find:

delta_I = L(N)/4 - 0.1507 N - 0.5678

For N large the gain per bit, delta_I/N , is then simply -0.1507 .

We may find the N which minimizes the loss.
It is N = 26.34 , giving a (delta_I/N) of -0.100

Hence, the coding redundancy from the skewing is
far greater than the gain from distinguishing the
skewing.  The best we can do is to fix e = 0,
in the coding step, in which case this method does
nothing but code the information in e with the
correction code.

One might think from these arguments that we
can do better with a small range for e, but then
less information is sent (fewer buckets) and the
net is again negative.  (as has been shown by
Hugo Witters in the limiting case of only
two buckets).

Let's consider a simple model that we can try
to optimize.  The encoder tries to send an
extra bit by skewing the probabilities by
(+/-e) (read: "plus or minus epsilon").  Here e
is an externally chosen parameter; the coder
sends the bit via the +/- choice.  The decoder then
decodes that bit as follows : if the number of 1's (n1) is
greater than the number of 0's (n0), he guesses
a skewing of +e; if n1 < n0 he guesses -e.

[small correction below due to Michael Schindler,
12-31-97]

Now, if the encoder sees that he used +e but got
n1 < n0, then he must send a correction.  So
we must flag whether a correction is needed; if it is
not needed we're done, and if it is, the decoder just uses
the opposite of what he guessed, so all we need to do is
flag whether the guess was right.  So, in addition to the
coding inefficiency, we must send:

when right : flag we're right
when wrong : flag we're wrong

Now, if e is small (we'll confirm that it is
optimal when it is very small (N is large)) then
the information sent per bit is (1 - ee/2).  Hence
the net information sent is :

I = (1 - ee/2)*N 
	+ 1 
	- (  P L(1/P) + Q L(1/Q) )

The first line is the information in the skewed
coded bits.  The next line (+1) is the extra bit
encoded in the skewing.  The last line is the
entropy of the optimal coding of the flag bit,
where P is the probability that n1>n0 when e is
positive, and Q = 1-P ;
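
Here is a small Monte-Carlo sketch of this scheme (the N, e, and trial
counts are arbitrary): it estimates P empirically, then tallies the
information balance I - N as in the formula above (using the exact
entropy of the skewed bits rather than the 1 - ee/2 approximation).

import math
import random

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def net_gain(N, e, trials, rng):
    """dI = I - N for the +/-e scheme: +1 bit hidden in the sign of the skew,
    minus the coding redundancy of the skewed bits, minus the entropy of
    the right/wrong flag (P = probability the majority vote gets the sign)."""
    right = 0
    for _ in range(trials):
        sign = rng.choice((+1, -1))        # the extra bit we try to sneak in
        p1 = 0.5 * (1 + sign * e)          # skewed probability of a 1
        n1 = sum(1 for _ in range(N) if rng.random() < p1)
        guess = +1 if 2 * n1 > N else -1   # decoder: majority vote (ties -> -1)
        right += (guess == sign)
    P = right / trials
    redundancy = N * (1.0 - h2(0.5 * (1 + e)))   # bits lost to the skewed coding
    flag_cost = h2(P)                            # optimal cost of the right/wrong flag
    return 1.0 - flag_cost - redundancy, P

rng = random.Random(1)
for N, e in ((1000, 0.01), (1000, 0.05), (2000, 0.02)):
    dI, P = net_gain(N, e, trials=1000, rng=rng)
    print(f"N={N:5d}  e={e:<5}  P(guess right)={P:.3f}  dI={dI:+.4f}")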

Put dI = I - N , the "extra" information
(aka compression).  Now, if e was zero, P and Q
would be 1/2 , so let us first consider this
limit.  Since e is small, perturbation will be
done around this limit.

dI(e=0) = 1 - ( 1/2 (1) + 1/2 (1) ) = 1 - 1 = 0

There is no gain.  It is easy to see conceptually
why this has happened.  Our encoding scheme fails
so often that we must send the correction, cancelling
the gain.

Now, the lowest order corrections in e are also easy
to compute. (to do this exactly, you must evaluate an
"error function" or worse, sum the random-walk
distribution over some range, both of which can only
be done computationally, not analytically).  The
distribution of (n1)/N is roughly Gaussian, and the
effect of e is to shift the center of the gaussian
away from 1/2.  Thus, to compute P and Q we must
compute the extra area under the curve between 1/2 and the new
center at 1/2(1+e).
Now, the Gaussian for the fraction n1/N has a standard deviation
of about 1/(2 sqrt(N)); since it is normalized, its maximum height
is about sqrt(2N/pi).  The center is shifted from 1/2 by e/2, so the
extra area is roughly (e/2) sqrt(2N/pi) , and:

	P = 1/2( 1 + e sqrt(2N/pi) )

(the key point is that the resolvability of the sign of e grows like
e sqrt(N), in agreement with the signal-to-noise analysis quoted at
the bottom of this page).
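
A quick check of this estimate against the exact binomial tail (a
sketch; "exact" here just sums the binomial probabilities for
n1 > N/2, counting ties as half, and the N, e values are arbitrary):

import math

def exact_P(N, e):
    """P(n1 > n0) for n1 ~ Binomial(N, (1+e)/2), counting ties as half.
    Keep N modest (<= ~1000) so the plain float arithmetic stays in range."""
    p = 0.5 * (1 + e)
    q = 1 - p
    total = 0.0
    for n1 in range(N + 1):
        w = math.comb(N, n1) * p**n1 * q**(N - n1)
        if 2 * n1 > N:
            total += w
        elif 2 * n1 == N:
            total += 0.5 * w
    return total

# the linear estimate is only good while e*sqrt(N) is small
for N, e in ((100, 0.01), (1000, 0.005), (1000, 0.02)):
    approx = 0.5 * (1 + e * math.sqrt(2 * N / math.pi))
    print(f"N={N:5d}  e={e:<6}  exact P={exact_P(N, e):.4f}  approx={approx:.4f}")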

Thus the entropy of the flag and correction becomes
(writing d = e sqrt(2N/pi) for the shift):

  H	= (  P L(1/P) + Q L(1/Q) )

	= 1/2( 1 + d ) [ 1 - d + dd/2 ]
	+ 1/2( 1 - d ) [ 1 + d + dd/2 ]

	= 1 - dd/2

	= 1 - N ee / pi

Where I have used L(1/P) = 1 - L(1+d) and the taylor
expansion of the logarithm (dropping the 1/ln(2) factors, as before).
Hence, the information change is:

 dI	= - Nee/2 + 1 - H

	= - Nee/2 + N ee / pi

	= - N ee ( 1/2 - 1/pi )

We see from the positivity of the last term that the
skewing does reduce the entropy of the flag &
correction - the decoder guesses the sign of e right more
than half the time - but it recovers only a fraction 2/pi
(about 64%) of the loss due to the coding redundancy, so
the net is still a loss.
Obviously, the loss is smallest (zero) for e=0 , so our assumption
of e small was valid.


Some Questions and Answers


>> Tom Lane wrote:
>You have forgotten about the coding overhead.  How did the
>receiver learn how to decode the symbols?  [...]  In any
>case, the coding overhead will necessarily amount to more than
>the alleged "extra" information.

I did count the coding inefficiency introduced by having skewed p & q,
and epsilon is transmitted explicitly in the data.
Tom's point may be interpreted in
a more subtle manner : the fact that the encoder and decoder have
agreed on a certain transmission scheme is an implicit overhead of
information; (that is, they have agreed that the decoder will count
the zeros and ones to estimate the skewing).  However, I think that
this agreement can be done in an initial communication of finitely
many bits; that is, the # of possible coders is O(1), while the
amount we gain grows with N (like log N), so we may neglect this
communication.
(note that in the various schemes which allow you to "sneak" one bit
off of a 'random' input, we can interpret this as failing to count
the implicit encoder/decoder agreement; however, these gain only O(1),
not a growing amount).

We can interpret this as epsilon's ability to have nearly infinite
precision, being a real number, thus being able to transmit a huge
amount of information, limited only by our ability to quantize this
real number.  I have shown here (to my surprise!) that the errors in
estimating epsilon from the difference in actual probability and
theoretical probability are no bigger than the quantization error in
making (Ne) an integer.

-----------------------

In article <67pr81$pk6@graves.maths.tcd.ie>, tim@maths.tcd.ie says...
>But n bits can certainly hold more than n bits of information,
>since they also hold the information in the number n.
>
>The additional information is roughly log(n),
>which seemed to be the additional amount you got.

Well, this is indeed correct, and well said.  However, my method
does not actually use the number N.  It only uses the ratio n/N,
the apparent probability.  The only reason we end up with a log(N)
additional information is that we are specifying the probability,
a real number, and can only get as much precision as we have bits,
hence the log(N) dependence.  In particular, the standard tricks
for getting an extra log(N) from the length could still be
performed after having done my method.


-----------------------

>> Chris Hillman wrote:
>Are you saying that if you fix some K large, and take many samples of
>sequences of K bits produced by your source and compute the ratio of zeros
>to ones for each sequence, this ratio will be distributed like a bell
>curve around a mean slightly different from 1/2?

Right.  If the sender has p(1) = p, p(0) = q, and the receiver
finds there are actually n 1's and m 0's , n+m = N , then he can
estimate p and q, pe = n/N , qe = m/N .  Now, the full formula 
for n is the random walk formula:

P(n) = (N!/(n!m!)) p^n q^m

But when N and n become large we can approximate this with
a Gaussian, centered around n = Np
Hence the distribution of pe's is a Gaussian around p.
As usual, the standard deviation of the gaussian fit to this
binomial is sqrt(Npq); since pq = 1/4 up to order epsilon^2, we have
a standard deviation of about sqrt(N) in n (sqrt(N)/2, to be exact),
hence about sqrt(N)/N in pe :

standard_dev( pe ) ~ N^(-1/2)

(actually, as Andrey Savov astutely pointed out, we need not
use the Gaussian approximation; these results are exactly the
same for the exact random walk formulas)
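
For concreteness (a small sketch, just evaluating the formulas with
arbitrary N), the exact binomial figure keeps the factor of 2 that is
dropped above:

import math

# Exact standard deviation of the estimate pe = n/N for a binomial source,
# versus the rounded N^(-1/2) figure quoted above.
p = 0.5                                   # p = q = 1/2 to leading order in epsilon
for N in (100, 10_000, 1_000_000):
    exact = math.sqrt(p * (1 - p) / N)    # = 1/(2 sqrt(N))
    rough = 1.0 / math.sqrt(N)
    print(f"N={N:8d}  std(pe) exact={exact:.6f}  rough N^-1/2={rough:.6f}")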

-----------------------------------------

Michael Schindler's nice conceptual interpretation:

>> Andy McFadden said:
---
> This I have trouble with.  For an input of length N, you need to transmit
> M bits of model, in this case it's 'e' or something derived from 'e'.  As
> N increases, I believe M increases as well; otherwise I can come up with
> an input of size N+x for which I can change the value of a symbol and
> not have enough precision in 'e' to detect the change.

M is a parameter that you can choose.
I'll explain the idea with other words, without mentioning log at all :)
You have a total of X bits of information (visualize it as an X bit long
random binary stream).
I will not consider how you send along the length of this binary stream,
you have to send it along with any compression method, even if you do just
verbatim copy. You may assume X to be constant if not knowing the length
bothers you.

These X bits can be split into two parts: X = M + N ;
some of the bits (M) are used to determine a model, and the
other part (N) is transmitted using that model.  M is constant, so there is no
overhead involved in how many bits are encoded in the model choice.
The idea is now to take the transmitted string (the code for the N bits)
and try to guess the model (which contains M bits of information).
Since you have to resolve the ambiguity in the model, some extra bits are
needed.
The question is:
is the average loss caused by using an inexact model, plus this extra
information needed to determine the model (or e, which is the same), less
than M (for some M, N)?  If it is, that would be great; if you can prove
it isn't, that is also great - Charles will be glad to be proven to post
impossible ideas here.  Charles' idea is that we know a lot about the
model (e) already just by seeing its output, so less than M bits should
be needed on average.

It may be required to make N (and thus X) a variable (but the value can
be determined at the decoder) and to fix the length of the message containing
N bits of information - this would not change the fundamental question,
although it makes the question more complicated.

Charles did not post it in this general form; his choice of M was
that e is accurate to sqrt(N), but that need not be.  Just getting the
sign of e right is only one bit and may be easier to guess - quite easy
actually if you choose e large, but then the loss is large too.

-----------------------------------------------------------------------

[
[ more good stuff from Mike Schindler : I've incorporated some of
[  this into the main text
[


hi!

found a bug:

Charles Bloom  wrote in article <68bfou$rt1@gap.cco.caltech.edu>...
> Now, if the encoder sees that he used +e but got
> n1 < n0, then he must send the correction.  So,
> we must flag whether a corrrection is needed, and
> then if it is, we explicitly send the extra bit.
> So, in addition to the coding inefficiency, we
> must send:
> 
> when right : flag we're right
> when wrong : flag we're wrong , + 1 bit

just the flag will do; in a two way choice, knowing that it is wrong
leaves few alternatives.

This is one of the reasons why binary decisions only work better in this case.

Another problem: if you base the probability of being wrong on
the actual numbers of 0's and 1's (if you get 120 1's and 90 0's it
is more probable that the result is right than if you get
101 1's and 99 0's) you can reduce the amount needed to signal
whether we are right or wrong a little; however the net gain is still
negative.

So the question is not: if I choose the coding favouring the 0's,
what is the probability that there will be more 1's?  It is:
if I have a given distribution of 0's and 1's, what is the probability
of each of the two models (or more than two in the general case)?
Then encode a symbol (you called it: flag we are right + correct value)
that gives the correct model, using the distribution of models calculated.
The code for this symbol should be short on average,
but the total gain is still negative.

Michael

[
[ my reply:
[

Ah, yes of course; when e = 0, the flag bit must simply send the
extra bit we are trying to sneak. 

>Another problem: if you base the probability of being wrong on
>the actual numbers of 0's and 1's (if you get 120 1's and 90 0's it
>is more probable that the result is right than if you get
>101 1's and 99 0's)

Hmm.  Yes, that is indeed true, but seems awfully hard to analytically
model.  Let me rephrase this for the reader :

when the decoder gets a stream, he knows not only that n1>n0 (which
encodes our bit), but he also knows the value of n1/n0 , and he can
see that when n1 = n0 roughly he cannot be very confident in his guess,
and so must 'expect' to read a correction flag about 50% of the time,
hence 1 bit of correction.  But, when n1/n0 = 2 or so, he can be very
confident that his guess is right, and so 'expect' to read a correction
only rarely.

Note that you might try my scheme with e=0 in this case, to try to
take advantage of the "extreme cases" of n1 & n0 that come out of a
balanced model; hence there is 1 bit per bit, and we need a correction
flag; when n1 = n0 all the 1 bit is in the flag.  When n1 is far from n0
the decoder will guess that 1 is skewed more than 0, but since we skewed
not at all, he will be wrong half the time, so this is no good.

>if I have a given distribution of 0's and 1's, what is the probability
>for each of both models (or more than two in the general case).

Yes, well put; you get more information from this reverse-guessing
procedure.

>The code for this symbol should be short on average

Well, the code for that symbol should be very close in length to the
code for my simple approximation, since the extremely skewed cases
are so rare.
The right entropy is:

H = Sum[ sets (S) of N 0's and 1's ] 
		P(S) * [ P(+e made S) log(1/P(+e made S)) + P(-e made S) log(1/P(-e made S)) ]
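
Under one reading of that sum (a uniform prior over the +e and -e
models, with P(S) the resulting mixture and the bracket the posterior
entropy of the model given S), here is a little Python sketch that
evaluates it exactly for modest N:

import math

def model_code_length(N, e):
    """Average bits needed to tell the decoder which model (+e or -e) was
    actually used, when he first forms the posterior from the observed
    counts.  Assumes a uniform prior over the two models; the sum runs
    over the count n1 only, since P(S|model) depends on S only through n1.
    Keep N <= ~1000 so the plain float arithmetic stays in range."""
    p_plus = 0.5 * (1 + e)      # P(a 1 | model +e)
    p_minus = 0.5 * (1 - e)     # P(a 1 | model -e)
    H = 0.0
    for n1 in range(N + 1):
        w = math.comb(N, n1)
        like_plus = w * p_plus**n1 * (1 - p_plus)**(N - n1)
        like_minus = w * p_minus**n1 * (1 - p_minus)**(N - n1)
        P_S = 0.5 * (like_plus + like_minus)         # mixture probability of this count
        post = like_plus / (like_plus + like_minus)  # posterior P(+e | S)
        if 0.0 < post < 1.0:
            H += P_S * (-post * math.log2(post) - (1 - post) * math.log2(1 - post))
    return H

for N, e in ((100, 0.01), (100, 0.1), (1000, 0.03)):
    print(f"N={N:5d}  e={e:<5}  average model-code length = {model_code_length(N, e):.4f} bits")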

-----------------------------------------------------------------------

> a brilliant post by
>
> hugo.witters@tornado.be (Hugo Witters)

Following is a more detailed example and calculation including the
information sent by the coder to correct the errors caused by the
noise signal. The distinction between the bit (measure of information)
and the binit (binary digit, which is an output binary symbol) should
be noted.

For clarity, I take a decoder where the "zero" binits received have
-1V (Volt) level and the "one" binits have +1V level. The average
received signal will then be 0V without skewing. The coder operates in
two modes: +e skewing and -e skewing, respectively resulting in a +eV
and a -eV level. The threshold of detection is set at 0V. Positive
voltages will produce a one, negative voltages a zero output signal. 
Working with e = 0.01 and N = 5000, this would yield a useful skewing
signal (integration of signal over N and dividing result by N) of +
and -0.01V , and a noise voltage superposed on the previous signal of
1/sqrt(5000) = 0.014 V effective, with Gaussian amplitude distribution.

This means the skewing signal level lies at 0.707 of the standard
deviation of the noise. The error rate is then 0.24, or 24 percent of
the skewed bits are inverted. To correct this, the coder has to send
information, say in an extra time slot at the end of each block, where
a one would signal the skewing receiver to inverse the detected bit
and a zero would leave the output signal intact. The average
information carried by this extra time slot per block would be:
I =  -0.24 L(0.24) - 0.76 L(0.76) = 0.795 bits per block. 
The average information of one binit in this time slot would go to 1
bit for a 50 percent error rate (random signal) and to 0 bit for a 0
percent error rate, so this value could be used as the penalty in bits
caused by the error signal. I don't see a more information effective
way to do the correction.

The information balance in this example would then give 1 bit gained
by the skewing code, 0.5 bits lost by the redundancy penalty and 0.795
bits lost by the signal to noise level, giving an overall loss of
0.295 bits per block.  

This is not the optimum choice of N/Nmax ratio. When one varies this
ratio from 0 to 1 the optimum lies at N/Nmax = 0.  In this case the
skewing detector is completely drowned  by noise and produces a random
signal and the binit in the extra slot carries 1 bit of information.
This means the skewing system does not work at all and the extra bit
is simply encoded in the extra slot. When N/Nmax increases the skewing
system produces more and more information, and the binit in the extra
slot less and less, but the penalty associated with the skewing coding
also increases and produces an overall loss of information.

N/Nmax Ratio      Total Information Penalty in Bits per block
0                     0
0.05                  0.027
0.1                   0.055
0.2                   0.11
0.5                   0.29
0.75                  0.46
1                     0.63
These penalties are independent of the value of e as long as e remains
small.

If a large number of different e levels are used to code k extra bits
in each block, the error rate comes so close to that of a random
signal (S/N = sqrt(k+1)/(sqrt2*2^k)  when both positive and negative e
values are used with equal spacing), that almost all of the
information has to come in via the correction bits in the additional
time slots like in the above example when N/Nmax comes close to zero.
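
A quick numerical check of Hugo's e = 0.01, N = 5000 figures (a sketch
only; math.erfc gives the Gaussian tail probability):

import math

e, N = 0.01, 5000
noise = 1 / math.sqrt(N)                    # rms of the averaged +/-1V noise
snr = e / noise                             # skew signal vs. noise
err = 0.5 * math.erfc(snr / math.sqrt(2))   # probability the detected sign is inverted
flag = -err * math.log2(err) - (1 - err) * math.log2(1 - err)
print(f"signal/noise = {snr:.3f}")          # ~0.707, as quoted
print(f"error rate   = {err:.3f}")          # ~0.24, as quoted
print(f"flag entropy = {flag:.3f} bits per block")   # ~0.795, as quoted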


Charles Bloom / cb at my domain
