How big is nothing?



In the course of exploring "how many bits per bit", it
has come to my attention that "how big is nothing" is
an important question that I have never seen answered.

First of all, nothing must be at least 1 bit.  That is,
we must send a flag for "nothing/something".  If we did
not do this, then we would have a channel which did
nothing but transmit nothing (no, that's not a double
negative!).  (You're not allowed to choose different
channels, since that would require transmission over
the 'channel-choice channel').
Hence we need 1 bit for the existence of the data.
We also need roughly 2*Log(N) bits to send the EOF or
the file size of an N bit sequence.
I get 2Log(N) because I assume the following coding
scheme :

prior to writing each symbol we code an adaptively
adjusting escape : flag 0 for EOF, flag 1 for else,
then code as normal.  Hence, after n symbols the
probability of EOF is 1/n .  Thus the total
output length added by this scheme is :

L = log(N+1) + Sum[n=1 to N] log(n+1/n)
L = log(N+1) + Sum[n=1 to N] log(n+1) - log(n)
L = log(N+1) + log(N+1) - log(1)
L = 2log(N+1)

If there is a more efficient general method, let me
know.  (note that we cannot just write L in log(N+1)
bits; if you wish to try to write L as a number at
the head of the file you must use some variable-length
scheme with an implicity probability distribution of N's;
that is, you must write an EOF for the number itself).

Aside/addendum : in fact, this can be done quite nicely.
First, one way is to unary-code the length of the bits
of N, which takes log(N) bits, then to code the length
of N itself, which takes log(N) bits.  This is then
exactly the length required for my EOF code.  However,
you can do still better by using this trick recursively:
instead of writing L = log(N) using unary, we code L
the same way we did N : write the length of L, then
the bits of L raw.  We can write the length of L again
with this scheme.  Then, instead of 2log(N) we can
use log(N) + log(log(N)) + log(log(log(N))) + ... bits.

All of this stuff is usually hidden away in the
operating-system's file headers, but if we wish to
discuss fundamental issues we must account for this
business.

Now, there is a more subtle issue.  We must also
specify which coder we are using.  If we used
an "algorithmic" (Kolmogorov/Chaitin) compressor,
this would not be an issue, but I consider this
unacceptable because of the halting problem;
we also wish to have a more general scheme to
discuss the performance of other coders.  Now,
one solution is to have the convention of sending
the data as a self-extracting executable, ie
sending the decoder before the data.  This is
of course the most general method, and seems the
best we can do (it is the only way to account for
the sneaks who would put the whole universe in a
static data file in their decoder!).

Hence, we find :

L_tot = 1 + 2log(N+1) + sizeof(decoder) + sizeof(data)

But now we can cut out losses.  Since we have generated
this general purpose cannel, we can now make one-time
agreements between the receiver and transmitter.  Since
this channel has existed since the big bang (when time
began) the one-time communication cost is irrelevant
compared to all the data we can send over it.  So, what
EXTRA receiver-transmitter schemes can operate in
conjunction with this channel?

1. N has information.  Since we've explicitly sent N ,
the receiver will find N.  In addition to telling
the decoder when to stop receiving, it can also contain
information (as is well known; see for example
my web page where I detail the operation of an 8 -> 6 bit
coder which works on this principle (hiding information
in the string length)).  It is roughly log(N) bits
(for N large, anyway; perhaps log(N)-1 more generally).

2. I've also now suggested that the coder can skew the
0/1 probabilities, and the decoder can figure out the
skewing, and hence get another log(N) that way.  See
again my web page for an extensive review
[ note added 12-30-97 : this does NOT work ]

The conclusion is that our above result is modified
(for large N) to :

L_tot = sizeof(decoder) + sizeof(data)

!  This is quite amazing to me; our implicit
schemes 1 & 2 have exactly cancelled the overhead
necessary for a general-purpose channel!


Charles Bloom / cb at my domain

Send Me Email

Back to the News Index

The free web counter says you are visitor number