The 2nd law of stat-mech in info-theory

The relation of info-theory and the 2nd law
seems to provide a cute way to understand
the general incompressibility of data.

(none of this is really new; most of it is well
known to Bennet,Chaitin, etc. ; I present a
slightly different viewpoint : I show that various
info-theoretic results follow from physics,
instead of the usual approach which does the

First, some simple marks on this relation
are needed.  The nicest way to see this
relation is simply by abstracting a physical
process.  Consider N copies of a two-state
physical system - such as an array of spins,
or gas molecules which are either on the
left or the right - we abstract these to be
bits.  Now, if all N are in a pure state,
then their entropy is minimized.  It is well
known from the second law that as we let them
randomize, work can be extracted from them.
(eg. the molecules all to the left or right
will randomize to be a mix of left and right,
which can do work for us if we put a piston
in the chamber; similarly, an array of
spins produces a B field, which we can
extract work from by sliding another spin
along (and the array will become randomized).

The physical correspondence is that randomizing
a "bit" frees kTln2 of energy. (similarly,
erasing a random bit requires kTln2 of energy;
this can be nicely understood with the Maxwell-
Slizard demon). (T is the temperature of the
isothermal bath).

So, for a binary order-0 string of bits, we have
a nice relation between the Shannon Entropy (H) and
the amount of energy (E) which must be used to erase
the string :

E(erase) = kTln2 H

Thus the energy liberatable from a given string
(of length N) is

E(lib) = kTln2 ( N - H )

And C=(N-H) is nothing but the compressibility (C).
This means that we can measure the compressibility
by counting the energy exchanged with the bath :
if we give the bath energy, then we must have compressed
information at some point (gathered redundancy).

It is also important to note that any reversible
information-theoretic or computational operations
have no net energy cost or liberation. (this is
simple thermo and of course must be true or you
would have a perpetual motion machine).

I go into more rigour in an addendum, but
we can see this form must be true.  The statistical
mechanical probability of occupying a certain
state is :

P = e^{ - E/kT } = e^{ - kTln2H / kT }
  = 2^{ - H }

Which is exactly the Shannon probability of
randomly picking a string of entropy H !

"Bloom's Demon:"

I describe a reversible process which would be
a perpetual motion machine if not for the info-theory
of incompressibility.  Consider an array of 2^N , N bit
strings.  The array is immersed in a bath at temp T
and allowed to fully randomize (H=N).  It is then removed
to adiabatic isolation.  Now, the vast majority of
strings will be truly random, but some strings from
a random source are quite ordered - 0000, 111, 010101, will
all occur (on average, with 2^N strings, each possible
string occurs once).  Now, an active demon is also in
our 'system'.  He moves down the list of strings, looking at
each one.  If he examines a string, he can tell if it is
compressibile or not (he has access to a perfect compressor,
say); if it is incompressible he flags a 1 in his memory,
other he leaves his memory at 0.  If he desides it is
compressible, he performs the reversible transform of turning
the N ordered bits into H random bits and (N-H) zeros (for
entropy H).  Now, he has examined all 2^N strings, has made
many new zeros in them, and has an array of flags in his
memory.  We put the system back into contact with the bath,
and allow all the zeros we have made to equilibrate.  As
discussed above, this liberates an energy

	E = kT ln2 Sum{strings} C(string)

where C = N - H(string) is the compressibility.  Now that
all the strings have equilibrated, they are back in the
initial random condition.  However, in order for this to
be reversible, we must also restore the demon to its initial
state.  To do this we must clear 2^N bits, which takes in
energy.  The net compression attained is :

	C(net = (E/kT ln2) Sum{strings} [ C(strings) - 1 ]

Which is strictly negative (we amortize C : it cannot be
less than zero).  (in fact, a very large negative).

This is identical to the incompressibility of all strings,
and the fact that if you compress one string you
must expand another - in this context, if you can
extract work from one string, you must have done
work on other strings, or on your own mechanism.

(there are some subtleties here; the Bloom Demon
could use a reversible probability estimator so
that it doesn't waste a full bit each time, since
it would know that the "non compressible" case is
much more likely (for example, first count the # of
compressibile strings (M), then we need log(2^N choose M)
bits to specify which ones were compressible; if
we do this perfectly we could do as well as C(net) = 0);
also these arguments do not carefully account for the
fact that N is assumed to be known in advance, which
provides a way to "sneak" some information)

We can also unite this to my old viewpoint that the
algorithmic info ("maximum compressibility" ala
Kolmogorov) is obtained by finding an ideal basis to
transform into, and then using the Shannon H.  This is
here true from thermodynamics because of the fact that
reversible cycle does not liberate energy, and an
(orthogonal) basis transform is reversible.  Thus,
we may find the optimal compressibility by allowing
our system to make adiabatic reversible changes, and
then letting the ordered part randomize, thus extracting
work ( = compressibility).  Some examples are in order:
(examples shamelessly stolen from Bennett) :

n copies of the same (random) string (of N bits).
These may be reversibly
transformed by xor-ing the ith string against the 1st
one; when they are all the same, the result is that all
the strings are cleared except the first.  This is
reversed simply by repeating the same operation again.
Since O^2 = 1 (for the operation O), we see O cannot
have any energy cost.  So, after we have done this,
we have (n-1) strings of N zero bits, so we can liberate
energy or compressibility :

	E(lib) = kTln2 * C
	C = (n-1)*N

A more radical example which shows the way E(lib) is
related to Kolmogorov is the string 'pi' ; The mapping
pi -> 0 -> pi is fully reversible, if we assume that
there exists a "simple" program to compute pi (ie of
finite size, and using secondary storage only in
a reversible manner) (this is a reasonable assumption),
then by applying this operation I can liberate all
the kTln2*N energy from N bits of pi given to me ;
the number of random bits left over after the reversible
operation is simply the size of the proram for computing
pi, which is the Kolmogorov complexity.  Hence, I can
find the relation: given a string S and a reversible
mapping M, and an energy liberated E :

E(lib)/kTln2 = max(M){ |M+M(S)| - H( M + M(S) ) }

(where the "max" runs over all choices of M, and the H is
just the Shannon entropy)

Questions and Answers

> Henry Baker wrote:
>> E(erase) = kTln2 H
>This argument is circular, in the sense that this _is_ the definition
>of T (temperature)!
>You have not supplied a definition of 'energy' in this context.

Well, I have not been able to successfully get those files from
netcom (the netcom ftp server is notoriously overcrowded).
I will reply nonetheless.

That is not the definition of T.  Rather, T is the temperature of
the external bath - the energy changing processes are assumed to
occur isothermally (in a bath at T), while the constant-energy
processes are done isolated from the bath (just like a Carnot
engine).  The 'Energy' is the amount of physical work extractable
with some apparatus.  This must be the same regardless of the
apparatus, so I need specify no more.
(though the most important point is that I never really used
the temperature, anyway, just the fact that E and H are

To clarify, I will demonstrate the derivation that I mentioned
in my original post (the Maxwell-Slizzard demon); I can consider
just this example since the results must be independent of
the specific setup.

A single atom (or packet, if you prefer) of gas is in a box
of volume 2V .  A "demon" is also in the box.  At some time,
the demon detects which side of the box the atom (packet) is
on, and records it in his memory; he then pulls a membrane into
the middle of the bax, splitting it into two regions of volume
V (this is all done adiabatically, and moving a membrane about
in a frictionless world uses no energy and is reversible).
Now, the box is put in a bath at temperature T.  The membrane
is allowed to expand isothermally.  This does work against
the membrane (or some piston, or whatever) :

	dE = dW = - P dV = - (kT/V) dV

where we have assumed the single atom is an ideal gas (eh?),
Integration :

	E = kT ln ( Vf / Vi )

now Vf = 2V , Vi = V , so:

	E = kT ln 2

Now, we remove the membrane, and the system is exactly in the
same state it started in : except the demon's memory still
has the side-of-the-box bit flagged.  This must be erased for
this engine to be reversible, and the total energy change
for the reversible process must be zero, hence it must cost

	E = kT ln 2

(in energy drawn from the bath) to clear the demon's 1 bit of
memory.  Note that the demon's memory when cleared is a random
single bit, so it has entropy 1, so

	E = kT ln 2 * H

We can see that this extends : if the demon had 2 bits to
distinguish 4 partitions of the box, the entire argument
would be the same, but Vf = 4 Vi , so

	E = kT ln 4 = 2 kT ln 2 = kT ln2 * H

Again.  Similarly, we can see that this must be H, and not
the # of bits in the demon, since the demon could add an
extra bit which is constant (entropy zero) without changing
the energy cost of erasure.

I don't want to get too distracted by the kTln2 in front
of these.  The ln2 is just a convention for the fact
that information-theorists use log_base_2 while statistical
mechanics use natural logs.  The kT factor must certainly
be there, because E = C T is generally true (for some C)
in these situations, however in some other geometry I
suppose this factor could be different, E = 3/2 kT ln2 H
or whatever.  The important thing is really just that
E = C H for some constant C (where C is constant when T is,
that is E = C H for an isothermal extraction of energY).

The 2nd law in Quantum info-theory

Surprisingly, all of these arguments apply to quantum information, and
are in fact easier (!) because of the simple operator structure there.
I will assume you are familiar with the density matrix and Von Neumman
entropy (S) of quantum information theory (or, you can think of S as
equal to H when the quantum state are classical ensembles).

First of all, for a closed system, S is constant, because evolution is
unitary which doesn't change the entropy (S).  When the system is open
(a subsystem of a larger closed system), S can only go up (a trivial
calculation).  This is the second law in quantum thermodynamics.

Now, I need a "demon" scheme to try to extract energy from some quantum
system.  I will use my later schemes of examining a set of quantum strings
to find the low-entropy ones, and allow them to thermally randomize and
thus increase their entropy and extract work.  We must show that the energy
extractable goes like the entropy (more precisely, the compressibility).
If this were not true, I could have a classical Maxwell Demon store its
info in quantum bits, so if this equality were not true, I would have a
perpetual motion machine.

So, how do I extract energy from a qusystem?  A simple model of spins.
Each qubit is a spin up or down.  I will extract energy making all the
low-entropy spins point up, and then getting work as they randomize
down.  This work goes just like the number of spins up (the net mag field);
there are various constants like the mag-moment and the kT factors, but
these are all irrelevant; we will choose some units where they become 1.

Now, what does this have to do with the entropy?  It's quite simple.  The
best I can possibly do is to choose a basis (or apply a unitary transform
= reversible) such that the most likely string is all up spins; the next
most likely has one spin down, etc.  This maximizes the average number of
spins up.

I need some considerations from classical info theory (not really info
theory, just combinatorics).  For strings of N qubits, there are 2^N total
strings, but almost all of the strings are in some subspace of 2^NH strings,
where H is the entropy of one qubit.  Let me assume H is reasonably small (1/2),
so that this subspace is small - so that I can rotate most of my strings to
point mostly up.  What is the the average number of spins up?  It's simple
counting.  In the subspace of 2^NH strings, each one have a prob. of occurance
very close to 2^(-NH), that is - they are roughly uniform.  

Let me first do it heuristically; here it is just like for classical info.
I choose my N-qubit string.  It has entropy NH on average; that is, it lives
in a subspace of 2^NH ; the optimal transform to "up" is thus a map to NH
bits of data (hence random), and the remainder up.  Thus, N - NH = N(1-H)
are up.  Since N(1-H) = C , the compressibility , we have simply that the
number up (=energy extractable) = C.

Now, let me do it rigorously.  We want to count the average number of ups
in the maximal mapping.  Each string in the 2^NH subspace occurs with
equal probability, hence, I must simply count how many ways there are to 
map them into my up basis:

1 state can get all N spins up
N states can get (N-1) spins up
(N choose 2) can get (N-2) spins up

Hence, the average number of spins up is

average up = 2^(-NH) Sum[k] (N choose k) * (N - 2k)

Note : the N-2k is there because a spin down counts
as a negative; Also note that the sum[k] is
defined from 0 to X where X is chosen such
	Sum[k=0 to X] (N choose k) = 2^NH


average up = N - 2*2^(-NH) Sum[k to X] (N choose k) * k

Now we need to evaluate this.  As N -> infinity, X
simply becomes NH ; also (N choose k) is dominated by
(NH choose k) (see below), so we have:

average up = N - 2*2^(-NH) Sum[k to NH] (NH choose k) * k

Now, this is a well-known formula in combinatorics; you
may evaluate it easily by using the symmetry of the
"choose" operation;

	Sum[k to M] (M choose k) * k = (M/2) * 2^M

That is, it's just an average of k over all the choices.
So, inserting, we find:

average up = N - 2*2^(-NH) (NH/2)* 2^(NH)
average up = N - NH = C

Which is exactly the result we got from the heuristic
treatment.  This combinatoric analysis would have
worked great for the classical case earlier, but it's
hitting a nail with a sledge hammer; it works for the
quantum case because the "typical states" and the
can be a "typical subspace in Hilbert", and the mapping
to minimum redundancy basis is a unitary transform to
spins up.

Now a note on the (N choose k) substitution we used
above. It's easy to evaluate using a trick from info theory.
The probability of getting a string with k zeros is :

P(k) = (N choose k) p^(k) q^(N-k)

In the probable subspace, k = pN, so:

P(k) = (N choose k) p^(pN) q^(qN)
P(k) = (N choose k) 2^(-NH)

and P(k) -> 1 , so (N choose k) -> 2^NH

(that is, it's a delta function around NH, with
value 2^NH ; all the 'support' is on the typical

Charles Bloom / cb at my domain

Send Me Email

Back to the News Index

The free web counter says you are visitor number