The relation of information theory and the second law seems to provide a cute way to understand the general incompressibility of data. (None of this is really new; most of it is well known to Bennett, Chaitin, etc. I present a slightly different viewpoint: I show that various info-theoretic results follow from physics, instead of the usual approach, which does the opposite.)

First, some simple remarks on this relation are needed. The nicest way to see it is by abstracting a physical process. Consider N copies of a two-state physical system - such as an array of spins, or gas molecules which are either on the left or the right of a chamber - and abstract these to be bits. If all N are in a pure state, their entropy is minimized. It is well known from the second law that as we let them randomize, work can be extracted from them. (E.g. molecules all to the left or right will randomize into a mix of left and right, which can do work for us if we put a piston in the chamber; similarly, an array of aligned spins produces a B field, from which we can extract work by sliding another spin along it, the array becoming randomized in the process.)

The physical correspondence is that randomizing a "bit" frees kT ln2 of energy, where T is the temperature of the isothermal bath. (Similarly, erasing a random bit requires kT ln2 of energy; this can be nicely understood with the Maxwell-Szilard demon.) So, for a binary order-0 string of bits, we have a nice relation between the Shannon entropy H and the energy E which must be used to erase the string:

  E(erase) = kT ln2 * H

Thus the energy liberable from a given string of length N is

  E(lib) = kT ln2 * (N - H)

where C = N - H is nothing but the compressibility. This means we can measure compressibility by counting the energy exchanged with the bath: if we give the bath energy, we must have compressed information at some point (gathered redundancy). It is also important to note that any reversible information-theoretic or computational operation has no net energy cost or liberation. (This is simple thermo, and of course must be true, or you would have a perpetual motion machine.) I go into more rigour in an addendum, but we can see that this form must be true. The statistical-mechanical probability of occupying a certain state is:

  P = e^(-E/kT) = e^(-kT ln2 H / kT) = 2^(-H)

which is exactly the Shannon probability of randomly picking a string of entropy H!

"Bloom's Demon": I describe a reversible process which would be a perpetual motion machine, if not for the info-theoretic incompressibility of data. Consider an array of 2^N strings of N bits each. The array is immersed in a bath at temperature T and allowed to fully randomize (H = N). It is then removed to adiabatic isolation. The vast majority of strings will be truly random, but some strings from a random source are quite ordered - 0000..., 1111..., 010101... will all occur (on average, with 2^N strings, each possible string occurs once). An active demon is also in our 'system'. He moves down the list of strings, looking at each one. When he examines a string, he can tell whether it is compressible (he has access to a perfect compressor, say); if it is incompressible he flags a 1 in his memory, otherwise he leaves his memory at 0. If he decides the string is compressible, he performs the reversible transform of turning its N ordered bits into H random bits and (N - H) zeros (for a string of entropy H). When he has examined all 2^N strings, he has made many new zeros in them, and has an array of flags in his memory.
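(An aside before completing the demon's cycle: here is a minimal sketch in Python of the E(lib) = kT ln2 * (N - H) bookkeeping, using order-0 statistics. The function names and the 300 K bath are my own choices, not anything from the text above.)

import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def order0_entropy_bits(bits):
    """Order-0 Shannon entropy of a bit string, in bits: H = N*h(p),
    where p is the fraction of 1s and h is the binary entropy function."""
    n = len(bits)
    p = sum(bits) / n
    if p in (0.0, 1.0):
        return 0.0
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

def liberable_energy(bits, T=300.0):
    """E(lib) = kT ln2 * (N - H): work extractable by letting the
    ordered (redundant) part of the string randomize in a bath at T."""
    return K_B * T * math.log(2) * (len(bits) - order0_entropy_bits(bits))

print(liberable_energy([0] * 100))    # fully ordered: ~2.9e-19 J at 300 K
print(liberable_energy([0, 1] * 50))  # p = 1/2, so order-0 H = N and E ~ 0,
                                      # even though 0101... is algorithmically
                                      # ordered; order-0 stats miss that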
We put the system back into contact with the bath, and allow all the zeros we have made to equilibrate. As discussed above, this liberates an energy

  E = kT ln2 * Sum{strings} C(string)

where C(string) = N - H(string) is the compressibility. Once all the strings have equilibrated, they are back in the initial random condition. However, for the cycle to be reversible, we must also restore the demon to its initial state. To do this we must clear his 2^N flag bits, which takes in energy. The net compression attained is:

  C(net) = E(net)/(kT ln2) = Sum{strings} [ C(string) - 1 ]

which is strictly negative, since C is never less than zero and almost every string has C = 0 (in fact, the sum is a very large negative). This is identical to the incompressibility of all strings, and to the fact that if you compress one string you must expand another - in this context, if you can extract work from one string, you must have done work on other strings, or on your own mechanism.

(There are some subtleties here. The Bloom Demon could use a reversible probability estimator so that it doesn't waste a full bit on each string, since it knows the "non-compressible" case is much more likely: for example, first count the number of compressible strings, M; then only log(2^N choose M) bits are needed to specify which ones were compressible, and if we do this perfectly we could do as well as C(net) = 0. Also, these arguments do not carefully account for the fact that N is assumed to be known in advance, which provides a way to "sneak" in some information.)

We can also unite this with my old viewpoint that the algorithmic information ("maximum compressibility" a la Kolmogorov) is obtained by finding an ideal basis to transform into, and then using the Shannon H. Here this follows from thermodynamics: a reversible cycle does not liberate energy, and an (orthogonal) basis transform is reversible. Thus we may find the optimal compressibility by allowing our system to make adiabatic reversible changes, and then letting the ordered part randomize, thus extracting work (= compressibility).

Some examples are in order (examples shamelessly stolen from Bennett):

Take n copies of the same (random) string of N bits. These may be reversibly transformed by XOR-ing the ith string against the 1st; since they are all the same, the result is that every string is cleared except the first. This is reversed simply by repeating the same operation (a code sketch follows below). Since O^2 = 1 for this operation O, we see O cannot have any energy cost. So, after we have done this, we have (n-1) strings of N zero bits, and we can liberate energy proportional to the compressibility:

  E(lib) = kT ln2 * C ,  C = (n-1)*N

A more radical example, which shows how E(lib) is related to Kolmogorov complexity, is the string 'pi'. The mapping pi -> 0 -> pi is fully reversible if we assume there exists a "simple" program to compute pi (i.e. one of finite size, using secondary storage only in a reversible manner), which is a reasonable assumption. Then by applying this operation I can liberate nearly all of the kT ln2 * N energy from N bits of pi given to me; the number of random bits left over after the reversible operation is simply the size of the program for computing pi, which is the Kolmogorov complexity. Hence the relation: given a string S, a reversible mapping M, and a liberated energy E:

  E(lib)/(kT ln2) = max(M){ |M + M(S)| - H( M + M(S) ) }

(where the max runs over all choices of M, "+" denotes concatenation, and H is just the Shannon entropy).
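(A minimal sketch of the n-copies example, in Python; the code and names are my own illustration, not from the post. The XOR transform is an involution, and on identical copies it manufactures (n-1)*N zeros.)

import random

def xor_against_first(strings):
    """The reversible transform O: XOR each string against the first.
    On n identical copies this clears all but the first; applying it
    again restores them (O^2 = 1), so O can carry no net energy cost."""
    first = strings[0]
    return [list(first)] + [[a ^ b for a, b in zip(first, s)]
                            for s in strings[1:]]

n, N = 4, 16
s = [random.randint(0, 1) for _ in range(N)]
copies = [list(s) for _ in range(n)]

cleared = xor_against_first(copies)
assert all(bit == 0 for row in cleared[1:] for bit in row)
# (n-1)*N = 48 zero bits created: C = (n-1)*N, E(lib) = kT ln2 * C

assert xor_against_first(cleared) == copies   # O^2 = identity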
Henry Baker wrote:
---------------------------
>> E(erase) = kTln2 H
>
> This argument is circular, in the sense that this _is_ the definition
> of T (temperature)!
>
> You have not supplied a definition of 'energy' in this context.

Well, I have not been able to successfully get those files from netcom (the netcom ftp server is notoriously overcrowded), but I will reply nonetheless.

That is not the definition of T. Rather, T is the temperature of the external bath: the energy-changing processes are assumed to occur isothermally (in a bath at T), while the constant-energy processes are done isolated from the bath (just like a Carnot engine). The 'energy' is the amount of physical work extractable with some apparatus. This must be the same regardless of the apparatus, so I need specify no more. (Though the most important point is that I never really used the temperature anyway, just the fact that E and H are proportional.)

To clarify, I will demonstrate the derivation that I mentioned in my original post (the Maxwell-Szilard demon); I can consider just this one example, since the results must be independent of the specific setup.

A single atom (or packet, if you prefer) of gas is in a box of volume 2V. A "demon" is also in the box. At some time, the demon detects which side of the box the atom is on and records it in his memory; he then pulls a membrane into the middle of the box, splitting it into two regions of volume V. (This is all done adiabatically; moving a membrane about in a frictionless world uses no energy and is reversible.) Now the box is put in a bath at temperature T, and the gas is allowed to expand isothermally against the membrane. This does work on the membrane (or some piston, or whatever):

  dW = P dV = (kT/V) dV

where we have assumed the single atom obeys the ideal gas law, PV = kT. Integrating:

  W = kT ln(Vf/Vi)

Now Vf = 2V and Vi = V, so:

  W = kT ln2

Now we remove the membrane, and the system is in exactly the state it started in - except that the demon's memory still has the side-of-the-box bit flagged. This must be erased for the engine to be reversible, and the total energy change around the reversible cycle must be zero; hence it must cost E = kT ln2 (in energy given back to the bath) to clear the demon's one bit of memory. Note that the bit being cleared is random (equally likely 0 or 1), so it has entropy 1, and

  E = kT ln2 * H

We can see that this extends: if the demon had 2 bits to distinguish 4 partitions of the box, the entire argument would be the same, but Vf = 4 Vi, so

  E = kT ln4 = 2 kT ln2 = kT ln2 * H

again. Similarly, we can see that the cost must go like H, and not like the number of bits in the demon, since the demon could add an extra bit which is constant (entropy zero) without changing the energy cost of erasure.

I don't want to get too distracted by the kT ln2 prefactor. The ln2 is just a convention for the fact that information theorists use log base 2 while statistical mechanics uses natural logs. The kT factor must certainly be there, because E = c T is generally true (for some constant c) in these situations; though in some other geometry I suppose this factor could be different - E = (3/2) kT ln2 H or whatever. The important thing is really just that E = c H for some constant c (constant when T is; that is, E = c H for an isothermal extraction of energy).
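(A small numeric sketch of this derivation; the function and the 300 K bath temperature are my own choices. The work from isothermal expansion out of 1 of m equal partitions is kT ln(m) = kT ln2 * log2(m), matching E = kT ln2 * H.)

import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def szilard_work(m, T=300.0):
    """Isothermal expansion work for a one-atom gas that was confined
    to 1 of m equal sub-volumes: W = kT ln(Vf/Vi) = kT ln(m)."""
    return K_B * T * math.log(m)

for m in (2, 4, 8):
    H = math.log2(m)   # bits the demon records to name the sub-volume
    W = szilard_work(m)
    assert math.isclose(W, K_B * 300.0 * math.log(2) * H)
    print(f"m = {m}: H = {H:.0f} bits, W = {W:.3e} J")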
Surprisingly, all of these arguments apply to quantum information, and are in fact easier (!) because of the simple operator structure there. I will assume you are familiar with the density matrix and the von Neumann entropy S of quantum information theory (or you can think of S as equal to H when the quantum states are classical ensembles). First of all, for a closed system S is constant, because the evolution is unitary, which doesn't change the entropy. When the system is open (a subsystem of a larger closed system), S can only go up (a trivial calculation). This is the second law in quantum thermodynamics.

Now I need a "demon" scheme to try to extract energy from some quantum system. I will use my earlier scheme of examining a set of quantum strings to find the low-entropy ones, then allowing them to thermally randomize, increasing their entropy and extracting work. We must show that the energy extractable goes like the entropy (more precisely, the compressibility). If this were not true, a classical Maxwell demon could store its info in quantum bits, and I would have a perpetual motion machine.

So, how do I extract energy from a quantum system? A simple model of spins: each qubit is a spin, up or down. I extract energy by making all the low-entropy spins point up, then getting work as they randomize back down. This work goes just like the number of up spins (the net magnetic field); there are various constants like the magnetic moment and the kT factors, but these are all irrelevant - we choose units in which they become 1.

Now, what does this have to do with the entropy? It's quite simple. The best I can possibly do is to choose a basis (or apply a unitary transform, which is reversible) such that the most likely string maps to all up spins, the next most likely to one spin down, and so on. This maximizes the average number of up spins.

I need some considerations from classical info theory (not really info theory, just combinatorics). For strings of N qubits, there are 2^N total strings, but almost all of the probability lies in a typical subspace of about 2^(NH) strings, where H is the entropy of one qubit. Let me assume H is reasonably small (say 1/2), so that this subspace is small and I can rotate most of my strings to point mostly up.

What is the average number of up spins? It's simple counting. Let me first do it heuristically; here it is just like classical info. My N-qubit string has entropy NH on average; that is, it lives in a subspace of 2^(NH) states. The optimal transform to "up" is thus a map to NH bits of data (hence random), with the remainder up. Thus N - NH = N(1-H) spins are up. Since N(1-H) = C, the compressibility, we have simply that the number up (= energy extractable) is C.

Now let me do it rigorously; a quick numeric check of the typical-subspace claim comes first, below.
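(This is an AEP illustration in Python, my own; the bias p, the width eps, and the function are arbitrary choices. It checks that the eps-typical strings carry almost all the probability mass while numbering at most about 2^(N(H+eps)).)

from math import comb, log2

def typical_set(N, p, eps=0.1):
    """Count the eps-typical N-bit strings (those whose per-bit surprisal
    is within eps of H) and the probability mass they carry."""
    q = 1.0 - p
    H = -p * log2(p) - q * log2(q)
    mass, count = 0.0, 0
    for k in range(N + 1):                      # k = number of 1-bits
        logp = k * log2(p) + (N - k) * log2(q)  # log2 P(any such string)
        if abs(-logp / N - H) <= eps:
            mass += comb(N, k) * 2.0 ** logp
            count += comb(N, k)
    return mass, count, H

mass, count, H = typical_set(500, 0.11)         # H(0.11) is about 1/2
print(f"mass in typical set = {mass:.4f}")      # close to 1
print(f"log2(count) = {log2(count):.1f} <= N*(H+eps) = {500 * (H + 0.1):.1f}")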
Each string in the 2^(NH) subspace occurs with probability very close to 2^(-NH) - they are roughly uniform - so I must simply count how many ways there are to map them into my up basis:

  1 state can get all N spins up
  N states can get (N-1) spins up
  (N choose 2) states can get (N-2) spins up
  ...

Hence, the average number of up spins is

  average up = 2^(-NH) Sum[k] (N choose k) * (N - 2k)

The (N - 2k) is there because a spin down counts as a negative (we are counting the net magnetization). The sum runs from k = 0 to X, where X is chosen such that

  Sum[k=0 to X] (N choose k) = 2^(NH)

So,

  average up = N - 2*2^(-NH) Sum[k=0 to X] (N choose k) * k

Now we need to evaluate this. As N -> infinity, X simply becomes NH; also (N choose k) is dominated by (NH choose k) (see below), so we have:

  average up = N - 2*2^(-NH) Sum[k=0 to NH] (NH choose k) * k

This is a well-known formula in combinatorics; you may evaluate it easily by using the symmetry of the "choose" operation:

  Sum[k=0 to M] (M choose k) * k = (M/2) * 2^M

That is, it's just the average of k over all the choices (a numeric check follows at the end of this section). Inserting, we find:

  average up = N - 2*2^(-NH) * (NH/2) * 2^(NH)
  average up = N - NH = C

which is exactly the result we got from the heuristic treatment. This combinatoric analysis would also have worked for the classical case earlier, but there it's hitting a nail with a sledgehammer; it is needed for the quantum case because the "typical states" become a "typical subspace" of Hilbert space, and the mapping to the minimum-redundancy basis is a unitary transform to up spins.

Now, a note on the (N choose k) substitution we used above. It's easy to justify using a trick from info theory. The probability of getting a string with k zeros is:

  P(k) = (N choose k) p^k q^(N-k)

In the probable subspace, k = pN, so:

  P(k) = (N choose k) p^(pN) q^(qN) = (N choose k) 2^(-NH)

and P(k) -> 1, so (N choose k) -> 2^(NH) (that is, it's a delta function around k = pN, with value 2^(NH); all the 'support' is on the typical subspace).
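(A quick numeric check of the combinatorial identity used in the last step; my own verification code.)

from math import comb

# Sum_{k=0}^{M} (M choose k) * k = (M/2) * 2^M : each of the M items is
# chosen in exactly half of the 2^M subsets, so the weighted sum is M*2^(M-1).
for M in (4, 10, 25):
    lhs = sum(comb(M, k) * k for k in range(M + 1))
    assert lhs == M * 2 ** (M - 1)
    print(M, lhs)

# Plugging into the average-up formula:
#   N - 2 * 2^(-NH) * (NH/2) * 2^(NH) = N - NH = C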
Charles Bloom / cb at my domain