Charles Bloom, 11-16-97
see addenda at bottom
addenda 1 and 2: 11-17-97
addenda 3 and 4: 12-23-97

-------------------------------

The work of B.W. Schumacher and C.H. Bennett on the question of complexity
and the universe is intriguing for many reasons. One example is that the
time-asymmetry of universal evolution might be characterized as a purely
information-theoretic change in the universe. Bennett suggests that this
would appear as non-trivial entanglement between distant systems: the
information content of two systems A & B would be much greater than the sum
of the information contents of the two systems separately (put in other
terms, the joint system AB would be much more compressible than the two
isolated systems A & B). This is intriguing for quantum computation, and
also for the prospect that biological conversion of heat to order may be
characterized in this fashion.

This has all been review; let me now try to extend these ideas with the
ideas of compression. The reason we cannot use entropy to characterize
complexity is obvious. First of all, entropy is ill-defined. As is well
known in compression theory, the entropy of a system or a data set is only
defined with respect to some assumed model. Typically, this model is taken
as the order-0 probability model, but this fact is often not mentioned (see
my web page for many notes on this). Second, the entropy of a purely random
system is maximal, while our subjective intuition tells us that this is not
a good measure of "complexity". Also, the entropy of something like a
perfect crystal, or anything at low temperature, is nearly zero, so this
limit is also not a good measure of complexity. It might seem that complex
systems are those whose entropy is intermediate, but of course a 50-50
mixture of gas and salt crystals has intermediate entropy, while it is
merely the sum of two non-complex parts. Bennett, in pioneering papers in
the 80's, worked on a concept of "logical depth" as a measure of complexity.
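The model-dependence of entropy is easy to see concretely. As a toy
illustration (a sketch, not a real coder), here is the order-0 entropy of a
data set in Python; the same data that scores a full bit per symbol under
the order-0 model is perfectly predictable to a context model:

```python
import math
from collections import Counter

def order0_entropy_bits_per_symbol(data):
    """Shannon entropy under an order-0 model: symbols are treated as
    independent draws from their observed frequency distribution."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# The "entropy" of the same data differs by model: order-0 sees
# "abababab" as a 50/50 coin, i.e. 1 bit/symbol, even though an
# order-1 context model finds it perfectly predictable.
print(order0_entropy_bits_per_symbol("abababab"))  # 1.0
print(order0_entropy_bits_per_symbol("aaaaaaaa"))  # 0.0
```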
However, with a modern compression scientist's view, we can interpret this
in a powerful way. It is well known in compression that many previously-
random systems are actually quite predictable. For example, English text is
now compressible to about 2 bits per character, and we suspect that may
fall to as low as 1 bit per character (!!). Similarly, recent techniques
have been about 70% successful at predicting the stock market, which is
often cited by the naive as an example of a random system. However, some
daunting problems remain in compression theory. I suggest that it is
precisely these "difficult" systems which are biologically complex.

The biggest problem in current compression theory is in modelling
"finite-state" information. What we can do well is to model
"finite-context" information. Finite-context information is all that which
can be predicted using a finite sequence of previous events. (Physicists
often refer to the order-1-Markov finite context as a "Markov Process"; a
compression theorist would be careful to call this an "order-1 Markov
process in a suitable basis".) Finite-context information can also be
thought of as finding the order-0 entropy (the classical Shannon entropy)
after performing some (finite) orthogonal basis transform to the optimal
basis (e.g. transforming English text to word-tokens, or image data into
Fourier components). Finite-state information, on the other hand, is simply
the statement that after a certain event, a system is more likely to do
something than before. Physically, this means that after absorbing a
photon, an atom is more likely to emit a photon than it was before;
however, if the lifetime of the excited state is long and there are many
other events in between, this cannot be modelled by finite-context
information - rather, it is the "state" of having absorbed a photon which
must be modelled.
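To make "finite-context" precise, here is a toy order-k context model (an
idealized two-pass coder, for illustration only): it gathers statistics for
each length-k context and charges -log2(p) bits per symbol. On periodic
data, one symbol of context already drives the code length to zero - that
is exactly the kind of information a finite-context model captures:

```python
import math
from collections import Counter, defaultdict

def context_entropy_bits(data, order):
    """Average code length (bits/symbol) of an order-k context model.
    Counts are gathered over the whole data set first: an idealized
    two-pass coder, good enough to illustrate the idea."""
    ctx_counts = defaultdict(Counter)
    for i in range(order, len(data)):
        ctx_counts[data[i - order:i]][data[i]] += 1
    total_bits, n = 0.0, 0
    for counts in ctx_counts.values():
        csum = sum(counts.values())
        for c in counts.values():
            total_bits += c * math.log2(csum / c)  # c symbols at -log2(c/csum) bits
            n += c
    return total_bits / n

data = "abababababababab"
print(context_entropy_bits(data, 0))  # 1.0 bit/symbol: looks like a coin
print(context_entropy_bits(data, 1))  # 0.0: one symbol of context suffices
```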
In data, this corresponds to modelling the average length of words to
predict a space, or the average lengths of sentences to predict a period,
or, on a higher level, the fact that once I wrote "Bennett" I was likely to
somewhere later write "logical depth".

I suggest that it is this which defines complexity. That is, I define
"complexity" as the difference between the Kolmogorov complexity (the
optimal compressibility, or the absolute minimum entropy) and the optimal
finite-context compressibility (which the current state of the art is very
close to). This definition has some very appealing properties. For example,
on purely random data, both the Kolmogorov and the finite-context
compressibilities are zero, so the complexity is zero. On the other hand,
on perfectly predictable data like crystals, both the Kolmogorov and the
finite-context compressibilities are full (one, or infinity, depending on
your conventions), so the "complexity" is again zero.

Now consider something more interesting: take N digits of the number Pi.
The finite-context compressibility is very low - the digits seem nearly
random, so the compressed output length is O(N). However, the Kolmogorov
compressibility is very high - a finite program can exactly predict Pi, so
the output is O(1). The same is true for any output generated by a
pseudo-random number generator, or for encrypted text.

This is also related to a famous problem in compression: if a data set D is
compressed by a poor algorithm A to make A(D), the output is far from the
entropy, but the output is also very close to random. Thus a better
algorithm B would make a much smaller B(D), but B(A(D)) is typically larger
than A(D)! However, the Kolmogorov complexity does not have this problem:
it can use a compressor C(x) = B(A^-1(x)), so that C(A(D)) = B(D) = small.
Thus, the "complexity" as defined here is a measure of to what extent a
poor compressor has already acted on the data set, and this is closely
related to the idea of biological action.
A similar problem is the problem of compressing encrypted data; if A() is
instead a simple xor-encrypter, B(A(D)) will be large, but C(A(D)) will
un-encrypt and then compress, so the "complexity" is a measure of whether
the data was simple but encrypted. The complexity also has the nice
property that if we mix a random set and a pure set, the finite-context
compressor again does the best possible: as soon as it sees pure data it
can predict it ideally, and it fails on random data, just as the Kolmogorov
compressor does, so that adding two non-"complex" systems cannot make a
complex system.

Finally, we exhibit that our "complexity" models entanglement. This is the
same argument Bennett uses in "Complexity in the Universe": take a data set
x, with high complexity 'c', and a random string r. Now make a new string
y = x XOR r. Now r alone is random (non-"complex"), y alone is random, and
x alone has complexity c. However, r and y are entangled, so that 'ry'
together also has complexity c, since if we have 'ry' we can un-xor by
applying r to y, to get r + x, and then the complexities add. If C(x) = the
complexity of x, we see that ( C(ab) - ( C(a) + C(b) ) ) is an excellent
measure of the entanglement of the systems a & b. Note that since any
single bit has complexity zero, we can apply this concept recursively on
any data set, to see that the complexity is nothing more or less than a
measure of the internal entanglement of a data set.

-----------------------------------

addendum:

Note that this work on complexity and entanglement is purely classical! The
definition of complexity here (finite-context entropy minus Kolmogorov
entropy) is somehow related to the Shannon entropy of quantum
superpositions. That is, in classical information the Shannon entropy is
additive, but in quantum superpositions it can exhibit entanglement
differences, as the "complexity" does here. Also note that this complexity
can be used as a gauge of to what extent a system evolves non-unitarily.
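The XOR construction can be played out with real strings. In this sketch,
zlib is only a crude stand-in for a finite-context coder, and the explicit
un-xor step stands in for what the Kolmogorov compressor would discover on
its own: y and r each look random in isolation, yet together they determine
the compressible x.

```python
import os
import zlib

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

x = b"complexity " * 400      # a compressible data set
r = os.urandom(len(x))        # a random string
y = xor(x, r)                 # y alone is indistinguishable from random

# A finite-context coder (zlib as a crude stand-in) packs x well but
# can do nothing with r, with y, or even with r and y concatenated:
print(len(zlib.compress(x, 9)), len(zlib.compress(y, 9)),
      len(zlib.compress(r + y, 9)))

# A Kolmogorov-style compressor may un-xor first, recovering x exactly,
# after which all the structure is available again:
assert xor(r, y) == x
print(len(zlib.compress(xor(r, y), 9)))  # small again
```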
If we consider the system's state to be labelled by a number in a finite
range, then the time-evolution of that number generates a linear data file.
If the system evolves unitarily, i.e. from a classical or quantum
Hamiltonian, then this data file will be perfectly compressible by a
finite-context model (note that the context need not be 1 as is typically
assumed in physics, as long as it is finite), thus the "complexity" will be
zero. However, if the evolution is governed by a super-operator, then there
is a state AB which has complexity zero, but each state A and B has
non-zero complexity, because their evolution can exhibit finite-state
patterns (a change in A at some time t can cause a change in B, which may
not affect A again for an unbounded time; this is a finite-state
statistical influence). We can create the same situation for classical data
sets quite easily. Take a finite-context machine and generate a data set D.
Take every even byte and put it in D1, and every odd byte and put it in D2.
Then D = D1+D2 has "complexity" zero, but D1 and D2 don't.

While I'm on the subject, it seems also that some well-known results in
compression apply nicely to physics. For example, I have shown in an
article on the Kraft inequality that if you compress one possible outcome
by a factor F, then you must expand another by a factor greater than F
(equal to F only when F = 0). This is very close to physics' second law -
if I take entropy out of one bucket, I must put it somewhere else so that
the total increases. Also, it is well known (and trivial) that the space of
all high-entropy data is much (infinitely) larger than the space of all
low-entropy data. This seems to have profound implications for the final
state of the universe: if there are near-random effects that can cause
jumps of the "universe wavefunction" from one state to another, then this
random walk will always end up in a maximal-entropy state after a time long
compared to the jump frequency.
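The Kraft-inequality statement can be checked in a couple of lines. For a
prefix-free code, the codeword lengths l_i must satisfy Sum 2^-l_i <= 1; a
complete code has no slack, so shortening any one codeword forces
lengthening another:

```python
def kraft_sum(lengths):
    """Sum of 2^-l over the codeword lengths; this must be <= 1 for
    any prefix-free (uniquely decodable) code to exist."""
    return sum(2.0 ** -l for l in lengths)

# A complete code over 4 symbols: lengths (2,2,2,2), no slack.
print(kraft_sum([2, 2, 2, 2]))  # 1.0

# Shorten one codeword to 1 bit and the sum exceeds 1: no such
# prefix code exists...
print(kraft_sum([1, 2, 2, 2]))  # 1.25

# ...until other codewords are expanded to pay for it:
print(kraft_sum([1, 2, 3, 3]))  # 1.0 again
```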
This would constitute a simple proof of the arrow of time, assuming the
initial condition that the universe was in a low-entropy state (a black
hole big bang model is indeed a very low entropy state).

-------------------------------------------------------------------------

addendum 2:

It is interesting to note that this applies very nicely to distinguishing
"life" from death. A biological entity with DNA will be easily packed by
the Kolmogorov compressor: it can "grow" the being from just the packed DNA
and some details about how it turned out. The finite-context packer must
actually pack the location of every atom.

Also note that the "complexity" can tell how good your physics is :^>
To make this most clear, I define a new measure, called the "value"
(closely related to the information-theoretic "mutual information"):

    let D = a canonical data set (e.g. the universe, all the books in the
            world, etc.)
    let T = the data/system to test
    let C = a canonical compressor (e.g. Kolmogorov complexity)

    V = C(D) - C(D|T) = C(D) + C(T) - C(DT)

(Seth Lloyd uses the value as a measure of how much D & T are cause-effect
correlated; here we let D -> infinity and C -> perfect, so we want to know
how "good" T is.) Thus, we find that the value of Goldstein's and Jackson's
textbooks is very high, because knowing a little physics, we can make great
predictions about the world. If there is a perfectly deterministic final
theory, then its "value" is quite immense (given the initial conditions of
the universe, all of D is exactly fixed). In fact, quantum mechanics would
appear to put a limit on the maximum "value" of any system. (Lloyd
interprets this as the fact that quantum correlations (e.g. the EPR
"paradox") prevent a perfect local causal correlation between data sets
from ever occurring - that is, given a system, it is impossible to know
whether or not that system is "closed", in the sense that all the
information about the system predicts everything about its future.)
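The "value" can be crudely approximated with any real compressor standing
in for C. In this sketch, zlib's compressed length is a very rough proxy
for the ideal compressor, and the data sets are toy stand-ins: a "textbook"
sharing structure with D has clearly positive value, while random noise
carries lots of "information" but almost no value.

```python
import random
import zlib

def clen(b):
    """Compressed length: a crude, computable stand-in for C()."""
    return len(zlib.compress(b, 9))

def value(D, T):
    """V = C(D) + C(T) - C(DT): how much knowing T helps in packing D."""
    return clen(D) + clen(T) - clen(D + T)

random.seed(7)
D = b"a dropped body falls with constant acceleration g; " * 120  # the "world"
T_text = b"a dropped body falls with constant acceleration g; " * 20  # a small "textbook"
T_gas = bytes(random.randrange(256) for _ in range(400))  # a bottle of gas

print(value(D, T_text), value(D, T_gas))
# T_text predicts D, so its value is clearly positive; the random
# T_gas has high entropy but nearly zero value.
```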
Schumacher (I believe) once pointed out that a bottle of gas requires more
information to reproduce than all the libraries in the world. The "value"
eradicates this problem: a bottle of gas has almost no value, while
libraries certainly have immense value. Similarly, the value becomes an
exact measure of the quality of your publications! The paper with the
higher value is more predictive about the universe - strictly!

It is also interesting to consider this in the context of errors. A
Maxwellian demon, or the surface of a black hole, are not perfect data
storage devices and so must acquire errors in their information. These
errors cause a superficial increase in entropy (information content),
though it is intuitively clear that this extra information is false - it
actually "hurts". For example, with the Maxwell-Szilard-Schumacher demon,
we find that the work that can be extracted by the demon goes down if his
recording device acquires errors. (One way to see this is that he must
spend more energy to erase the errors, or that if his information about
where the particle is is wrong, then the particle will do work against the
demon's screen.) The value (and also the "complexity" of above) are thus
useful measures, because they do indeed go down when errors occur. Thus,
the cost to erase the demon's tape is the entropy or information on the
tape, and the net work attainable from the tape is the "value" of the
information, correlated to the physical system.

----------------------------------------------------------------

addendum 3:

This leads to another note on life and complexity: we can tell the
difference between 'animate' and 'inanimate' because biological systems
have "complex" self-similar "value". I'm using both my 'complexity' and
'value' measures here, so I'd best be more explicit. First, by
'self-similar value' I mean a sort of fractal self-prediction.
That is, if an organism is composed of n parts Pn, then the self-similar
value is:

    self-similar-value (ssv) =
        Sum[ over all ways to partition the organism into parts Pn ] {
            Sum[ over all ways to put some parts into a set A, and the
                 remainder into B ] {
                Value( A at predicting B )
            }
        }

where the value is, as above, V(A,B) = H(A) + H(B) - H(AB). (The above
formula should also be normalized by 1/# of parts & 1/# of ways.) This is
very reminiscent of fractal self-similarity. (Note that I put all the sums
into the ssv just to make it invariant under the choices of partitioning;
they are not really important.)

Now, even a rock has reasonable self-similar value (that is, one part of
the rock looks very much like another part, so that knowing what the rock
looks like at A helps us to guess what it looks like at B). However, the
'complexity' of the self-similar value (ssv) of the rock is very low. Here
the complexity is defined to be the difference in the ssv as computed with
the Kolmogorov/Chaitin compressor vs. an optimal finite-context encoder.
Thus we find that the complexity of the ssv is an excellent measure of to
what extent a system is self-regulating or self-organizing.

Perhaps an even better measure is the asymmetry of the ssv. That is, we
can say that "DNA is small compared to a person", or "a book is small
compared to the universe". So, let us compute:

    put x = a "state" : a choice of partitions Pn
    put y(x) = a "grouping" : a way to put some elements of x into A, and
               the rest into B
    let |x| = # of states, |y| = # of groupings.

Normalize the value of a grouping as:

    p(y) = V(A,B) / { Sum[y] V(A,B) }

That is, the value of A at predicting B over the total value of all the
ways to choose groupings, so that Sum[y] p(y) = 1 and p(y) >= 0. Thus p(y)
is a probability. Put:

    H(x) = (1/log2|y|) Sum[y] p(y) log2( 1/p(y) )

This is the entropy of the normalized 'values' for the given partitioning;
dividing by log2|y| makes 0 <= H(x) <= 1. Then
{ H_max = largest H(x) for any x } is just a number. So, if all of the
values are equal, H_max = 1.
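This peakedness measure can be sketched numerically. In this illustration
the value profiles are made up by hand, standing in for the grouping values
V(A,B); the point is only how the normalized entropy responds to a kernel:

```python
import math

def normalized_value_entropy(values):
    """H = Sum p log2(1/p) / log2(N), where p(y) = V(y) / Sum V.
    H = 1 when value is spread evenly over the groupings; H is near 0
    when a single 'kernel' grouping carries almost all of the value."""
    total = sum(values)
    ps = [v / total for v in values]
    H = sum(p * math.log2(1.0 / p) for p in ps if p > 0)
    return H / math.log2(len(values))

rock = [1.0, 1.0, 1.0, 1.0]        # every part predicts every other part
life = [0.97, 0.01, 0.01, 0.01]    # one genome-like grouping dominates

print(1 - normalized_value_entropy(rock))  # I = 0: value evenly spread
print(1 - normalized_value_entropy(life))  # I close to 1: isolated kernel
```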
However, if there exists a "kernel" of value (a book, or a genome), then
p(y) will be strongly peaked around the grouping where A = the genome, and
B = the body. This will make H much smaller in this case. Thus we can put:

    I = (1 - H_max)    0 <= I <= 1

If I = 0, the "information" is evenly spread. If I = 1, then there is a
single isolated kernel which predicts the rest of the system. Somewhere
between is life.

-------------------------------------------------------------

addendum 4:

Answers to questions from Jack Sarfatti:

>>the information content of two systems A & B would be much greater than
>>the sum of the information contents of the two systems separately (put
>>in other terms, the joint system AB would be much more compressible
>>than the two isolated systems A & B).
>
>I am trying to understand this. If systems A and B become entangled to a
>pure state, then the total entropy defined as Trace[rho(AB) ln rho(AB)]
>is zero. The partial local entropies of A and B separately from their
>reduced density matrices rho(A) and rho(B) are positive, therefore the
>purely nonlocal entropy of entanglement is negative. How do you mean
>"compressibility" and "information" in terms of rho?

Let's take the example of qubits:

    rho(A) = rho(B) = 1/2   (where 1 is the 2x2 unit matrix)
    rho(AB) = 1/2 [ (0,A)(0,B) + (1,A)(1,B) ]

the perfectly correlated joint density matrix. Then (see later for better
notation & review)

    H = - Trace[ rho log2(rho) ]

(I use the information-theoretic normalization, to make H positive and in
a basis of bits, by using the log base 2). Then

    H(A) = H(B) = log2(2) = 1, and H(AB) = 1 as well.

Thus H(A:B) = H(A) + H(B) - H(AB) = 1: knowing B gives us one bit of
information about A.

I'll take this opportunity to clear up the terminology. "Information" is
synonymous with "entropy", though they are conceptually different, and I
have suggested that my "complexity" is a better measure of the "real"
information content.
"compressability" = ( 1 - (H/per bit) ) * 100% is just the inverse, another colloquialism, so that if the data is random, H = 1 per bit, so that the compressability is zero, or if the data is deterministic (always the same) H = 0, so the compressability is 100% >>This is intriguing for quantum computation, and also for the prospect that >>biological conversion of heat to order may be characterized in this fashion. > >More details please. Well, let me just talk about the biological part for now. This question is basically answered by my later notes, but let me elaborate on some points. We can use the 'complexity' to distinguish between the ordinary evolution of statistical systems to higher entropy and the forced evolution of biological systems. That is, normal evolution results in an increase of entropy, with no change of 'complexity', which biological conversion of heat to order results in both an increase of entropy and complexity. In my addendum #3 I have also laid out a measure of the localization of self-similar value. This is an even better quantity at measuring to what extent order has been artificially created. >> [ discussion of finite context and finite state predictions ] > >All of this uses Boolean logic. Can you continue to think this way in >quantum logic (non-Boolean lattice of propositions) which incorporates >the quantum superposition principle? I mean what happens when you have >a coherent superposition of strings? Well, we certainly can do so in a "coherent history". When we work on a quantum data source, we must work on each element of the superposition independently, or treat them as orthogonal vectors in a higher dimensional space, in which case these arguments follow as usual. That is, if x is an alphabetic character, we can develop some predictive structure on x, P(x). Now if y is some other character source, we can also use P(y). 
A quantum superposition ( x a + y b ) is then predicted not as P(x+y) but
as P(x) a + P(y) b, where a and b are some orthogonal unit vectors in the
Hilbert space. (This is a confusing notation for quantum mechanics; x and
y represent just their numerical values, not their quantum states; a and b
represent their associated directions in Hilbert space, and the pair xa is
a Dirac ket.)

>>take a data set x, with high complexity 'c', and a random string r. Now
>>make a new string y = x XOR r. Now r alone is random (non-"complex"), y
>>alone is random, and x alone has complexity c. However, r and y are
>>entangled,
>
>How are they "entangled" in the quantum sense of Einstein-Podolsky-Rosen
>in this case? I do not understand how your use of "entanglement" above
>corresponds to the standard quantum definition?

Well, of course they aren't, as you noticed later when I made it explicit
that we are working with a classical theory of entanglement here. It is
interesting to consider the similarities. We can define the quantum
entanglement by

    E(A,B) = H(A) + H(B) - H(AB)

where H = Trace[ - rho log2(rho) ] is the quantum entropy (the Shannon
entropy, the order-0 entropy, but with superpositions). The quantum
entanglement is then a sort of measure of the non-commutativity of the
trace and the log:

    E(A,B) = - TrA( TrB(r) log2[ TrB(r) ] )
             - TrB( TrA(r) log2[ TrA(r) ] )
             + TrA TrB( r log2[ r ] )

where r = rho(AB) is the joint density matrix and TrA means the trace over
system A. The classical entanglement is:

    E(A,B) = C(A) + C(B) - C(AB)

where C is the complexity = H_FCM - H_Kolmogorov. This is mathematically
very similar to the quantum entanglement above. Note that if we used the
Shannon entropy H instead of the C in the classical entanglement, then we
would have E = 0 for all A and B.

----------------

Rather than answer the various questions on my comments about non-unitary
evolution, I'll instead simply elaborate on that application of the
'complexity'.
We consider a pair of systems, A and B, where we might say that A is under
study and B is the rest of the universe. Then the joint system AB = C will
evolve unitarily. The system A, however, will evolve non-unitarily, e.g.
under a super-operator or Lindbladian. (I'm not familiar with Prigogine's
work, but I imagine it is the same thing.)

Now, the system C evolves unitarily, which means that the state of the
system at time n, S(n) - a complex number, or a vector in a Hilbert space -
will exactly determine the state S(n+1). Now, the sequence of "numbers" S
is thus exactly the type of data that a finite-context (FCM) compressor
will pack. That is, a finite-context compressor looks at the previous N
entries and tries to guess the next. But we have just said that knowing
S(n) you can perfectly guess S(n+1), so the FCM will guess correctly every
time. Now, the Kolmogorov compressor can do no better than perfect, so
H_Kolmogorov = H_FCM, so Complexity = H_FCM - H_K = 0: a unitary system
evolves with zero complexity.

Now, back to our model where C = AB is the universe, which is unitary, and
so has zero complexity of evolution. The part A evolves non-unitarily. As a
simple example, let us consider the case where the universe consists of a
single marble. That marble has two states: in A, or doing something in B
(the states in B are such that AB is unitary, but there are many &
complicated states in B). Thus the sequence of states S(n) in A is a series
of bits, 0 and 1, for whether or not the marble is in A (or a series of
complex coefficients in front of the kets |0> and |1>, in which case each
coefficient evolves independently in the way I describe). Now, these bits
may appear nearly random in A, with the marble flying in and out, so that
H_FCM is roughly 1 per bit. However, the Kolmogorov compressor can work
optimally by enumerating the states in B and the initial condition of the
marble.
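The marble argument can be mimicked classically in a few lines. In this
sketch, a deterministic permutation of 64 states stands in for the unitary
universe AB, and one observed bit of the state stands in for the subsystem
A. An order-1 context learner predicts the full state sequence perfectly
after one lap, but keeps failing on the single observed bit:

```python
def order1_mispredictions(seq):
    """Errors of a learner that predicts, for each symbol, the successor
    it last saw after that symbol (an order-1 finite-context model)."""
    pred = {}
    errors = 0
    for prev, cur in zip(seq, seq[1:]):
        if pred.get(prev) != cur:
            errors += 1
        pred[prev] = cur
    return errors

# Full state S(n) of the "universe": an invertible deterministic map.
full = [0]
for _ in range(999):
    full.append((7 * full[-1] + 3) % 64)

# Subsystem A: we observe only one bit of the full state.
partial = [(s >> 5) & 1 for s in full]

print(order1_mispredictions(full))     # 16: one error per state, then perfect
print(order1_mispredictions(partial))  # hundreds: the bit looks random

# (Illustrative only: this particular bit sequence is periodic, so a long
# enough context would eventually catch it; in the marble example the
# recurrence time is unbounded and no finite context ever suffices.)
```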
(Note that it can find the states of B simply by trying all possibilities,
which is what it does.) This is a finite amount of information, while
H_FCM scales with the time elapsed. Hence, the complexity of A's evolution
= H_FCM - H_K can be made arbitrarily large. This is quite general: the
Kolmogorov compressor can always output at worst finitely much data, by
simply describing the joint system AB and the initial condition, and then
calculating the S(n) of A by manually doing the evolution. On the other
hand, the finite-context compressor can only act on the actual states
S(n). The result is that C = AB has zero complexity, but the subsection A
can evolve with arbitrarily large complexity. Hence the complexity of a
system's evolution measures whether or not it is truly isolated
("closed").

---------

>Do you mean, the more complex the system is, for example, in the sense of
>the biology of open systems far from the thermodynamic branch, the more
>non-unitary its evolution must be?

This is actually a conclusion I had not yet arrived at, but you are
correct. In order to create a "complex" system, such as life (as I have
categorized it in my addenda), its evolution must have been at some time
complex. This is because the state of a system is nothing but the result
of its evolution, and so the complexity of the state must be <= the
complexity of the evolution. Hence life can only be created by extremely
non-unitary evolution!

----------

I think we just had a miscommunication because I was using sloppy
notation. Let me be more precise.

    |0,A> is a ket for a state of qubit A
    (0,A) = |0,A><0,A| is the corresponding projector

hence

    1_A = the unit 2x2 matrix in basis A = (0,A) + (1,A)

So, returning to my example:

    rho(AB) = 1/2 [ (0,A)(0,B) + (1,A)(1,B) ]
    rho(A) = rho(B) = 1/2   (where 1 is a 2x2 matrix)

or

    rho(A) = 1/2 [ (0,A) + (1,A) ]

so the joint system AB is not any pure state; it can be interpreted as an
ensemble: either A and B are both 0, or both 1.
    H = - Trace[ rho log2(rho) ]

>I do not understand the above calculation. Given that the perfectly
>entangled pair state
>
> |A,B> = (1/sqrt2)[ |0,A>|0,B> + |1,A>|1,B> ]
>
>Then the total Shannon entropy is 0

This is all correct; we're just talking about different states for AB,
which lead to the same local density matrices A and B.

>The reduced density matrices are
>rho(A) = rho(B) = 1
>For example, these are locally completely unpolarized photons.

Right, we have the same reduced density matrices.

>So define the nonlocal quantum entropy of entanglement H(A|B) by
>
>H(A,B) = H(A) + H(B) + H(A|B) = 1 + 1 + H(A|B) = 0
>
>H(A|B) = -2

That's fine; and in my case H(A|B) = -1. My H(A:B) = - H(A|B), since I
like my entropies to be positive!

However, I find it a little strange to talk of these quantum entropies as
containing information. In classical entropies, the larger the entropy,
the greater the information content. However, a qubit with entropy one
contains no information at all - that is, measurement results are random;
at the same time, a qubit with entropy zero has 1 bit of information,
since it is in a pure state and we can measure that state along some axis
to get an explicit 0 or 1 classical bit. This is why I define my H(A:B) to
have the opposite sign of your H(A|B), so that H(A:B) represents the
positive amount of information which can be gotten from AB together that
cannot be gotten from A & B alone.

>From this we see that nonlocal quantum connections, in orthodox theory,
>are reservoirs of information, which, because of Eberhard's theorem,
>cannot be locally decoded to send practical messages faster than light or
>to receive them before they are sent.

This is a beautiful truth about quantum theory. I am pursuing this along
two lines. First, it seems that quantum teleportation might actually send
more than one bit of information per qubit; this is still under debate.
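For concreteness, the two cases being compared can be worked through
numerically. In this sketch the eigenvalue spectra are written down by
hand, since both density matrices are simple enough to diagonalize by
inspection:

```python
import math

def H(spectrum):
    """von Neumann entropy in bits, from a density matrix's eigenvalues."""
    return sum(p * math.log2(1.0 / p) for p in spectrum if p > 0)

# The mixed rho(AB) = 1/2[(0,A)(0,B) + (1,A)(1,B)] above is diagonal with
# eigenvalues (1/2, 0, 0, 1/2); both reduced matrices have (1/2, 1/2).
H_A = H([0.5, 0.5])                 # 1 bit
H_AB_mixed = H([0.5, 0, 0, 0.5])    # 1 bit
print(H_A + H_A - H_AB_mixed)       # H(A:B) = 1, i.e. H(A|B) = -1

# The pure Bell state (|00> + |11>)/sqrt(2) has spectrum (1, 0, 0, 0),
# the same reduced matrices, and H(AB) = 0:
H_AB_pure = H([1.0, 0, 0, 0])
print(H_A + H_A - H_AB_pure)        # 2, i.e. H(A|B) = -2
```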
See http://www.cbloom.com/news/moreinfo.html Second, the quantum theory seems to very delicately balance the fact that there can be non-local correlations, but these can only be detected by local measurements; I'm trying to see if this can be used as the defining postulate for quantum theory, or if there are any other theories which have this property (no "hidden variable" theory has this property, for example).

Charles Bloom / cb at my domain
