Complexity and Entanglement

Charles Bloom, 11-16-97
see addenda at bottom
addenda 1 and 2: 11-17-97
addenda 3 and 4: 12-23-97

The work of B.W. Schumacher and C.H. Bennett on the question of
complexity and the universe is intriguing for many
reasons.  One example is that the time-asymmetry of
universal evolution might be characterized as a purely
information-theoretic change in the universe.
Bennett suggests that this would appear as non-trivial
entanglement between distant systems : the
information content of two systems A & B would be
much greater than the sum of the information content of
the two systems separately (put in other terms, the
joint system AB would be much more compressible than
the two isolated systems A & B).  This is intriguing
for quantum computation, and also for the prospect that
biological conversion of heat to order may be
characterized in this fashion.

This has all been review, but let me now try to extend
these ideas with the ideas of compression.  The reason
we cannot use entropy to characterize complexity is
obvious.  First of all, entropy is ill-defined.  As is
well known in compression theory, the entropy of a
system or a data set is only defined with respect to
some assumed model.  Typically, this model is taken
as the order-0 probability model, but this fact is often
not mentioned (see my web page for many notes on this).
Second, the entropy of a purely random system is the
maximal entropy, while our subjective intuition tells us
that this is not a good measure of "complexity".  Also,
the entropy of something like a perfect crystal, or
anything at low temperature, is nearly zero, so this
limit is also not a good measure of complexity.  It might
seem that complex systems are those whose entropy is
intermediate, but of course a 50-50 mixture of gas and
salt crystals has intermediate entropy, while it is
merely the sum of two non-complex parts.
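
The model-dependence of entropy is easy to demonstrate.  A minimal
Python sketch (the order0_entropy helper and the delta transform are
my own illustrative choices): the same bytes have maximal order-0
entropy under the raw byte model, and zero entropy under a trivially
different model.

```python
import math
from collections import Counter

def order0_entropy(data):
    """Shannon entropy in bits per byte under an order-0 model."""
    counts = Counter(data)
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in counts.values())

# A ramp 0,1,...,255 repeated: every byte value is equally likely.
data = bytes(range(256)) * 8

# Under the order-0 byte model the data looks maximally random...
print(order0_entropy(data))     # 8.0 bits per byte

# ...but under a delta model (predict each byte from its predecessor)
# the same data has zero entropy: every delta is 1.
delta = bytes((data[i] - data[i - 1]) % 256 for i in range(1, len(data)))
print(order0_entropy(delta))    # 0.0 bits per byte
```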

Bennett, in pioneering papers in the 80's, worked on a
concept of "logical depth" as a measure of complexity.
However, with a modern compression scientist's view,
we can interpret this in a powerful way.

It is well known in compression that many systems previously
thought to be random are actually quite predictable.  For
example, English text is now compressible to about 2
bits per character, and we suspect that may fall to
as low as 1 bit per character (!!).  Similarly, recent
techniques have been about 70% successful at predicting
the stock market, which is often cited by the naive as
an example of a random system.  However, some daunting
problems remain in compression theory.  I suggest that
it is precisely these "difficult" systems which are
biologically complex.

The biggest problem in current compression theory is
in modelling "finite-state" information.  What we can
do well is to model "finite-context" information.
Finite-context information is all that which can be
predicted using a finite sequence of previous events.
(Physicists often refer to the order-1-Markov finite
context as a "Markov Process"; a compression theorist
would be careful to call this an "order-1 Markov
process in a suitable basis").  Finite-context information
can be thought of also as finding the order-0 entropy
(the classical Shannon entropy) after performing some
(finite) orthogonal basis transform to the optimal basis
(e.g. transforming english text to word-tokens, or image
data into Fourier components).  Finite-state information,
on the other hand, is simply the statement that after
a certain event, a system is more likely to do something
than before.  Physically, this means that after
absorbing a photon, an atom is more likely to emit a
photon than it was before; however if the lifetime of
the excited state is long and there are many other events
in between, this cannot be modelled by finite-context
information, rather it is the "state" of having a photon
which must be modelled.  In data, this corresponds to
modelling the average length of words to predict a space,
or the average lengths of sentences to predict a period,
or on a higher-level, the fact that once I wrote "Bennett"
I was likely to somewhere later write "logical depth".
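
A toy Python sketch of this distinction (the source and both predictors
are illustrative constructions of mine, not standard algorithms): a
source that emits "words" of exactly ten 'a's ending in a 'b'.  An
order-2 context model cannot see far enough back to predict the 'b',
while a one-counter "state" model predicts it perfectly.

```python
from collections import Counter, defaultdict

# A source with finite-STATE structure: runs of exactly ten 'a's,
# each terminated by a 'b' (like fixed-length words ending in a space).
seq = ("a" * 10 + "b") * 200
half = len(seq) // 2

# Order-2 finite-CONTEXT predictor: learn, from the first half, the
# most common symbol following each 2-symbol context.
table = defaultdict(Counter)
for i in range(2, half):
    table[seq[i - 2:i]][seq[i]] += 1

def context_predict(ctx):
    return table[ctx].most_common(1)[0][0] if table[ctx] else "a"

# Finite-STATE predictor: count symbols since the last 'b'.
def state_predict(run_length):
    return "b" if run_length == 10 else "a"

ctx_hits = state_hits = total = 0
run = 0                                # seq[half - 1] is a 'b'
for i in range(half, len(seq)):
    ctx_hits += context_predict(seq[i - 2:i]) == seq[i]
    state_hits += state_predict(run) == seq[i]
    run = 0 if seq[i] == "b" else run + 1
    total += 1

print(state_hits / total)              # 1.0: the state model is perfect
print(ctx_hits / total)                # < 1: "aa" cannot reveal when 'b' is due
```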

I suggest that it is this which defines complexity.
That is, I define "complexity" as the difference between
the Kolmogorov Complexity (the optimal compressibility,
or the absolute minimum entropy) and the optimal
finite-context compressibility (which the current state of
the art comes very close to).  This definition has some
very appealing properties.  For example, on purely
random data, both the Kolmogorov and the Finite-Context
compressibilities are zero, so that the complexity is zero.
On the other hand, on perfectly predictable data like
crystals, both the Kolmogorov compressibility and the Finite-
Context compressibility are full (one, or infinity depending
on your conventions), so the "complexity" is again zero.

Now consider something more interesting, take N digits of
the number Pi.  The finite-context compressibility is very
low - the digits seem nearly random, the output length is O(N).
However, the Kolmogorov compressibility is very high - a
finite program can exactly predict Pi, the output is O(1).
The same is true for any output generated by a pseudo-random
number generator, or encrypted text.
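
This gap can be made concrete.  The sketch below uses Gibbons'
unbounded spigot algorithm for the digits of Pi (a well-known
O(1)-size program), with zlib as a rough stand-in for a finite-context
compressor: the generating program is a constant number of bytes,
while the compressed digit stream stays proportional to N.

```python
import zlib

def pi_digits(n):
    """Gibbons' unbounded spigot algorithm: first n decimal digits of Pi."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            digits.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return digits

print(pi_digits(10))                   # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

digits = bytes(pi_digits(4000))        # one digit per byte
packed = zlib.compress(digits, 9)
print(len(packed) / len(digits))       # stays bounded away from zero: O(N)
```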

This is also related to a famous problem in compression : if
a data set D is compressed by a poor algorithm A to make A(D),
the output is far from the entropy, but the output is also
very close to random.  Thus a better algorithm B would
make a much smaller B(D), but B(A(D)) is typically larger
than A(D)!  However, the Kolmogorov complexity does not have
this problem : it can use a compressor C(x) = B(A^-1(x)) ,
so that C(A(D)) = B(D) = small.  Thus, the "complexity" as
defined here is a measure of to what extent a poor compressor
has already acted on the data set, and this is closely
related to the idea of biological action.  A similar problem
is the problem of compressing encrypted data; if A() is
instead a simple xor-encrypter, B(A(D)) will be large, but
C(A(D)) will un-encrypt and then compress, so the "complexity"
is a measure of whether the data was simple but encrypted.
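
Both problems are easy to reproduce with a real compressor.  In the
sketch below (my own construction: zlib plays the better compressor B,
and A is a seeded xor keystream, i.e. the "simple xor-encrypter" case,
which happens to be its own inverse):

```python
import random
import zlib

D = b"the quick brown fox jumps over the lazy dog. " * 500

def A(data, seed=1234):
    """A 'poor' transform: xor with a seeded pseudo-random keystream.
    Invertible (A is its own inverse for a fixed seed), but its output
    looks random to any compressor that does not know the seed."""
    rng = random.Random(seed)
    return bytes(b ^ rng.randrange(256) for b in data)

def B(data):
    return zlib.compress(data, 9)      # the 'better' compressor

def C(data):
    return B(A(data))                  # C(x) = B(A^-1(x)); here A^-1 = A

print(len(B(D)))       # small: D is very repetitive
print(len(B(A(D))))    # huge: B cannot see through the xor
print(len(C(A(D))))    # small again: C un-encrypts, then compresses
```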

The complexity also has the nice property that if we mix
a random set and a pure set, again the finite context compressor
does the best possible : as soon as it sees pure data it can
predict it ideally, and it fails on random data, just as the
Kolmogorov does, so that adding two non-"complex" systems cannot
make a complex system.  Finally, we show that our "complexity"
models entanglement.  This is the same argument Bennett uses in
"Complexity in the Universe" :

take a data set x, with high complexity 'c', and a random
string r.  Now make a new string y = x XOR r.  Now r
alone is random (non-"complex"), y alone is random, and
x alone has complexity c.  However, r and y are entangled,
so that 'ry' together also has complexity c, since if
we have 'ry' we can un-xor by applying r to y, to get
r + x, and then the complexities add.

if C(x) = complexity of x , we see that (C(ab) - ( C(a) + C(b) ))
is an excellent measure of the entanglement of the systems a & b.
Note that since any single bit has complexity zero, we can
apply this concept recursively on any data set, to see that the
complexity is nothing more or less than a measure of the internal
entanglement of a data set.


Note that this work on complexity and entanglement is purely
classical!  The definition of complexity here (Kolmogorov
minus finite-context entropy) is somehow related to the
Shannon entropy of quantum superpositions.  That is, in
classical information the Shannon entropy is additive, but
in quantum superpositions it can exhibit entanglement 
differences, as the "complexity" does here.

Also note that this complexity can be used as a gauge of the
extent to which a system evolves non-unitarily.  If we consider
the system's state to be labelled by a number in a finite
range, then the time-evolution of that number generates a
linear data file.  If the system evolves unitarily, i.e.
from a classical or quantum Hamiltonian, then this data file
will be perfectly compressible by a finite context model
(note that the context need not be 1 as is typically assumed
in physics, as long as it is finite), thus the "complexity"
will be zero.  However, if the evolution is governed by a
super-operator, then there is a state AB which has complexity
zero, but each state A and B have non-zero complexity, because
their evolution can exhibit finite-state patterns (a change
in A at some time t can cause a change in B, which may not
affect A again for an unbounded time; this is a finite-state
statistical influence).  We can create the same situation for
classical data sets quite easily.  Take a finite context machine
and generate a data set D.  Take every even byte and put it
in D1, take every odd byte and put it in D2.  Then D = D1+D2 has
"complexity" zero, but D1 and D2 don't.

While I'm on the subject, it seems also that some well-known
results in compression apply nicely to physics.  For example,
I have shown in an article on the Kraft inequality
that if you compress one possible outcome by a factor F, then you
must expand another by a factor greater than F (equal to F only when F = 0).
This is very close to Physics' second law - if I take entropy out of one
bucket, I must put it somewhere else so that the total increases.

Also, it is well known (and trivial) that the space of all high-entropy
data is much (infinitely) larger than the space of all low-entropy data.
This seems to have profound implications for the final state of the universe:
if there are near-random effects that can cause jumps of the "universe
wavefunction" from one state to another, then this random walk will always
end up in a maximal-entropy state after a time long compared to the jump
frequency.  This would constitute a simple proof of the arrow of time,
assuming the initial condition that the universe was in a low-entropy state.
(a black hole big bang model is indeed a very low entropy state).

addendum 2:

It is interesting to note that this applies very nicely to distinguishing
"life" from death.  A biological entity with DNA will be easily packed by
the Kolmogorov compressor : it can "grow" the being from just the packed
DNA and some details about how it turned out.  The finite-context packer
must actually pack the location of every atom.

Also note that the "complexity" can tell how good your physics is :^>
To make this most clear, I define a new measure, called the "value"
(closely related to the information-theoretic "mutual information"):

let D = a canonical data set (e.g. the universe, all the books in the world, etc.)
let T = the data/system to test
let C = a canonical compressor (e.g. kolmogorov complexity)

V = C(D) - C(D|T) = C(D) + C(T) - C(DT)

(Seth Lloyd uses the value as a measure of how much D & T are cause-effect
correlated; here we let D -> infinity and C -> perfect, so we want to know
how "good" T is).

Thus, we find that the Value of Goldstein and Jackson's textbooks is very
high, because knowing a little physics, we can make great predictions about
the world.  If there is a perfectly deterministic final theory, then its
"value" is quite immense (given the initial conditions of the universe, all
of D is exactly fixed).  In fact, quantum mechanics would appear to put a
limit of the maximum "value" of any system (Lloyd interprets this as the
fact that quantum correlations (e.g. EPR "paradox") prevent a perfect local
causal correlation between data sets from ever occurring - that is, given a
system, it is impossible to know whether or not that system is "closed", in
the sense that all the information about the system predicts everything about
its future).

Schumacher (I believe) once pointed out that a bottle of gas requires more
information to reproduce than all the libraries in the world.  The "value"
eradicates this problem : a bottle of gas has almost no value, while
libraries certainly have immense value.

Similarly, the value becomes an exact measure of the quality of your
publications!  The paper with the higher value is more predictive about
the universe - strictly!

It is also interesting to consider this in the context of errors.  A
Maxwellian demon, or the surface of a black hole, are not perfect data
storage devices and so must acquire errors in their information.
These errors cause a superficial increase in entropy (information
content), though it is intuitively clear that this extra information
is false - it actually "hurts".  For example, with the Maxwell-Szilard-
Schumacher demon, we find that the work that can be extracted by the
demon goes down if his recording device acquires errors. (one way to
see this is that he must spend more energy to erase the errors, or
that if his information about where the particle is is wrong, then
the particle will do work against the demon's screen).  The value
(and also the "complexity" of above) are thus a useful measure,
because they do indeed go down when errors occur.  Thus, the cost
to erase the demon's tape is the entropy or information on the tape,
and the net work attainable from the tape is the "value" of the
information, correlated to the physical system.

addendum 3:

This leads to another note on life and complexity : we can tell
the difference between 'animate' and 'inanimate' because
biological systems have "complex" self-similar "value".  I'm using
both my 'complexity' and 'value' measures here, so I'd best be
more explicit.  First, by 'self-similar value' I mean a sort of
fractal self-prediction.  That is, if an organism is composed of
n parts Pn, then the self-similar value is:

self-similar-value (ssv) = 
	Sum[ on all ways to partition the parts Pn ] {
		Sum[ ways to put some parts into a set A, remainder into B ] {
			Value( A at predicting B )
		}
	}

Where the value is, as above V(A,B) = H(A) + H(B) - H(AB) .  (the
above formula should also be normalized by 1/# of parts & 1/# of ways).
This is very reminiscent of fractal self-similarity.  (note that I
put all the sums into the ssv just to make it invariant under the
choices of partitioning; they are not really important).
Now, even a rock has reasonable self-similar value (that is, one part
of the rock looks very much like another part, so that knowing what
the rock looks like at A helps us to guess what it looks like at B).
However, the 'complexity' of the self-similar value (ssv) of the rock is
very low.  Here the complexity is defined to be the difference in the ssv as
computed with the Kolmogorov/Chaitin compressor vs. an optimal finite
context encoder.  Thus we find that the complexity of the ssv is an
excellent measure of the extent to which a system is self-regulating.

Perhaps an even better measure is the asymmetry of the ssv.  That
is, we can say that "DNA is small compared to a person", or "a book is
small compared to the universe".  So, let us compute.

put x = a "state" : a choice of partitions Pn
put y(x) = a "grouping" : a way to put some elements of x into A, and the rest into B
then |x| = # of states, |y| = # of groupings.

normalize the value of a grouping as:

p(y) = V(A,B) / { Sum[y] V(A,B) }

That is, the value of A to predict B over the total value of the ways to choose
groupings, so that Sum[y]p(y) = 1 and p(y) >= 0 .  Thus p(y) is a probability.

H(x) = (1/log|y|) Sum[y] p(y) log(1/ p(y) )

Is the entropy of the normalized 'values' for the given partitioning.

Then { H_max = largest H(x) for any x } is just a number.

So, if all of the values are equal, H_max = 1 .  However, if there exists
a "kernel" of value (a book, or a genome), then p(y) will be strongly
peaked around the grouping where A = the genome, and B = the body.  This
will make H much smaller in this case.  Thus we can put :

I = (1 - H_max)

0 <= I <= 1

if I = 0, the "information" is evenly spread.  If I = 1, then there is
a single isolated kernel which predicts the rest of the system.  Somewhere
between is life.
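
A small sketch of this localization measure (normalizing H by the log
of the number of groupings so that it lands in [0,1], as assumed above;
the toy value lists are illustrative):

```python
import math

def localization(values):
    """I = 1 - H, where H is the entropy of the normalized 'values',
    scaled into [0, 1] by dividing by log2 of the number of groupings."""
    total = sum(values)
    p = [v / total for v in values]
    h = sum(pi * math.log2(1 / pi) for pi in p if pi > 0)
    return 1 - h / math.log2(len(values))

# Value spread evenly over all groupings: no kernel, I = 0.
print(localization([1.0] * 16))            # 0.0

# Value concentrated in one grouping (a 'genome'): I near 1.
print(localization([100.0] + [0.01] * 15))
```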

addendum 4:

Answers to questions from Jack Sarfatti :

>>information content of two systems A & B would be
>>much greater than the sum of information context of
>>the two systems separately (put in other terms, the
>>joint system AB would be much more compressible than
>>the the two isolated systems A & B).
>I am trying to understand this. If systems A and B become entangled to a pure state, 
>then the total entropy defined as Trace[rho(AB) ln rho(AB)] is zero.
>The partial local entropies of A and B separately from their reduced density
>matrices and rho(b)  rho(A) are positive, therefore the purely nonlocal entropy
>of entanglement is negative. How do you mean "compressibility" and "information"
>in terms of rho? 

Let's take the example of qubits :
rho(A) = rho(B) = 1/2 ( where 1 is a 2x2 matrix),
rho(AB) = 1/2 [ (0,A)(0,B) + (1,A)(1,B)]
the perfectly entangled joint density matrix.  Then

(see later for better notation & review)

H = - Trace[rho l2 rho ]

(I use the information-theoretic normalization, to make H positive and
in a basis of bits by using the log base 2).

then H(A) = H(B) = l2(2) = 1
and H(AB) = 1 as well

Thus H(A:B) = H(A)+H(B) - H(AB) = 1

knowing B gives us one bit of information about A.  
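
The same numbers fall out of a direct density-matrix computation.  A
short numpy sketch (the partial traces and the entropy helper are
standard constructions, written out here for this two-qubit example):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits: H = -Trace[rho log2 rho]."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]                 # drop zero eigenvalues
    return float(-(ev * np.log2(ev)).sum())

# rho(AB) = 1/2 [ (0,A)(0,B) + (1,A)(1,B) ] in the |00>,|01>,|10>,|11> basis
rho_AB = np.diag([0.5, 0.0, 0.0, 0.5])

R = rho_AB.reshape(2, 2, 2, 2)          # indices (a, b, a', b')
rho_A = np.trace(R, axis1=1, axis2=3)   # trace out B -> 1/2 identity
rho_B = np.trace(R, axis1=0, axis2=2)   # trace out A -> 1/2 identity

H_A, H_B, H_AB = vn_entropy(rho_A), vn_entropy(rho_B), vn_entropy(rho_AB)
print(H_A, H_B, H_AB)                   # one bit each
print(H_A + H_B - H_AB)                 # H(A:B) = 1 bit
```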

I'll take this opportunity to clear up the terminology.
"information" is synonymous with "entropy" , though they are conceptually
different, and I have suggested that my "complexity" is a better measure
of the "real" information content.

"compressability" = ( 1 - (H/per bit) ) * 100%
is just the inverse, another colloquialism, so that if the data is
random, H = 1 per bit, so that the compressability is zero, or if
the data is deterministic (always the same) H = 0, so the compressability
is 100%

>>This is intriguing for quantum computation, and also for the prospect that
>>biological conversion of heat to order may be characterized in this fashion.
>More details please. 

Well, let me just talk about the biological part for now.  This question
is basically answered by my later notes, but let me elaborate on some
points.  We can use the 'complexity' to distinguish between the ordinary
evolution of statistical systems to higher entropy and the forced evolution
of biological systems.  That is, normal evolution results in an increase
of entropy, with no change of 'complexity', while biological conversion
of heat to order results in both an increase of entropy and complexity.
In my addendum #3 I have also laid out a measure of the localization of
self-similar value.  This is an even better quantity at measuring to
what extent order has been artificially created.

>> [ discussion of finite context and finite state predictions ]
>All of this uses Boolean logic. Can you continue to think this way in
>quantum logic (non-Boolean lattice of propositions) which incorporates
>the quantum superposition principle? I mean what happens when you have
>a coherent superposition of strings? 

Well, we certainly can do so in a "coherent history".  When we work
on a quantum data source, we must work on each element of the
superposition independently, or treat them as orthogonal vectors in
a higher dimensional space, in which case these arguments follow as
usual.  That is, if x is an alphabetic character, we can develop
some predictive structure on x, P(x).  Now if y is some other character
source, we can also use P(y).  A quantum superposition (x a +y b) then is
predicted not as P(x+y) but as P(x) a +P(y) b , where a and b are some
orthogonal unit vectors in the Hilbert space (this is a confusing
notation for quantum-mechanics; x and y represent just their numerical
value, not their quantum state; a and b represent their associated
direction in hilbert space, the pair xa is a Dirac Ket).

>>take a data set x, with high complexity 'c', and a random
>>string r. Now may a new string y = x XOR r. Now r
>>alone is random (non-"complex"), y alone is random, and
>>x alone has complexity c. However, r and y are entangled,
>How are they "entangled" in the quantum sense of
>Einstein-Podolsky-Rosen in this case? 
>I do not understand how your use of "entanglement" above 
>corresponds to the standard quantum definition?

Well, of course they aren't, as you noticed later when I made it
explicit that we are working with a classical theory of entanglement
here.  It is interesting to consider the similarities.  We can
define the quantum entanglement by

E(A,B) = H(A) + H(B) - H(AB)

where H = Trace[ - rho log2 rho ] is the quantum entropy
(the analogue of the Shannon order-0 entropy, computed from
the density matrix).  The quantum entanglement is then a sort
of measure of the non-commutativity of the Trace and the log :

E(A,B) = - TrA( trB(r) log[ trB(r) ] ) - TrB( trA(r) log[ trA(r) ] )
		+ TrA TrB( r log[ r ] )

where r = rhoAB is the joint density matrix and TrA means the
trace over system A.

The classical entanglement is :

E(A,B) = C(A) + C(B) - C(AB)

where C is the complexity = H_FCM - H_Kolmogorov
This is mathematically very similar to the quantum entanglement
above.  Note that if we used the Shannon entropy H instead
of C in the classical entanglement, then we would have
E = 0 for all A and B.


Rather than answer the various questions on my comments
about non-unitary evolution, I'll instead simply elaborate
on that application of the 'complexity'.

We consider a pair of systems, A and B, where we might
say that A is under study and B is the rest of the universe.
Then the joint system AB = C will evolve unitarily.
The system A, however, will evolve non-unitarily, e.g.
under a super-operator or Lindbladian.  (I'm not familiar
with Prigogine's work, but I imagine it is the same thing).

Now, the system C evolves unitarily, which means that the
state of the system at time n = S(n) , a complex number
(or a vector in a hilbert space), will exactly determine
the state S(n+1).  Now, the sequence of "numbers" S
is thus exactly the type of data that a finite-context (FCM)
compressor will pack.  That is, a finite-context
compressor looks at the previous N entries and tries to
guess the next.  But we have just said that knowing S(n)
you can perfectly guess S(n+1), so that the FCM will
guess correctly every time.  Now, the Kolmogorov
complexity can do no better than perfect, so that
H_Kolmogorov = H_FCM, so that Complexity = H_FCM - H_K = 0 ,
so a unitary system evolves with zero complexity.
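
A toy classical sketch of this argument (the update rule and the
order-1 table are my own illustrative choices; the invertible map
stands in for unitary evolution):

```python
# Deterministic ('unitary') evolution: S(n+1) is a fixed function of S(n).
def step(s):
    return (5 * s + 3) % 64            # a toy invertible update rule

states = [1]
for _ in range(999):
    states.append(step(states[-1]))

# An order-1 finite-context model: learn the successor of each state
# from the first half of the history, then predict the second half.
table = {}
for a, b in zip(states, states[1:500]):
    table[a] = b

hits = sum(table.get(a) == b for a, b in zip(states[500:], states[501:]))
print(hits / len(states[501:]))        # 1.0: the FCM predicts perfectly
```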

Now, back to our model where C = AB is the universe, which
is unitary, so has zero complexity of evolution.  The part
A evolves non-unitarily.  As a simple example, let us
consider the simple case where the universe consists of
a single marble.  That marble has two states : in A, or
doing something in B (the states in B are such that AB
is unitary, but there are many & complicated states in B).
Thus the sequence of states S(n).. in A are a series of
bits 0 and 1 for whether or not the marble is in A
(or a series of complex coefficients in front of the
kets |0> and |1> , in which case each coefficient evolves
independently in the way I describe).  Now, these bits
may appear nearly random in A, with the marble flying in
and out, so that the H_FCM is roughly 1 per bit.  However,
the Kolmogorov compressor can work optimally by enumerating
the states in B and the initial condition of the marble.
(note that it can find the states of B simply by trying
all possibilities, which is what it does).  This is a finite
amount of information, while H_FCM scales with the time
elapsed.  Hence, the complexity of A's evolution = H_FCM - H_K
can be made arbitrarily large.

This is quite general : the Kolmogorov compressor can always
output at worst finitely much data by simply describing the
joint system AB and the initial condition, and then calculating
the S(n) of A by manually doing the evolution.  On the other
hand, the Finite-Context compressor can only act on the actual
states S(n).

The result is that C=AB has zero complexity, but the subsystem
A can evolve with arbitrarily large complexity.  Hence
the complexity of a system's evolution measures whether or
not it is truly isolated ("closed").


>Do you mean, the more complex the system is, for example,
>in the sense of the biology of open systems far from the
>thermodynamic branch, the more non-unitary its evolution must be? 

This is actually a conclusion I had not yet arrived at,
but you are correct.  In order to create a "complex" system,
such as life (as I have categorized in my addenda), its evolution
must have been at some time complex.  This is because the state
of a system is nothing but the result of its evolution, and so
the complexity of the state must be <= to the complexity of
the evolution.  Hence life can only be created by extremely
non-unitary evolution!


I think we just had a miscommunication because I was using
sloppy notation.  Let me be more precise.

|0,A> is a Ket for a state of qubit A

(0,A) = |0,A><0,A|  is the corresponding projector

hence, 1_A = the unit 2x2 matrix in basis A = (0,A) + (1,A)

So, returning to my example :

rho(AB) = 1/2 [ (0,A)(0,B) + (1,A)(1,B)]

rho(A) = rho(B) = 1/2 ( where 1 is a 2x2 matrix)
or rho(A) = 1/2 [ (0,A) + (1,A) ]

so that the joint system AB is not any pure state;  it
can be interpreted as an ensemble : either A and B are
both 0 or both 1 .

H = - Trace[rho l2 rho ]

>I do not understand the above calculation. Given that the perfectly entangled pair state
>  |A,B> = (1/sqrt2)[|0,A>|0,B> + |1,A>|1,B>]
>Then the total Shannon entropy is 0

This is all correct, we're just talking about different states for AB, which
lead to the same local density-matrices A and B.

>The reduced density matrices are
>rho(A) = rho(B) = 1
>For example, these are locally completely unpolarized photons.

Right, we have the same reduced density matrices.

>So define the nonlocal quantum entropy of entanglement H(A|B) by
>H(A,B) = H(A) + H(B) + H(A|B) = 1 + 1 + H(A|B) = 0
>H(A|B) = -2

That's fine; and in my case H(A|B) = -1
My H(A:B) = - H(A|B) , since I like my entropies to
be positive !  However, I find it a little strange to talk of
these quantum entropies as containing information.  In classical
entropies, the larger the entropy, the greater the information
content.  However, a qubit with entropy one contains no information
at all - that is, measurement results are random; at the same
time, a qubit with entropy zero has 1 bit of information, since it
is in a pure state and we can measure that state along some axis
to get an explicit 0 or 1 classical bit.  This is why I define
my H(A:B) to have the opposite sign of your H(A|B) , so that H(A:B)
represents the positive amount of information which can be gotten
from AB together that cannot be gotten from A & B alone.

>From this we see that nonlocal quantum connection, in orthodox theory, are reservoirs of
>information, which, because of Eberhard's theorem, cannot be locally decoded to send
>practical messages faster than light or to receive them before they are sent.

This is a beautiful truth about quantum theory.  I am pursuing this
along two lines.  First, it seems that quantum teleportation might actually
send more than one bit of information per qubit; this is still under debate.
Second, the quantum theory seems to very delicately balance the fact that
there can be non-local correlations, but these can only be detected by local
measurements; I'm trying to see if this can be used as the defining postulate
for quantum theory, or if there are any other theories which have this
property (no "hidden variable" theory has this property, for example).

Charles Bloom / cb at my domain
