## Thursday, April 29, 2010

### Surviving Entropy

My 'for-fun' research project is on survival strategies for complex systems.  I found an interesting result last night regarding systems survival.  To make sense of it, you need to understand the context.  The big picture is a system (think biological organism or intelligent civilization) that moves through time trying to maintain a state--surviving, in other words.  The system is made up of components that degrade via entropy.  As a simpler analogy, imagine a satellite that has an on-board computer, but has lost all but the most basic programming.  All that the thing can do is upload and run a new program you send it.  See the amazing graphic below.  This is like a system moving through time and being subject to random internal changes.  The program as it it sent is the original version, and the program as it arrives is the one after some time has gone by.

To simulate entropy, imagine that the satellite's radios cannot do any kind of error correction or detection--you have to build that into the code of the program you send.  This is a very difficult challenge--to write a program that can detect and correct errors in itself.  It can't be done with any kind of certainty because the functionality of the program may be corrupted by the very errors it needs to correct.

To do the math, we need some definitions, which the graphic below is intended to help.  The rows below are the symbols we send to the satellite's computer, comprising a program.  In the illustration the rows are 28 blocks long, representing 28 symbols (very short for a real program).  Each one of these is vulnerable to corruption due to noise in the channel.  The received code is marked red in a block if transmission error has caused the wrong symbol to be received.  Our first parameter n is the number of symbols.  So n=28 in the illustrations.

If the probability of a symbol arriving intact is p, and the probabilities are independent of one another, then the probability of all of them arriving intact is

Because I'm really interested in what happens in nature as survival systems are subjected to entropy, we need not restrict ourselves to sending a single copy of the program.  Nature is massively parallel, so imagine now that we have lots of computers running simultaneously and we can send several copies (call that parameter k).  In the graphic below, k=5, since there are five layers (five mutually redundant copies).
The probability of at least one program surviving is given by

Now imagine that our program can be broken up into self-contained chunks. Each segment has the property that it can detect if it is broken some of the time (I call this the Am I Me? test) and the ones that are working correctly can tell detect and incapacitate copies of their chunk that aren't working (the Are You Me? test). All we need is one surviving chunk from each type. This is illustrated below, with the vertical dividers showing chunks. Dark green blocks are members of chunk-instances that survive. The number of chunks is parameter m.
In the simplest case where all the chunks are the same size (m divides n), the probability of survival is now

As you would guess,when k is larger the probability of survival (correct uploading of the program) increases, and a m grows the same is true.  What's interesting is that the best trade-off between the two is different, depending on whether p is near zero or near one.  When the probability p of any one symbol arriving correctly is small, the advantage of the segmented transmission (m >1) over unsegmented (m = 1) is

This means that for unreliably transmission of symbols, once k is modestly large, it pays to concentrate on building up m because of the way exponents work.  So more segments is the best survival strategy.  On the other hand, if the chances of an individual symbol arriving intact are high, we have

[Edit: actually the limit is for (1-seg)/(1-unseg), which is a comparison of the chance of NOT surviving, but the rest of the conclusions hold.  See proofs.) The proofs of these limits are too long to include in the margins of this blog.  Notice the symmetry?  The situation is now reversed, and once we have a modest number of segments, we're better off building more redundancy in them.  Also notice that the number of program steps n is not present in either limit.  This means that the strategies are scalable.

I'm looking for real world examples of this phenomenon, either in human-designed systems or biological ones.  For example, in comparing molecules to cells in the human body, we can assume that molecular mechanism are very reliable, with high probability of persisting across time (p close to one).  Cells have both Am I Me? and Are You Me? tests (programmed cell death and the immune system, respectively), so they qualify as components made up of molecules.  Many cells of different types make up a body, so the number of types is m and the number of each type is k.  Without further assumptions, the model above would predict that we would see many more copies of each cell type than number of cell types.  This seems to be the case: there are trillions of cells and hundreds of cell types.

How robust is this?  Are there other examples?  I'd like to find an example where p is low.