**August 27th 2000**

I'm in post-conference burnout - which is a problem because there's still
one to go.

Anyway, ALife was great - I came to see how hierarchical decomposability
of problems, hierarchical/modular organisation of living things, quasi-attractors,
even neutral nets, are facets of the same subject. And that gave me an
idea of where to find hierarchically decomposable problems and why they
are pandemic. Essentially, HDPs are the product of iterated function systems
(IFSs) or just dynamical systems (DSs). In short, one way to look at the
evolution of a dynamical system is that the components of the system move
about in regular paths except when they interact with each other - on these
occasions their paths are perturbed. If they are perturbed into a trajectory
where they interact with other components again, then they are perturbed
again. If they move to a trajectory where they are perturbed less, then
they tend to stay there. So, one way to look at what it means
to approach equilibrium, to approach an attractor, is that the system
finds a state which minimises the interaction of components. This applies
recursively, producing hierarchical interdependency of subsystems.
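
As a toy rendering of this settling-toward-minimal-interaction idea (a minimal sketch, not any model from the diary - the interaction graph, the conflict count, and the greedy flip rule are all my own illustrative choices), consider components that only change state when a change strictly reduces their interactions:

```python
import random

def conflicts(state, edges):
    """Count interacting (unequal) pairs - a crude proxy for component interaction."""
    return sum(1 for i, j in edges if state[i] != state[j])

def settle(state, edges, steps=1000, seed=0):
    """Greedy dynamics: a component's flip sticks only if it reduces interaction."""
    rng = random.Random(seed)
    state = list(state)
    for _ in range(steps):
        i = rng.randrange(len(state))
        before = conflicts(state, edges)
        state[i] ^= 1
        if conflicts(state, edges) >= before:
            state[i] ^= 1  # revert: this perturbation would not reduce interaction
    return state

# Two tightly coupled clusters {0,1,2} and {3,4,5}, with one weak 2-3 link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
init = [0, 1, 0, 1, 0, 1]
final = settle(init, edges)
assert conflicts(final, edges) < conflicts(init, edges)
```

Each cluster settles internally first (strong local coupling), leaving at most the weak inter-cluster link unresolved - a hierarchy of subsystems falling out of interaction minimisation.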

Then, when a system has a long periodic attractor it is often made
up of a smaller quasi-periodic attractor - a trajectory that doesn't quite
come back to exactly the same point, so it goes around again, and again,
and each time the 'start point' drifts a little. Eventually the start point
does come back to where it started from - so the exact cycle is very long
but made up of an inner loop that is not quite exactly cyclical. This is
common in DS dynamics. What does it mean? It means that an efficient description
of the system can be made by giving the position of the system in the inner
loop and in the outer loop - the two counters, as it were. And this means
that if we were trying to learn to predict or copy this system it would
be easy to find the inner loop (because it's fast and relatively short),
and having differenced this out, it would be easy to find the outer loop
(using the BB of the inner loop, a description of the outer loop would
be short too).
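
The two-counter description can be made concrete with a toy system (the period lengths 5 and 7 are made-up values, not from anything above): the exact cycle is long, but the state is fully described by a short inner counter and a short outer counter.

```python
from math import lcm

p, q = 5, 7              # inner (fast) and outer (slow) loop lengths - illustrative
state = (0, 0)           # the two counters
seen = {state: 0}
t = 0
while True:
    t += 1
    state = ((state[0] + 1) % p, (state[1] + 1) % q)
    if state in seen:
        break

period = t - seen[state]
# The exact cycle is lcm(p, q) = 35 steps long, yet the system is fully
# described at any time by just the two counters (position in each loop).
assert period == lcm(p, q) == 35
```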

Why do nested cyclic attractors like this occur in DSs? Imagine a gravitational
system of 4 masses, ABCD. Let A and B be in a coupled circular mutual
attraction - e.g. like earth/moon. Let C and D have their own thing. Now
put the A/B sub-system in mutual orbit around the C/D subsystem. This
clustering of interaction is natural in DSs - it produces systems with
subsystems. To get all 4 masses back to exactly the same positions may
take a very long time, but it's very useful to talk about the position of
the earth/moon subsystem relative to the sun - and then add the detail
that the moon phase shifts each year.

Why is this system/subsystem arrangement common in natural DSs? Because,
as I said, it's a natural consequence of moving toward an attractor/stable
equilibrium. Not even a 'consequence' - it's just another way to describe
what a stable equilibrium is - i.e. a state where changes in state are
isolated and do not propagate through the whole system. And near equilibrium
means changes in state probably don't propagate through the whole system,
or don't have much effect on the rest of the system - something like that.

I think that this property - the property of naturally producing attractors
that are composites of other attractors ((A/B)/(C/D)) - may not be true of
all types of DS, only natural ones. And that might be something to do with
the fact that all natural forces of interaction dissipate with distance
- that gives a naturally localised nature to interactions. Moreover, local
interactions, since they have stronger forces, reach equilibrium faster
than distal interactions. So, after the system has partially settled, local
interactions are in equilibrium and distal interactions are not, yet. Which
gives a system/subsystem structure. On the other hand, it might be the
case that any arbitrary network of interactions will have some interactions
that are strong and relatively self-contained and some that are weak and
relatively widely distributed - so maybe arbitrary DSs still have the natural
tendency to create a hierarchical subsystem structure when iterated.

The juicy thing is that if this is all on the right path - then it means that the applicability of SEAM/my methods of problem decomposition is so widespread it isn't funny. It means, potentially, that you can take any arbitrary function and iterate it - then the properties of this IFS near equilibrium are HDP and can be 'solved'. It makes perfect sense in biology too - the reason that symbiogenesis is a good way to exploit the nature of the problem of survivability is because the environment is a huge (biological) DS.

The ability to learn to predict, control, mimic a DS near equilibrium is also related to "embedding" and "phase locking", BTW. If you connect one DS to another DS (with the same number of state variables/physics) then it'll automatically resonate to the first. It'll find a state where its behaviour is the mirror of the other DS, because this is the state which minimises the interaction of the two systems. Imagine system A (driver) connected to system B (driven) - at first B is violently perturbed by the connection to A. Then the highest energy oscillations of A are induced into B so they move together - just because if they don't move together B is perturbed by A, and so B keeps changing (being perturbed) until they do move together. Then, after the high energy sub-components have been absorbed, further adjustments must be made for the low-frequency, lower energy components of A's movement. And so on. Eventually, B exactly copies the dynamics of A (at least at the interface, but not necessarily internally) - and will continue to do so if disconnected from A. I should try that! Sounds impossible - well, in general it is, but if A is near equilibrium then it's not. If A is near equilibrium then its movement will have HD structure and be amenable to induction.
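
A standard numerical illustration of a driven system locking onto a driver (this is the well-known synchronization of coupled chaotic maps, not an experiment from the diary - the logistic map, the coupling strength, and the initial states are my own choices):

```python
def f(x, r=3.9):
    """Chaotic logistic map - a stand-in for the driver's internal dynamics."""
    return r * x * (1.0 - x)

eps = 0.7                 # coupling strength: how hard A perturbs B each step
x, y = 0.4, 0.9           # driver A and driven B start in very different states

for _ in range(2000):
    # A runs freely; B is repeatedly nudged by A until they move together.
    x, y = f(x), (1.0 - eps) * f(y) + eps * f(x)

# B has locked onto A's trajectory: the state minimising their interaction.
assert abs(x - y) < 1e-6
```

With the coupling strong enough, the difference between the two trajectories shrinks step by step, exactly the "B keeps being perturbed until they move together" story above.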

So, what's a nice neat simple example? I'm working on it :)

Meanwhile, I realise that the symbiotic joins in SEAM are too heavy-handed - I'm working on a 'soft groups' model that allows more subtle relationship forming.

In other news: there are several pieces of work I should try as journal papers; I haven't completed a thesis proposal or finalised a committee; and there's only 9 months left to finish!

I wonder if I'm going to be able to develop the above ideas sufficiently in time for the thesis. Maybe not. I'll have to decide a cut-off date for new work - and it'd better be soon... end of Nov? leaving 6 months to write up?

**May 9th 2000**

In the last week I got an algorithm that solves the shuffled HIFF.

It doesn't know anything about where the blocks are (no assumption of
bit adjacency).

It doesn't use any assumption that genotypic (dis)similarity is a useful
metric for diversity maintenance.

It doesn't use any knowledge of how expected fitness grows with the
size of individuals (as per my GECCO 1999 hack).

It doesn't use partial evaluation, complete enumeration in initialisation,
two-phase implementation, or any other nasty hacks often found in Messy
GAs.
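
For reference, HIFF can be sketched like this - a recursive sum in which every all-equal block contributes its size, and shuffling just hides which positions form blocks (a minimal sketch; the particular permutation and string length are illustrative):

```python
import random

def hiff(bits):
    """Hierarchical If-and-only-If: each all-equal block contributes its size."""
    n = len(bits)                        # n must be a power of 2
    if n == 1:
        return 1
    block = n if len(set(bits)) == 1 else 0
    return block + hiff(bits[:n // 2]) + hiff(bits[n // 2:])

def shuffled_hiff(bits, perm):
    """Same function, but block membership is hidden behind a permutation."""
    return hiff([bits[i] for i in perm])

n = 8
perm = list(range(n))
random.Random(0).shuffle(perm)
assert hiff([1] * n) == hiff([0] * n) == n * 4   # global optima: n * (log2(n) + 1)
assert shuffled_hiff([1] * n, perm) == n * 4     # optima unaffected by shuffling
```

The point of the shuffle is that bit adjacency carries no information about block membership, so any algorithm relying on adjacency (e.g. one-point crossover) loses its advantage.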

It uses partially specified individuals to explicitly represent useful
schemata. This enables us to search combinations of schemata directly, without
the need for a recombination operator that knows where the blocks are (i.e.
uses the heuristic of adjacency).

However, as with most objective functions, we cannot measure the fitness
of partially specified solutions directly - thus we need to fill in unspecified
parameters with a template.
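
A minimal rendering of that representation (my own guess at the mechanics, not the actual SEAM code - the dict representation and names are assumptions): a partial individual maps parameter indices to values, and unspecified positions are filled from a template before evaluation.

```python
def express(partial, template):
    """Superimpose a partially specified individual on a template."""
    return [partial.get(i, t) for i, t in enumerate(template)]

A = {0: 1, 1: 1}                      # a schema specifying only positions 0 and 1
template = [0, 0, 0, 0, 1, 1, 1, 1]   # fills in everything A leaves unspecified
assert express(A, template) == [1, 1, 0, 0, 1, 1, 1, 1]
```

A union-of-subsets recombination then just merges two such dicts.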

But, if we use just any old random template then we have three problems:

1) the 'noise' in evaluation coming from the template is too high with
respect to the 'signal' coming from the individual in question;
2) large sub-optimal individuals are generally better than small individuals
with random templates, so individuals tend to fill with sub-optimal schemata
very quickly;
3) the templates that reveal the high-order dependencies are rare - the
number of random templates that must be tested increases with the exponent
of the block size.

The first problem can be solved by using fitness differencing - F(A,t)
vs F(B,t), i.e. only use the difference in fitness that two individuals
(A & B) exhibit when measured against the same template, t.
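
Sketched with a made-up additive stand-in for the fitness function (representation and names are my own assumptions): evaluating both partial individuals against the same template makes the template's contribution common to both sides of the comparison.

```python
def express(partial, template):
    """Fill a partial individual's unspecified positions from a template."""
    return [partial.get(i, t) for i, t in enumerate(template)]

def f(bits):
    """Stand-in objective - simply additive, unlike HIFF."""
    return sum(bits)

def fitness_difference(A, B, template):
    """F(A,t) - F(B,t): the template's contribution is subtracted away."""
    return f(express(A, template)) - f(express(B, template))

A, B = {0: 1, 1: 1}, {0: 0, 1: 0}
t1 = [0, 1, 0, 1]
t2 = [1, 1, 1, 1]
# Whatever the template contributes, the A-vs-B difference is the same:
assert fitness_difference(A, B, t1) == fitness_difference(A, B, t2) == 2
```

With an additive function the template cancels exactly; with a non-separable function like HIFF it only partially cancels, which is exactly why differencing helps the signal-to-noise problem rather than eliminating it.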

The second problem is harder. Basically any method that averages the
fitness contribution over many trial templates will find that a big individual
is reliably better than a small one even if the big one is not the best
an individual of that size could be. The key here is to use Pareto domination
instead of comparing average fitnesses.
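
The comparison logic can be sketched as follows (the score vectors are toy numbers of my own invention): an individual's scores across several trial templates form a vector, and one individual replaces another only if it Pareto-dominates it.

```python
def dominates(fa, fb):
    """True if score vector fa Pareto-dominates fb (>= everywhere, > somewhere)."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

big_suboptimal = [6, 6, 1]   # scores on three templates; mean 4.33
small_good     = [4, 4, 4]   # mean 4.0 - averaging would discard it
assert not dominates(big_suboptimal, small_good)   # but neither dominates...
assert not dominates(small_good, big_suboptimal)   # ...so Pareto keeps both alive
```

So the large-but-suboptimal individual cannot drive out the small good one on the strength of its average alone.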

The third problem is where symbiosis comes in. Instead of using random
templates, templates are built by assembling groups of other individuals
from the population. This enables the combination of 'best schemata so
far' to provide a locally optimised template that reveals higher-order
interdependencies between blocks.
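
One minimal way to render that template assembly (my own sketch - the overlay order, the sample size, and the random fallback bits are arbitrary choices, not the actual algorithm):

```python
import random

def build_template(population, length, seed=0):
    """Assemble a template by overlaying partial individuals from the population."""
    rng = random.Random(seed)
    template = [rng.randrange(2) for _ in range(length)]     # random fallback bits
    for member in rng.sample(population, k=min(3, len(population))):
        for i, v in member.items():
            template[i] = v          # later members overwrite earlier ones
    return template

# Three partial individuals, each a 'best schema so far' on disjoint positions:
pop = [{0: 1, 1: 1}, {2: 1, 3: 1}, {4: 0, 5: 0}]
t = build_template(pop, length=6)
assert t == [1, 1, 1, 1, 0, 0]       # every position covered by some member
```

Because the template is built from already-adapted schemata rather than noise, it is far more likely to expose the higher-order interdependencies between blocks than a random template would be.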

With these features together I have demonstrated what symbiogenesis
is good for. Finally!

Combining pre-adapted parts (the intuition behind the BB hypothesis)
can obviously provide huge computational savings. But how do we discover
how to decompose the problem? If clues about the decomposition are provided
(e.g. adjacency) then the regular GA can do the work. But if there are no
clues we must rely on low-order components of the fitness function to reveal
them, and provide a mechanism for individuals to represent the schemata explicitly
and a mechanism to recombine them. A representation based on subsets of
parameters is straightforward enough. And a recombination mechanism that
takes the union of subsets is appropriate in this case. However, we cannot
assume that the low-order components of a fitness function will tell us
we have found a BB explicitly. Low-order components must be discovered
by sampling. Sampling low-order correlations is not so hard. But doing
the appropriate sampling to find the higher-order components is much harder.
However, if the problem has a 'compositional' nature then the solutions
to low-order dependencies lead search to find the solution to higher-order
dependencies by reducing the dimensionality of the space.

Anyway, the other cool part is that it uses symbiotic scaffolding. First, note that there is a practical way to use the Baldwin effect - on a rugged landscape it can be used to smooth the fitness landscape and provide a kind of 'look-ahead' about candidate variants. The look-ahead doesn't necessarily find the solution (as it does in Hinton and Nowlan), but it informs search about where it can get to from the candidate variant. It's just like looking ahead a few ply in a two-player game and applying the static evaluation function to the leaves of the possibilities instead of to the immediate daughters. Basically, I'm using this effect, but search moves in recombination space, not mutation space.

I've got it all figured out now - you'll have to wait for the thesis.

**Apr 4th 2000**

I submitted 5 conference papers this year. Three results so far, two
positive.

I've done some preliminary investigations into 'Group HIFF' - although
they showed some positive signs, the overall process is not properly regulated...
in other words, it didn't work.

Here's the story so far. Pretty cryptic, but here it is anyway....

Goal: Use group evaluation of partially specified individuals to enable automatic problem decomposition and solution composition. This will bring together the 'incremental commitment' idea (GECCO99), with 'co-evaluation' (ECAL99 and ALife7), all applied to a hierarchical building-block problem HIFF (PPSN98, CEC99, PPSN2000, FOGA2000).

I identified several things that need to work together: diversity maintenance, group evaluation without too much noise, credit assignment, regulation of string growth, a problem to apply it to.

- I have the problem to apply it to, of course.
- I now have a diversity maintenance technique that works great (deterministic crowding), and there's also the possibility of using 'perverse xover' as a diversity maintenance technique (it works great on regular HIFF).
- Preliminaries suggest that 'fitness differencing', where the group is drawn from other population members, is sufficient to overcome background noise and successfully distinguish good blocks from bad blocks (of a given block size). i.e. to evaluate individual IND, subtract 'fitness of group without IND' from 'fitness of group with IND'.
- This also solves credit assignment - rather than share out the fitness through the group, the fitness (difference) is given only to IND.
- But, regulation of string growth is failing. I can't distinguish between good-block and good-block-with-an-extra-(incompatible)-sub-block-tagged-on. Consequently, the strings fill with incompatible blocks, and since each individual is now fully specified, the group evaluation and subsequent combination of individuals is thwarted.

Here's an analogy. In regular search methods the problem is to know whether 'the thing you found that is better' is 'the best thing you could find' - if you go with it you may be led into a trap (a local optimum). In regular search methods you can regulate the degree of caution: e.g. annealing, or in EAs, selection pressure, mutation/xover rate, population size, etc. There can never be any general method to ensure that the path you decide to follow is the best one; the best you can do is regulate the amount of caution used and trade off confident fast search (which won't work on deceptive problems) against cautious slow search (which will take longer than necessary on easy problems).

As it stands I guess I'm not able to regulate the commitment rate. I have to make joins (which are permanent) in order to explore the space of joins. If I turned down the combination rate it would tend to assemble better indivs more reliably. But the fitness evaluation actually thinks that the blocks with garbage are the better ones. They could only be selected against if the blocks at the next level were already present. And that's not going to happen because I can't keep the sodding garbage out of the blocks from the previous level.

Anyway, I have some ideas to try:

- I need to verify, quantify, and find limits for the fitness differencing method. i.e. it can distinguish quality of blocks at a given size, but how much garbage can it take, and do the size of the block and the size of the garbage come into play? Am I sure that there's no size-penalty hack that would make it work as a first approximation to my goal? How much difference does the quality of the population make? And the make-up of the population (i.e. converged or diverse)? Does a random template work just as well? Maybe I could even measure the 'pressure' to increase in size by comparing fitness of indivs with/without garbage.
- One thing to try is 'association preferences', a kind of 'soft' join. In this way pairs could be evaluated voluminously without the join being heritable until it's been proven.
- Another thing to use for investigation is a two-needle version of the symbiotic scaffolding experiment. This will show whether it is possible to 'prime' the population for joins when there is more than one solution to find. This makes an interesting experiment anyway - I will be able to show that the solution found by genetic search is always preceded by the SAME solution in group search, i.e. non-genetic variation guides genetic variation. Whereas previously, I couldn't really say that one guided the other, since there was only one thing to find anyway. And this makes the idea of one organism acquiring the characteristics of another a little more obvious. Additionally, if I use a diversity maintenance technique so that the population as a whole cannot converge, then there cannot be reliable symbionts. This is an alternate way of adding an implicit cost to relying on symbionts and should encourage independence even without using a probabilistic group-size limit.
- Anyway, as far as group HIFF is concerned, this model could be informative because individuals can only be successful if they get together with the right kind of symbiont, or if they don't need a symbiont. This is interesting from a size-limit point of view. It may explain why the 'complacency' feature I saw in symbiosis cannot be exhibited when applied to HIFF.
- Here's a thought... what if I try group evaluation on HIFF without joins? i.e. real symbiotic scaffolding, not SEAMing. That can't work - it would get to a stage where two 32-bit indivs make a whole and a 32-bit _mutation_ is required to get the whole independently... Unless the symbiont is imperfect, like in the current experiments - that would mean that even small improvements in the bit count would be rewarded... it's worth a try - I might be surprised. The symbionts can't be perfect if the population is diverse. So there would still be an advantage in filling with suboptimal blocks, I think. The advantage would be that the garbage would fill slowly though - i.e. the fill rate doesn't increase exponentially. Maybe I'll try this next...?