log of research in progress 2000

August 27th 2000
Im in post conference burn out - which is problem because there's still one to go.
Anyway, ALife was great - I came to see how, hierarchical decomposability of problems, hierarchical/modular organisation of living things, quasi-attractors, even neutral nets, are facets of the same subject. And that gave me an idea of where to find hierarchically decomposable problems and why they are pandemic. Essentially, HDPs are the product of itterated function systems (IFSs) or just dynamical systems (DSs). In short, one way to look at the evolution of a dynamical system is that the components of the system move about in regular paths except when the interact with each other - on these occasions their paths are perturbed. If they are perturbed into a trajectory where they interact with other components again then they will be perturbed again - then they are. If they move to a trajectory where they are perturbed less - then they tend to stay there. So, one way to look at what it means to approach equilibrium, to approach an attractor, is that the sysytem finds a state which minimises the interaction of components. This applies recursivley, producing hierarchical interdependency of subsystems.
Then, when a system has a long periodic attractor it is often made up of a smaller quasi periodic attractor - an trajectory that doesnt quite come back to exactly the same point, so it goes around again, and again, each time the 'start point' drifts a little. And eventually the startpoint does come back to where it started from - so the exact cycle is very long but made up of an inner loop that is not quite exactly cyclical. This is common in DS dynamics. What does it mean? It means that an efficient description of the system can be made by giving the position of the system in the inner loop and in the outer loop - the two counters, as it were. And this means that if we were trying to learn to predict or copy this system it would be easy to find the inner loop (beacuse its fast and relatively short), and having differenced out this, it would be easy to find the outer loop (using the BB of the inner loop, a description of the outer loop would be short too).
Why do nested cyclic attractors like this occur in DSs? Imagine a gravitational mass system of 4 masses, ABCD. Let A and B be in a coupled circular mutual attraction - eg like earth/moon. Let C and D have their own thing. Now put the A/B sub-system in mutual orbit around the C/D subsystem. This is clustering of interaction is natural in DSs - it produces systems with subsystems. To get all 4 masses back to exactly the same positions may take a very long time. but its very useful to talk about the position of the earth/moon subsystem relative to the sun - and then add the detail that the moon phase shifts each year.
Why is this system/subsystem arrangement common in natural DSs? because, as I said, its a natural consequence of moving toward an attractor/ stable equilibrium. Not even a 'consequence' - its just another way to describe what a stable equilibrium is - i.e. a state where changes in state are isolated and do not propogate through the whole system. And near equilibrium, means changes in state probably dont propogate throught the whole system, or dont have much effect on the rest of the system - something like that.
I think that this property - the property of naturally producing attractors that are composites of other attractors ((A/B)/(C/D)) may not be true of all types of DS, only natural ones. And that might be something to do with the fact that all natural forces of interaction dissipate with distance - that gives a naturally localised nature to interactions. Moreover, local interactions, since they have stronger forces, reach equilibrium faster than distal interactions. So, after the system has partially settled, local interactions are in equilibrium and distal interactions are not, yet. Which gives a system/subsystem structure. On the other hand, it might be the case that any arbitrary network of interactions will have some strong and relatively self contained and some weak and relatively widely distributed - so maybe arbitrary DSs still have the natural tendency to create a hierarchical subsystem structure when itterated.

The juicy thing is that if this is all on the right path - then it means that the applicability of SEAM/ my methods of problem decomposition are so widespread it isnt funny. It means, potentially, that you can take any arbitrary function and itterate it - then the properties of this IFS near equilibrium are HDP and can be 'solved'. It makes perfect sense in biology too - the reason that symbiogenesis is a good way to exploit the nature of the problem of survivabiity is beacuse the environment is a huge (biological) DS.

The ability to learn to predict, control, mimic a DS near equilibrium - is also related to "embedding" and "phase locking", BTW. If you connect one DS to another DS (with the same number of state variables/physics) then it'll automatically resonate to the first. It'll find a state where its behaviour is the mirror of the other DS because this is the state which minimises the interaction of the two systems. Imagine system A (driver) connected to system B (driven) - at first B is violently perturbed by the connection to A. Then the highest energy oscillations of A are induced into B so they move together - just because if they dont move together B is perturbed by A, and so B keeps changing (being perturbed) until they do move together. Then after the high energy sub-components have been absorbed, further adjustments must be made for the low-frequency, lower energy componets of As movement. And so on. Eventually, B exactly copies the dynamics of A (at least at the interface, but not necessarily internally)  - and will continue to do so if disconnected from A. I should try that! Sounds impossible - well in general it is, but if A is near equilibrium then its not. If A is near equilibrium then its movement will have HD structure and be amenable to induction.

So, whats a nice neat simple example? Im working on it :)

Meanwhile, I realise that the symbiotic joins in SEAM are too heavy handed - Im working on a 'soft groups' model that allows more subtle relationship forming.

In other news: there are several pieces of work I should try as journal papers; I havnt completed a thesis proposal or finalised a commitee; and there's only 9 months left to finish!

I wonder if Im going to be able to develop the above ideas sufficiently in time for the thesis. Maybe not. I'll have to decide a cut-off date for new work - and it'd better be soon... end of Nov? leaving 6 months to write-up?

May 9th 2000
In the last week I got an algorithm that solves the shuffled HIFF.
It doesnt know anything about where the blocks are (no assumption of bit adjacency).
It doesnt use any assumption that genotypic (dis)similarity is a useful metric for diversity maintenance.
It doesnt use any knowledge of how expected fitness grows with the size of individuals (as per my GECCO1999 hack).
It doesnt use partial evaluation, complete enumeration in intialisation, two-phase implementation, or any other nasty hacks often found in Messy GAs.

It uses partially specified individuals to explicitly represent useful schemata. This enables us to search combinations of schemata directly without the need for a recombination operator that knows where the blocks are (ie uses heuristic of adjacency).
However, like most objective functions, we cannot measure the fitness of partially specified solutions directly - thus we need to fill-in unspecified parameters with a template.
But, if use just any old random template then we have three problems: 1) the 'noise' in evaluation coming from the template is too high with respect to the 'signal' coming from the individual in question. 2) large sub-optimal individuals are generaly better than small individuals with random templates so individuals tend to fill with sub-optimal schemata very quickly. 3) the templates that reveal the high-order dependencies are rare - the number of random templates that must be tested increases with the exponent of the block-size.

The first problem can be solved by using fitness differencing - F(A,t) vs F(B,t). i.e. only use the difference in fitness that two individuals (A & B) exhibit when measured against the same template, t.
The second problem is harder. Basically any method that averages the fitness contribution over many trial templates will find that a big individual is reliably better than a small one even if the big one is not the best an individual of that size could be. The key here is to use Pareto domination instead of comparing average fitnesses.
The third problem is where symbiosis comes in. Instead of using random templates, templates are built by assembling groups of other individuals from the population. This enables the combination of 'best schemata so far' to provide a locally optimised template that reveals higher-order interdependencies between blocks.

With these features together I have demonstrated what symbiogenesis is good for. Finally!
Combining pre-adapted parts (the intuition behind the BB hypothesis) can obviously provide huge computational savings. But how do we discover how to decompose the problem? If clues about the decomposition are provided (eg. adjacency) then the regular GA can do the work. But if there are no clues we must rely on low-order components of the fitness function to reveal them and provide a mechaism for individuals to represent the schemata explicitly and a mechanism to recombine them. A representation based on subsets of parameters is straight forward enough. And a recombination mechanism that takes the union of subsets is appropriate in this case. However, we cannot assume that the low-order components of a fitness function will tell us we have found a BB explicitly. Low-order components must be discovered by sampling. Sampling low-order correlations is not so hard. But doing the appropriate sampling to find the higher-order components is much harder. However, if the problem has a 'compositional' nature then the solutions to low-order dependencies lead search to find the solution to higher-order dependencies by reducing the dimensionality of the space.

Anyway, the other cool part is that it uses symbiotic scaffolding. First, note that there is a practical way to use the Baldwin effect - on a rugged landscape it can be used to smooth the fitness landscape and provide a kind of 'look-ahead' about candidate variants. The look-ahead doesnt necessarily find the solution (as it does in Hinton and Nowlan), but it informs search about where it can get to from the candidate variant. Its just like looking ahead 'ply' in a two-player game and applying the static evaluation function to the leaves of possibilities instead of to the immediate daughters. Basically, Im using this effect but search moves in recombination space not mutation space.

I've got it all figured out now - you'll have to wait for the thesis.

Apr 4th 2000
I submitted 5 conference papers this year. Three results so far, two positive.
I've done some preliminary investigations into 'Group HIFF' - although they showed some positive signs the overall process is not properly regulated... in other words, it didnt work.

Here's the story so far. Pretty cryptic, but here it is anyway....

Goal: Use group evaluation of partially specified individuals to enable automatic problem decomposition and solution composition. This will bring together the 'incremental commitment' idea (GECCO99), with 'co-evaluation' (ECAL99 and ALife7), all applied to a hierarchical building-block problem HIFF (PPSN98, CEC99, PPSN2000, FOGA2000).

I identified several things that need to work together: diversity maintenance, group evaluation without too much noise, credit assignment, regulation of string growth, a problem to apply it to.

My fear is that all I've succeeded in doing is transforming the usual problem of premature convergence into a problem of premature commitment; moreover, since there cannot be any general method to know when the sub-block you are adding is the best possible sub-block to add (ie is it right or is it garbage), this problem is never going to go away. My best hope, I think, is to enable some 'tunability' of commitment rate.

Here's an analogy. In regular search methods the problem is to know whether 'the thing you found that is better' is 'the best thing you could find' - if you go with it you may be lead into a trap (local optima). In regular search methods you can regulate the degree of caution: eg annealing, or in EAs, selection pressure, mutation/xover rate, population size, etc. There can never be any general method to ensure that the path you decide to follow is the best one, the best you can do is regulate the amount of caution used and trade-off confident fast search (that wont work on deceptive problems), with cautious slow search (that will take longer than necessary on easy problems).

As it stands I guess Im not able to regulate the commitment rate. I have to make joins (which are permanent) in order to explore the space of joins. If I turn down the combination rate it would tend to assemble better indivs more reliably. But the fitness evaluation actually thinks that the blocks with garbage are the better ones. They could only be selected against if the blocks at the next level were already present. And thats not going to happen because I cant keep the sodding garbage out of the blocks from the previous level.

Anyway, I have some ideas to try:

See publications for further background.