log2000

log of research in progress 2000

August 27th 2000
Im in post conference burn out - which is problem because there's still one to go.
Anyway, ALife was great - I came to see how, hierarchical decomposability of problems, hierarchical/modular organisation of living things, quasi-attractors, even neutral nets, are facets of the same subject. And that gave me an idea of where to find hierarchically decomposable problems and why they are pandemic. Essentially, HDPs are the product of itterated function systems (IFSs) or just dynamical systems (DSs). In short, one way to look at the evolution of a dynamical system is that the components of the system move about in regular paths except when the interact with each other - on these occasions their paths are perturbed. If they are perturbed into a trajectory where they interact with other components again then they will be perturbed again - then they are. If they move to a trajectory where they are perturbed less - then they tend to stay there. So, one way to look at what it means to approach equilibrium, to approach an attractor, is that the sysytem finds a state which minimises the interaction of components. This applies recursivley, producing hierarchical interdependency of subsystems.
Then, when a system has a long periodic attractor it is often made up of a smaller quasi periodic attractor - an trajectory that doesnt quite come back to exactly the same point, so it goes around again, and again, each time the 'start point' drifts a little. And eventually the startpoint does come back to where it started from - so the exact cycle is very long but made up of an inner loop that is not quite exactly cyclical. This is common in DS dynamics. What does it mean? It means that an efficient description of the system can be made by giving the position of the system in the inner loop and in the outer loop - the two counters, as it were. And this means that if we were trying to learn to predict or copy this system it would be easy to find the inner loop (beacuse its fast and relatively short), and having differenced out this, it would be easy to find the outer loop (using the BB of the inner loop, a description of the outer loop would be short too).
Why do nested cyclic attractors like this occur in DSs? Imagine a gravitational mass system of 4 masses, ABCD. Let A and B be in a coupled circular mutual attraction - eg like earth/moon. Let C and D have their own thing. Now put the A/B sub-system in mutual orbit around the C/D subsystem. This is clustering of interaction is natural in DSs - it produces systems with subsystems. To get all 4 masses back to exactly the same positions may take a very long time. but its very useful to talk about the position of the earth/moon subsystem relative to the sun - and then add the detail that the moon phase shifts each year.
Why is this system/subsystem arrangement common in natural DSs? because, as I said, its a natural consequence of moving toward an attractor/ stable equilibrium. Not even a 'consequence' - its just another way to describe what a stable equilibrium is - i.e. a state where changes in state are isolated and do not propogate through the whole system. And near equilibrium, means changes in state probably dont propogate throught the whole system, or dont have much effect on the rest of the system - something like that.
I think that this property - the property of naturally producing attractors that are composites of other attractors ((A/B)/(C/D)) may not be true of all types of DS, only natural ones. And that might be something to do with the fact that all natural forces of interaction dissipate with distance - that gives a naturally localised nature to interactions. Moreover, local interactions, since they have stronger forces, reach equilibrium faster than distal interactions. So, after the system has partially settled, local interactions are in equilibrium and distal interactions are not, yet. Which gives a system/subsystem structure. On the other hand, it might be the case that any arbitrary network of interactions will have some strong and relatively self contained and some weak and relatively widely distributed - so maybe arbitrary DSs still have the natural tendency to create a hierarchical subsystem structure when itterated.

The juicy thing is that if this is all on the right path - then it means that the applicability of SEAM/ my methods of problem decomposition are so widespread it isnt funny. It means, potentially, that you can take any arbitrary function and itterate it - then the properties of this IFS near equilibrium are HDP and can be 'solved'. It makes perfect sense in biology too - the reason that symbiogenesis is a good way to exploit the nature of the problem of survivabiity is beacuse the environment is a huge (biological) DS.

The ability to learn to predict, control, mimic a DS near equilibrium - is also related to "embedding" and "phase locking", BTW. If you connect one DS to another DS (with the same number of state variables/physics) then it'll automatically resonate to the first. It'll find a state where its behaviour is the mirror of the other DS because this is the state which minimises the interaction of the two systems. Imagine system A (driver) connected to system B (driven) - at first B is violently perturbed by the connection to A. Then the highest energy oscillations of A are induced into B so they move together - just because if they dont move together B is perturbed by A, and so B keeps changing (being perturbed) until they do move together. Then after the high energy sub-components have been absorbed, further adjustments must be made for the low-frequency, lower energy componets of As movement. And so on. Eventually, B exactly copies the dynamics of A (at least at the interface, but not necessarily internally) - and will continue to do so if disconnected from A. I should try that! Sounds impossible - well in general it is, but if A is near equilibrium then its not. If A is near equilibrium then its movement will have HD structure and be amenable to induction.

So, whats a nice neat simple example? Im working on it :)

Meanwhile, I realise that the symbiotic joins in SEAM are too heavy handed - Im working on a 'soft groups' model that allows more subtle relationship forming.

In other news: there are several pieces of work I should try as journal papers; I havnt completed a thesis proposal or finalised a commitee; and there's only 9 months left to finish!

I wonder if Im going to be able to develop the above ideas sufficiently in time for the thesis. Maybe not. I'll have to decide a cut-off date for new work - and it'd better be soon... end of Nov? leaving 6 months to write-up?

May 9th 2000
In the last week I got an algorithm that solves the shuffled HIFF.
It doesnt know anything about where the blocks are (no assumption of bit adjacency).
It doesnt use any assumption that genotypic (dis)similarity is a useful metric for diversity maintenance.
It doesnt use any knowledge of how expected fitness grows with the size of individuals (as per my GECCO1999 hack).
It doesnt use partial evaluation, complete enumeration in intialisation, two-phase implementation, or any other nasty hacks often found in Messy GAs.

It uses partially specified individuals to explicitly represent useful schemata. This enables us to search combinations of schemata directly without the need for a recombination operator that knows where the blocks are (ie uses heuristic of adjacency).
However, like most objective functions, we cannot measure the fitness of partially specified solutions directly - thus we need to fill-in unspecified parameters with a template.
But, if use just any old random template then we have three problems: 1) the 'noise' in evaluation coming from the template is too high with respect to the 'signal' coming from the individual in question. 2) large sub-optimal individuals are generaly better than small individuals with random templates so individuals tend to fill with sub-optimal schemata very quickly. 3) the templates that reveal the high-order dependencies are rare - the number of random templates that must be tested increases with the exponent of the block-size.

The first problem can be solved by using fitness differencing - F(A,t) vs F(B,t). i.e. only use the difference in fitness that two individuals (A & B) exhibit when measured against the same template, t.
The second problem is harder. Basically any method that averages the fitness contribution over many trial templates will find that a big individual is reliably better than a small one even if the big one is not the best an individual of that size could be. The key here is to use Pareto domination instead of comparing average fitnesses.
The third problem is where symbiosis comes in. Instead of using random templates, templates are built by assembling groups of other individuals from the population. This enables the combination of 'best schemata so far' to provide a locally optimised template that reveals higher-order interdependencies between blocks.

With these features together I have demonstrated what symbiogenesis is good for. Finally!
Combining pre-adapted parts (the intuition behind the BB hypothesis) can obviously provide huge computational savings. But how do we discover how to decompose the problem? If clues about the decomposition are provided (eg. adjacency) then the regular GA can do the work. But if there are no clues we must rely on low-order components of the fitness function to reveal them and provide a mechaism for individuals to represent the schemata explicitly and a mechanism to recombine them. A representation based on subsets of parameters is straight forward enough. And a recombination mechanism that takes the union of subsets is appropriate in this case. However, we cannot assume that the low-order components of a fitness function will tell us we have found a BB explicitly. Low-order components must be discovered by sampling. Sampling low-order correlations is not so hard. But doing the appropriate sampling to find the higher-order components is much harder. However, if the problem has a 'compositional' nature then the solutions to low-order dependencies lead search to find the solution to higher-order dependencies by reducing the dimensionality of the space.

Anyway, the other cool part is that it uses symbiotic scaffolding. First, note that there is a practical way to use the Baldwin effect - on a rugged landscape it can be used to smooth the fitness landscape and provide a kind of 'look-ahead' about candidate variants. The look-ahead doesnt necessarily find the solution (as it does in Hinton and Nowlan), but it informs search about where it can get to from the candidate variant. Its just like looking ahead 'ply' in a two-player game and applying the static evaluation function to the leaves of possibilities instead of to the immediate daughters. Basically, Im using this effect but search moves in recombination space not mutation space.

I've got it all figured out now - you'll have to wait for the thesis.

Apr 4th 2000
I submitted 5 conference papers this year. Three results so far, two positive.
I've done some preliminary investigations into 'Group HIFF' - although they showed some positive signs the overall process is not properly regulated... in other words, it didnt work.

Here's the story so far. Pretty cryptic, but here it is anyway....

Goal: Use group evaluation of partially specified individuals to enable automatic problem decomposition and solution composition. This will bring together the 'incremental commitment' idea (GECCO99), with 'co-evaluation' (ECAL99 and ALife7), all applied to a hierarchical building-block problem HIFF (PPSN98, CEC99, PPSN2000, FOGA2000).

I identified several things that need to work together: diversity maintenance, group evaluation without too much noise, credit assignment, regulation of string growth, a problem to apply it to.

I have the problem to apply it to, of course.
I now have a diversity maintenance technique that works great (deterministic crowding), and theres also the possibility of using 'perverse xover' as a diversity maintenance technique (it works great on regular HIFF).
Preliminaries suggest that 'fitness differencing' where the group is drawn from other population members is sufficient to overcome background noise and successfully destinguish good blocks from bad blocks (of a given block size). i.e. to evaluate individual IND, subtract 'fitness of group without IND' from 'fitness of group with IND'.
This also solves credit assignment - rather than share-out the fitness through the group, the fitness (difference) is given only to IND.
But, regulation of string growth is failing. I cant destinguish between good-block and good-block-with-an-extra-(incompatible)-subblock-tagged-on. Consequently, the strings fill with incompatible blocks and since each individual is now fully specified the group evaluation and subsequent combination of individuals is thwarted.

My fear is that all I've succeeded in doing is transforming the usual problem of premature convergence into a problem of premature commitment; moreover, since there cannot be any general method to know when the sub-block you are adding is the best possible sub-block to add (ie is it right or is it garbage), this problem is never going to go away. My best hope, I think, is to enable some 'tunability' of commitment rate.

Here's an analogy. In regular search methods the problem is to know whether 'the thing you found that is better' is 'the best thing you could find' - if you go with it you may be lead into a trap (local optima). In regular search methods you can regulate the degree of caution: eg annealing, or in EAs, selection pressure, mutation/xover rate, population size, etc. There can never be any general method to ensure that the path you decide to follow is the best one, the best you can do is regulate the amount of caution used and trade-off confident fast search (that wont work on deceptive problems), with cautious slow search (that will take longer than necessary on easy problems).

As it stands I guess Im not able to regulate the commitment rate. I have to make joins (which are permanent) in order to explore the space of joins. If I turn down the combination rate it would tend to assemble better indivs more reliably. But the fitness evaluation actually thinks that the blocks with garbage are the better ones. They could only be selected against if the blocks at the next level were already present. And thats not going to happen because I cant keep the sodding garbage out of the blocks from the previous level.

Anyway, I have some ideas to try:

I need to verify and quantify, and find limits for the fitness differencing method. i.e. it can destinguish quality of blocks at a given size, but how much garbage can it take, does the size of block and the size of garbage come into play? Am I sure that there's no size-penalty hack that would make it work as a first approximation to my goal? How much difference does the quality of the population make? and the make-up of the population (i.e. converged or diverse)? Does a random template work just as well? Maybe I could even measure the 'pressure' to increase in size by comparing fitness of indivs with/without garbage.
One thing to try is 'association preferences' a kind of 'soft' join. In this way pairs could be evaluated voluminously without having the join be heritable until its been prooven.
Another thing to use for investigation is a two-needle version of the symbiotic scaffolding experiment. This will show whether it is possible to 'prime' the population for joins when there is more than one solution to find. This makes an interesting experiment anyway - I will be able to show that the solution found by genetic search is always preceeded by the SAME solution in group search. i.e. non-genetic variation guides genetic variation. Whereas previously, I couldnt really say that one guided the other since there was only one thing to find anyway. And, this makes the idea of one organism acquiring the characteristics of another a little more obvious. Additionally, if I use a diversity maintenance technique so that the population as a whole cannot converge then there cannot be reliable symbionts. This is an alternate way of adding an implicit cost to relying on symbionts and should encourage independence even without using probabilistic group-size limit.
Anyway, as far as group HIFF is concerned, this model could be informative because individuals can only be successful if they get together with the right kind of symbiont or if they dont need a symbiont. This is interesting from a size-limit point of view. It may explain why the 'complacency' feature I saw in symbiosis cannot be exhibited when applied to HIFF.
Here's a thought... what if I try group evaluation on HIFF without joins? i.e. real symbiotic scaffolding, not SEAMing. That cant work, it would get to a stage where two 32bit indivs make a whole and a 32bit _mutation_ is required to get the whole independently... Unless the symbiont is imperfect, like in the current experiments, that would mean that even small improvements in the bit count would be rewarded... its worth a try - I might be surprised. The symbionts cant be perfect if the population is diverse. So there would still be an advantage in filling with suboptimal blocks, I think. The advantage would be that the garbage would fill slowly though - i.e. the fill rate doesnt increase exponentially. Maybe Ill try this next...?

See publications for further background.