Examining Bias in Genetic Algorithms

Alex Feinman
Computer Science Department
Brandeis University
Waltham, MA 02454


The code for the system is available from http://www.cs.brandeis.edu/afeinman/java/ga.jar


Historically, researchers have used genetic algorithms as a tool to solve various machine learning problems. In doing so, experimenters are forced to introduce bias into the system in order to arrive at the results they desire. Indeed, recent work seems to suggest that some bias is necessary in any learning system. However, any bias introduced will alter the results. The purpose of this paper is to examine a number of common forms of bias and their effect on a particular experiment.

At the most basic level, the choice of problem is itself an unavoidable bias. Because this is an experiment about experimentation, a well-known domain has been chosen: the Iterated Prisoner's Dilemma, first tackled using genetic algorithms in the early 1980s [Axelrod84]. There is a wealth of results for this particular problem; it has been studied since well before the advent of computers, and has received considerable attention as a deceptively complex but easy-to-understand problem. The variety of approaches used to solve the basic problem of cooperating or defecting given the standard payoff matrix is astounding. This paper discusses only those approaches that make use of some form of genetic algorithm. A casual search will reveal that this approach dominates the field, yet it remains among the least understood.

There are many biases inherent in implementing a GA-based solution. They can be broken down roughly into three types: representational, implementational, and analytical. Some may consider representational bias a form of implementational bias, since the representation is expressed through the implementation; the author takes the view that the representation can be chosen independently of the rest of the implementation (although it may strongly shape that implementation), and has such a different sort of effect on the problem that it should be considered a separate sort of bias.

Representational biases are fundamental to any experiment in learning. The representation of knowledge in humans is an enormous field full of open questions, so there is no definitive standard for how to represent information for a GA. Yet it is clear that the representation of information is a vital part of how learning can be performed. Some representations may leave out important but unforeseen bits of information, whereas others may obscure important information in noise.

At the most basic level, the representation of information in memory governs what can be stored, and how it can be transformed. For example, in Axelrod's GAs, the decisions the GA should make are stored in a linear array, indexed by the current state of the system. This means that related information ends up stored nearby, and the data is modular, resulting in a representation amenable to crossover. In [Angeline & Pollack 1993], storage as tree-structured lists allows more complex operations such as subtree swapping and subroutine creation.
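This linear, state-indexed layout can be sketched as follows; the class and method names here are illustrative, not taken from either system. With a three-round history and both players' moves visible, there are 4^3 = 64 possible histories, so a genome is a 64-entry array giving the move for each history, and crossover is a simple array splice:

```java
// Sketch of an Axelrod-style lookup-table strategy (illustrative names).
public class TableStrategy {
    public static final int COOPERATE = 0, DEFECT = 1;

    // The genome is a flat array: genome[historyIndex] -> next move.
    public static int nextMove(int[] genome, int historyIndex) {
        return genome[historyIndex];
    }

    // Encode the recent rounds as a base-4 number: each round contributes
    // one digit, (myMove * 2 + theirMove).
    public static int encodeHistory(int[] myMoves, int[] theirMoves) {
        int index = 0;
        for (int i = 0; i < myMoves.length; i++) {
            index = index * 4 + (myMoves[i] * 2 + theirMoves[i]);
        }
        return index;
    }

    // One-point crossover: the linear layout makes this a simple splice
    // of the two parents' arrays at a cut point.
    public static int[] crossover(int[] a, int[] b, int cut) {
        int[] child = new int[a.length];
        System.arraycopy(a, 0, child, 0, cut);
        System.arraycopy(b, cut, child, cut, b.length - cut);
        return child;
    }
}
```

Because nearby array slots correspond to similar histories, a splice tends to preserve coherent blocks of behavior, which is exactly what makes this representation crossover-friendly.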

Implementational bias has to do with the actual program used to test one's ideas. There are an enormous number of possible such programs, given that no one really knows what they are aiming for - there is no central "Genetic Algorithmic System" that everyone merely implements. There are numerous design decisions to be made in any program, and perhaps more in a GA system. These include things that most designers leave as variables - number of GAs, and so on - as well as a number of more subtle issues. For example, the mechanism whereby GAs are selected to compete is often chosen arbitrarily, but obviously must have an effect on the results. This has been studied before, but only briefly, and usually as an off-hand justification for a design decision. For example, Axelrod states in Chapter 3 of [Axelrod84]: "...the asexual runs were only half as likely to evolve populations in which the median member was substantially more effective than TIT FOR TAT." This is the only discussion of different mating schemes in the entire chapter on design.

Analytical bias is perhaps the most insidious. Most experimenters start out with a fair idea of the results they are aiming for, and end up debugging their systems until these results are achieved. This tweaking is rarely well-documented by the authors of a system; only occasionally do authors admit the process. But in addition to tweaking the system, which could mostly be considered implementational bias, the experimenters often cast their output in a directed light, whether intentionally or not. Expected results are given precedence in analysis, unless the unexpected is truly interesting. Runs are cut short when the desired result is achieved, or when "the system seems stable", or other arbitrary criteria. Authors often state that they have let the system run for 10,000 generations, or 500,000, or 500,000,000, with no discussion of where these numbers came from, or what happened along the way.


To examine the effect of these biases, the author has created a system that allows the implementational biases to be varied. Due to the limited scope of this project, no attempt was made to implement alternative representations of the GAs; analytical biases are mostly independent of the system, and are very hard to alter. As will be seen later, however, the openness of the system will hopefully serve to ameliorate analytical bias.

In GAs, the designer must make a number of choices as to the actual mechanisms used in the implementation. Mechanisms for creation, alteration, procreation, and selection of GAs for mating and competition must all be decided upon. Usually, designers select a strategy for each of these and justify the choice by biological analogy. For example, crossover, the most popular mechanism for procreation involving two individuals, is modeled on the recombination of DNA in natural organisms. But designers usually give very little consideration to the effects these choices have on the results.

To determine exactly what these effects are, the author's system implements all the popular mechanisms for handling GAs. Table 1 summarizes the options available, which are explained afterwards:

Table 1 - Mechanisms implemented
  Creation:        Random; Hall Of Fame
  Competition:     Random pairs; Round-robin; Single-elimination tournament; Proximity
  Selection:       Best fraction; Fitness lottery
  Mating:          Budding; Crossover
  Other variables: Population size; number of games; length of history kept; visibility of opponent's previous history; allowing self-mating; mutation rate; number of epochs

Some of these options require explanation. Creation from the Hall Of Fame means that new GAs are created by budding or crossover from a fixed set of GAs specified by the experimenter when the program is run. Random-pairs competition matches each GA with exactly one other GA to do battle. Round-robin has every GA meet every other GA. The tournament matches GAs randomly, as in random pairs, with the winner of each meeting advancing to another round of competition until only a single GA remains. Proximity has GAs meet other GAs 'near' them, as determined by coordinates stored in each GA, which change only under reproduction.

The fitness lottery gives each GA a probability of reproducing equal to its proportional fitness, that is, its fitness relative to that of all other GAs. The best-fraction method randomly mates GAs whose fitness falls in the upper fraction of all fitnesses; this fraction can be selected by the user. Both of these methods can be used to fill a certain fraction of the genetic pool; the remaining portion is filled with newly created GAs.
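These two selection schemes can be sketched as follows; this is a minimal sketch, and the class and method names are illustrative rather than taken from the actual system:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of the two selection schemes described above (illustrative names).
public class Selection {
    // Fitness lottery: index i is chosen with probability
    // fitness[i] / (sum of all fitnesses).
    public static int lottery(double[] fitness, Random rng) {
        double total = 0;
        for (double f : fitness) total += f;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < fitness.length; i++) {
            r -= fitness[i];
            if (r <= 0) return i;
        }
        return fitness.length - 1; // guard against rounding error
    }

    // Best fraction: return the indices whose fitness falls in the top
    // `frac` of the population; mates are then drawn randomly from these.
    public static int[] bestFraction(double[] fitness, double frac) {
        Integer[] idx = new Integer[fitness.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (x, y) -> Double.compare(fitness[y], fitness[x]));
        int k = Math.max(1, (int) (fitness.length * frac));
        int[] top = new int[k];
        for (int i = 0; i < k; i++) top[i] = idx[i];
        return top;
    }
}
```

Note the different pressure the two schemes exert: the lottery lets weak GAs reproduce occasionally, preserving diversity, while the best-fraction cutoff discards them outright.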

Using the program

At the beginning of each run, the experimenter selects the mechanisms to be used for that run. For example, the experimenter might run the program with the following options:

java alex.ga.World -c hall-of-fame -m top-frac -f 0.5 -c -s round-robin -l 3 -p 80

which would duplicate Axelrod's system - 80 GAs, created from mutations of a given set of hall-of-fame GAs, competing in a round-robin format, with the best half reproducing by crossover to repopulate the pool. As is standard for Unix programs, the full list of options can be obtained by specifying the "-h" option.

Once a run is underway, the program logs a good deal of information to the screen; this information can be redirected to disk. In an effort to counter analytical bias, almost all information is recorded. Furthermore, since the foremost source of analytical bias appears to be the experimenter's interpretation of the data, all data will be available for perusal from the author's web site.

Analysis tools

Analyzing the data from a GA system is often extremely difficult; at each generation, the system consists of a large number of complex programs, and the experimenter cannot keep track of all this information at each step. To aid in extracting useful information from this morass of data, a number of summary tools are built into the system. By summarizing the data, these tools necessarily obscure an enormous amount of important information, but they allow the user to get the flavor of what is going on. To allow reinterpretation of the data by subsequent analysis, all the information available at each step is logged to disk.

The simplest analysis tool merely displays the genotype (or phenotype) of the GA with the highest fitness. This is only slightly useful. The next tool, somewhat heavier-weight, displays an average behavior, computed either by averaging evenly across all GAs or by taking an average weighted by fitness. If the population is beginning to converge, this will indicate it, but it is otherwise also mostly useless. The third tool tracks the distribution of genotypes (or phenotypes); since there are a limited number of these, and it is easy to determine which one a specific GA belongs to, constructing a distribution over the GAs is simple and useful. It allows the experimenter to see which types of GAs are starting to take over a population; examining this distribution over time reveals surges of different types of GAs as the simulation progresses.
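The distribution tool amounts to a count of GAs per discrete genome. A minimal sketch, with illustrative names rather than the actual tool's:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the genotype-distribution tool: since each genome determines a
// discrete type, counting GAs per type shows which strategies are taking
// over the population (illustrative names).
public class GenotypeDistribution {
    public static Map<String, Integer> distribution(int[][] genomes) {
        Map<String, Integer> counts = new HashMap<>();
        for (int[] g : genomes) {
            String key = Arrays.toString(g); // genome serves as the type key
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Logging this map once per epoch yields exactly the time series described above: a surge of one genotype shows up as its count climbing across successive epochs.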


To test out the system and the theories, the author first tried a base case of the system, duplicating as closely as possible Axelrod's original experiment. The system was allowed to run for 1000 epochs (generations). This was a nice round number; with no experience with the system, the author picked this number basically out of the air. After 1000 epochs, a number of algorithms started dominating the playing field.

The next order of business, once the system's pieces appeared to be in order, was to perform comparison experiments using the variety of mechanisms and variables available. The first such experiment was chosen to be purposely drastic, as viewed through the author's own bias: it compared the tournament method of matching up GAs with a round-robin arrangement. All other variables were kept the same, as shown in Table 2.
Table 2 - Comparison runs performed
  java alex.ga.World -p 64 -l 3 -L -m round-robin
  java alex.ga.World -p 64 -l 3 -L -m tournament

Obviously, the round-robin approach took much more time; at each epoch, round-robin involves n(n-1)/2 matches (on the order of n^2), whereas the tournament involves only n-1.
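The difference is easy to make concrete. With every GA meeting every other GA once, a population of n plays n(n-1)/2 matches per epoch under round-robin, against n-1 for a single-elimination tournament:

```java
// Matches per epoch under each competition scheme, for a population of n.
public class MatchCounts {
    // Round-robin: every unordered pair meets once.
    public static int roundRobin(int n) { return n * (n - 1) / 2; }

    // Single-elimination: every match removes one GA, until one remains.
    public static int tournament(int n) { return n - 1; }
}
```

For the 64-GA runs in Table 2, that is 2016 matches per epoch versus 63, which also means the tournament evaluates most GAs against only a single opponent per epoch, a far noisier fitness estimate.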

Future Work

This is only a very limited project so far; the author hopes to extend both the program and the analysis tools in the months ahead. In addition, it is hoped that by making the basic program an available and extensible core, other programmers will be able to use it and obtain results. With the huge number of users reachable via the World Wide Web, a sort of distributed experimentation becomes possible, with numerous and quite possibly anonymous experimenters running the program, contributing raw data from their runs, and offering feedback on the program's design and on the analysis of its output.

The first step in this direction is to make the program somewhat easier to use, the better to entice new users. A trivial step is to make it available as an applet, with the raw data available both to the experimenter and to a central server set up by the author. There are all sorts of privacy and security issues to be worked out in this sort of arrangement, and discussing them fully is beyond the scope of this paper.



  1. Angeline, P. and Pollack, J. (1993) "Competitive Environments Evolve Better Solutions for Complex Tasks", Genetic Algorithms: Proceedings of the Fifth International Conference (GA93), ed. Forrest
  2. Angeline, P. (1994) "An Alternate Interpretation of the Iterated Prisoner's Dilemma and the Evolution of Non-Mutual Cooperation", Artificial Life VI
  3. Axelrod, R. (1984) The Evolution Of Cooperation, Basic Books Inc.
  4. Darwen, P. and Yao, X. (1995) "A Dilemma for Fitness Sharing with a Scaling Function", Second IEEE International Conference on Evolutionary Computation.
  5. Ficici, S. and Pollack, J. (1998) "Challenges in Coevolutionary Learning: Arms-Race Dynamics, Open-Endedness, and Mediocre Stable States." Proceedings of the Sixth International Conference on Artificial Life, ed. Adami, Belew, Kitano, Taylor. Cambridge: MIT Press, 1998.
  6. Ito, A. (1996) "How Do Autonomous Agents Solve Social Dilemmas?", Intelligent Agent Systems: Theoretical and Practical Issues, no. 1209, ed. Cavedon, Rao, Wobcke
  7. Koza, J. (1992) Genetic Programming, Cambridge, MA: MIT Press
  8. Oliphant, M. (1994) "Evolving Cooperation in the Non-Iterated Prisoner's Dilemma: The Importance of Spatial Organization", Artificial Life VI