Next: 5.2 Overall Performance Characteristics Up: 5. Experimental Analysis Previous: 5. Experimental Analysis

5.1 Methodology

The test-bed system is written in Allegro Common Lisp and contains 25,000 lines of source code, which produce 2.5 MB of compiled code. The system solves both individual MW problems and sequences of them. An individual problem is constructed by randomly selecting, from the pool of permanent MW objects (actors, hand-trucks, and trucks), the subsets that will be active for that problem. Then a random group of boxes and locations is constructed and a list of goals involving them is generated. For these experiments, a database of 60 problems was created, whose goals were always to move all boxes to the truck. The number of boxes was uniformly distributed between 3 and 5.
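The construction just described might be sketched as follows. This is a minimal Python illustration, not the Allegro Common Lisp test-bed itself: the pool contents, the object names, the choice of at least one active object of each kind, the one-location-per-box layout, and the helpers `make_problem` and `make_database` are all assumptions introduced for the example.

```python
import random

# Hypothetical pool of permanent MW objects; the names are illustrative only.
PERMANENT_POOL = {
    "actors": ["actor-1", "actor-2", "actor-3"],
    "hand_trucks": ["hand-truck-1", "hand-truck-2"],
    "trucks": ["truck-1", "truck-2"],
}


def make_problem(rng):
    """Build one random problem: active objects, boxes, locations, goals."""
    active = {}
    for kind, objects in PERMANENT_POOL.items():
        # Assumption: at least one object of each kind is active.
        count = rng.randint(1, len(objects))
        active[kind] = rng.sample(objects, count)

    n_boxes = rng.randint(3, 5)  # uniform over {3, 4, 5}, as in the text
    boxes = [f"box-{i}" for i in range(1, n_boxes + 1)]
    # Assumption: each box starts at its own location.
    locations = {box: f"location-{i}" for i, box in enumerate(boxes, 1)}

    # Assumption: the first active truck is the destination for every box.
    truck = active["trucks"][0]
    goals = [("move", box, truck) for box in boxes]
    return {"active": active, "boxes": boxes,
            "locations": locations, "goals": goals}


def make_database(size=60, seed=0):
    """Generate a reproducible database of random problems."""
    rng = random.Random(seed)
    return [make_problem(rng) for _ in range(size)]
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the database reproducible regardless of what other code does with random numbers.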

The experiments are designed to control as many sources of randomness as possible in order to make comparisons meaningful. In particular, community performance can be strongly affected by the outcome of random decisions made by either the actors (e.g., selecting unmet goals to work on) or the system (e.g., when determining the outcome of resource conflicts). The starting random seed value for each problem can be stored and reused, but this is not sufficient, since the order in which the actors face these decisions may vary. Therefore, each random decision point is also stored together with a sequence of decisions. The first time the decision needs to be made, the first item in the sequence gives the result. The second time the decision needs to be made (during the course of the same activity), the second item gives the result, and so on.
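One way to picture the stored decision sequences is the Python sketch below. It is an illustration under assumptions, not the test-bed's code: the class `DecisionLog`, its method `choose`, and the string keys used to name decision points are hypothetical. Each decision point keeps its own list of outcomes, so a replayed run gets the same result for the k-th occurrence of a decision even if the decision points are reached in a different order.

```python
import random


class DecisionLog:
    """Record random decisions per decision point, then replay them in order."""

    def __init__(self, seed=None, recorded=None):
        self._rng = random.Random(seed)
        # Maps a decision-point key to the list of outcomes stored for it.
        self.recorded = {k: list(v) for k, v in (recorded or {}).items()}
        # How many times each decision point has been consulted in this run.
        self._used = {}

    def choose(self, point, options):
        """Return the stored outcome if one exists, else pick and store one."""
        index = self._used.get(point, 0)
        self._used[point] = index + 1
        outcomes = self.recorded.setdefault(point, [])
        if index < len(outcomes):
            return outcomes[index]  # replay the index-th stored decision
        choice = self._rng.choice(options)  # first visit: decide and record
        outcomes.append(choice)
        return choice
```

A first run builds up `recorded`; passing that dictionary to a new `DecisionLog` makes a later run repeat the same decisions, whatever seed the new instance is given.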

In addition to concerns about the influence of random decisions on the results of individual problems, there are concerns about the influence of problem ordering on the shapes of learning curves. It is not feasible to determine learning curves by running the system on all possible permutations of the database problems, so the system is run on four predetermined groups of sequences. Each group of sequences is balanced in the following way: each of the database problems occurs once as the first problem of some sequence in the group, once as the second problem of a different sequence in the group, and so on. Thus each data point shown is the result of solving each of the 60 database problems four times, using different sets of seeds and decisions each time; these 240 runs were repeated for both the baseline system and the system in which the actors were learning conventions.
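The balance property can be illustrated with a short Python sketch. The source does not say how the four groups were actually constructed or how long each sequence is; the cyclic-shift construction, the helper name `balanced_group`, and the sequence length of 10 used below are assumptions chosen only to show a group in which every problem occupies every position exactly once.

```python
import random


def balanced_group(problems, length, rng):
    """Return len(problems) sequences in which each problem appears
    exactly once at every position (cyclic shifts of a shuffled order)."""
    n = len(problems)
    if not 1 <= length <= n:
        raise ValueError("length must be between 1 and the number of problems")
    order = list(problems)
    rng.shuffle(order)  # a different base order gives a different group
    return [[order[(start + pos) % n] for pos in range(length)]
            for start in range(n)]


problems = list(range(60))  # stand-ins for the 60 database problems
groups = [balanced_group(problems, 10, random.Random(g)) for g in range(4)]

# Runs contributing to one data point (one sequence position):
# 60 sequences per group times 4 groups.
runs_per_point = sum(len(group) for group in groups)
```

With 60 problems and four groups, `runs_per_point` comes to 60 × 4 = 240, matching the count given in the text; running both the baseline and the learning condition doubles that to 480 runs per position.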


Last Update: March 10, 1999 by Andy Garland