Test Against Humans

To verify the hypothesis that selection against human opponents is not irrelevant, we selected a group of 10 players produced by the control experiment and manually introduced them into the main population, so that they would be tested against humans. We ran this generation (no. 250) for longer than our usual generations to get an accurate measurement.

Table 3.2: Evaluation of control agents (evolved without human intervention) after being introduced into the main population, and evaluated against humans. A robot's performance against our evaluation set does not predict how it will measure up against humans. As a coevolving population wanders through behavior space, it finds both good and bad players.

Control generation no.	Performance vs. evaluation set (% wins)	Statistical strength (RS)	Percent of robots below
360	10.0	-4.7	0.1
387	46.7	0.4	80.6
401	54.4	-0.2	59.7
354	61.1	0.4	80.0
541	63.3	0.1	70.5
462	66.7	0.1	67.8
570	70.0	-0.2	60.0
416	75.6	-1.0	40.7
410	78.9	0.4	80.3
535	96.7	0.3	77.2

Table 3.2 summarizes the result of this test. A group of 10 robots was chosen, each one the best of its generation, from among the 600 generations that group B (t=100) ran for. We chose the one that performed worst against the evaluation set (generation 360) and the one that performed best (generation 535), along with eight others chosen to cover a range of performances against the evaluation set.

The last column of the table shows how these robots compare with all other ranked robots, as measured by their performance against humans (RS): it gives the percentage of ranked robots that fall below each one.
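As a minimal sketch of how a "percent of robots below" figure could be derived from RS values, assuming it counts the ranked robots whose RS is strictly lower, consider the following. The function name `percent_below` and the list of RS values are hypothetical and purely illustrative; they are not data from the experiment.

```python
def percent_below(rs, ranked_rs):
    """Percentage of ranked robots whose RS is strictly lower than `rs`."""
    below = sum(1 for other in ranked_rs if other < rs)
    return 100.0 * below / len(ranked_rs)


# Hypothetical RS values for a small ranked population (illustration only).
hypothetical_ranked_rs = [-2.0, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5]

# Five of the eight hypothetical robots have RS below 0.3.
print(percent_below(0.3, hypothetical_ranked_rs))
```

Whether ties count as "below" is a design choice; the strict comparison used here is an assumption.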

From the internal point of view of robot-robot coevolution alone, all these agents should be equal: each of them is number one within its own generation. If anything, those of later generations should be better. But this is not the case: performance against a training set suggests that after 100 generations the population is wandering, without reaching higher absolute performance levels. The same wandering occurs with respect to performance against humans.
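One way to make the lack of agreement between the two measures concrete is a rank correlation over the rows of Table 3.2. The sketch below is not part of the original analysis; it simply computes a Spearman coefficient (with tied values given average ranks) between the evaluation-set win percentage and RS, using only the figures printed in the table. The helper names are illustrative.

```python
from math import sqrt

# (control generation, % wins vs. evaluation set, RS vs. humans), from Table 3.2
TABLE_3_2 = [
    (360, 10.0, -4.7), (387, 46.7, 0.4), (401, 54.4, -0.2), (354, 61.1, 0.4),
    (541, 63.3, 0.1), (462, 66.7, 0.1), (570, 70.0, -0.2), (416, 75.6, -1.0),
    (410, 78.9, 0.4), (535, 96.7, 0.3),
]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


wins = [row[1] for row in TABLE_3_2]
rs = [row[2] for row in TABLE_3_2]
rho = spearman(wins, rs)
print(f"Spearman rank correlation: {rho:.2f}")
```

On these ten rows the coefficient comes out at roughly 0.19, a weak association. With so few points, and with RS given to only one decimal place, this is an illustration of the table's message rather than a statistical test.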

We conclude that a coevolving population of agents explores a subspace of strategies that is not identical to the subspace of human strategies, and consequently coevolutionary fitness differs from fitness against people. Without further testing against humans, self-play alone provides a weaker evolutionary measure.

Pablo Funes
2001-05-08