New Fitness Measure for the Main Population

The logical next step was to implement a new fitness function based on our improved performance measurement. We decided to compute the RS of all robots (not just those currently ``active'' in the population) at the end of each generation. Agents who had won all their games would be assigned an RS of $+\infty$, so they would always be selected.

**Figure 3.19:** Original fitness function vs. statistical strength. (a) Fitness value vs. RS for all agents. A group of agents has reached extremely high fitness values even though their performance is not exceptional. (b) Zooming in on the graph, the ``main sequence'' is apparent: RS and fitness are correlated. (c) The top 100 players according to the fitness formula are different from the top 100 according to RS.
$\resizebox*{0.7\textwidth}{!}{\includegraphics{newfit/ts2f24.eps}}$

Beginning at generation 397 (which corresponds to game no. 233,877) the main Tron server computes RS for all players and chooses the top 90 to be passed on to the next generation. So step 4 of the main Tron loop (page

) is replaced with

The present section analyzes the results of nearly a year of the new configuration, spanning up to game no. 366,018. These results include the new configuration of the novelty engine, which produced better new rookies (starting at game no. 144,747), and the upgraded fitness function, based on paired comparison statistics (starting at game no. 233,877).
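The selection rule described above (compute RS for every player, treat undefeated agents as having an RS of $+\infty$, keep the top 90) can be illustrated with a minimal sketch. This is not the Tron server's code: the function names `assign_rs` and `select_top`, the dictionary layout, and the treatment of agents with no games played (they are not counted as undefeated) are assumptions made for illustration, and the RS estimates are taken as given rather than computed here.

```python
import math
from typing import Dict, List, Tuple


def assign_rs(
    estimated_rs: Dict[str, float],
    record: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Return the RS used for selection.

    estimated_rs: RS estimate per agent (assumed to be computed elsewhere).
    record: (wins, losses) per agent; hypothetical bookkeeping format.
    Agents that have played and never lost get +infinity.
    """
    rs = {}
    for agent, value in estimated_rs.items():
        wins, losses = record.get(agent, (0, 0))
        # Assumption: an agent with no games at all is not "undefeated".
        undefeated = wins > 0 and losses == 0
        rs[agent] = math.inf if undefeated else value
    return rs


def select_top(rs: Dict[str, float], k: int = 90) -> List[str]:
    """Return the k agents with the highest RS (k=90 as in the text)."""
    # sorted() is stable, so agents with equal RS (for example several
    # at +infinity) keep their original relative order.
    ranked = sorted(rs, key=lambda agent: rs[agent], reverse=True)
    return ranked[:k]
```

For example, with estimates `{"a": 0.5, "b": -1.0, "c": 2.0}` and records `{"a": (3, 2), "b": (4, 0), "c": (10, 1)}`, agent `b` is undefeated and is ranked first regardless of its estimate, so `select_top(assign_rs(...), k=2)` keeps `b` and `c`.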

Fig. 3.20 briefly shows that the system continued learning; both the winning ratio (WR) and the relative strength (RS) went up, whereas the combined human performance stayed about the same.

**Figure 3.20:** Results obtained with the new fitness configuration (beginning at game 234,000) show that the system continued learning. From left to right, top to bottom: Win Rate of the system, RS of the system, RS of the system as compared with humans below, and RS of human population. Compare with figs. 3.7, 3.12, 3.13 and 3.16, respectively.
$\resizebox{0.48\textwidth}{!}{\includegraphics{aftermath2/ts2f09.eps}}$ $\resizebox{0.48\textwidth}{!}{\includegraphics{aftermath2/ts2f06.eps}}$ $\resizebox{0.48\textwidth}{!}{\includegraphics{aftermath2/ts2f07.eps}}$ $\resizebox{0.48\textwidth}{!}{\includegraphics{aftermath2/ts2f08.eps}}$

Fig. 3.21 shows that new humans kept arriving with varying strengths, whereas new agents have been better since the change of regime in the novelty engine. But there is also a curious flat ``ceiling'' in the agents' strength graph. In fact, this is produced by the selection mechanism: any agent evaluated above that cutoff will be selected for the main population, and will keep playing until reevaluation puts it below the top 90.

**Figure 3.21:** Performance of new humans and new agents over time (compare to fig. 3.9). The new configuration of the novelty engine starts producing better robots beginning at robot no. 2500. The flat ceiling of the agents' strengths arises because we are using the same tool for fitness and measurement.
$\resizebox*{0.7\textwidth}{!}{\includegraphics{aftermath2/ts2f04.eps}}$

The main result of this new setup is that we have restored a good selection mechanism. This is visible in fig. 3.22, where we have plotted the performance of the system as a whole along with the average strength of the new robots being produced at the same time.

**Figure 3.22:** Performance of novice robots vs. performance of the system as a whole. The robot production engine (broken line) has three stages corresponding to three evolutionary setups. During the first 40,000 games, novices are produced in short evolutionary runs (20 generations on average). Between 40,000 and 148,000, evolutionary runs are longer (100 generations) and tournaments use 15 fixed champions and 10 evolving champions. From 148,000 onwards, 24 evolving champions are used for coevolutionary fitness. The performance of the system as a whole (solid line) is higher than that of the average new robot as a result of the selection process. The system stagnated between 148,000 and 220,000 games, when the older fitness function failed to select the better agents. The new statistical fitness function restores the proper behavior (games 220,000-366,000).
$\resizebox*{0.7\textwidth}{!}{\includegraphics{aftermath2/ts2f26.eps}}$

The difference between the two curves demonstrates the effect of the survival of the fittest brought about by the main Tron server: the system as a whole performs better than the average agent.

There is an important increase in the quality of the rookies after game 170,000, with the elimination of the fixed training set and the increase in the number of generations of the novelty engine. At this point, the deficiencies of the original fitness function are evident: between games 170,000 and 220,000 there is no performance increase due to selection.

Finally, beginning at game 220,000, selection based on relative strength once more pushed up the performance of the system.

It is too soon to tell whether or not the performance of the Tron system will continue to improve beyond the current state. We feel that we might have reached the limits of agent quality imposed by the representation used.