Parameter Adjustment: Learning with a hint

Our first experiment involves learning through practice while giving the computer a hint: when a collision happens, the algorithm raises the weight of the sensor that pointed in the crash direction. Random variation is also added to all weights.
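The update rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: the function name adjust_weights, the three sensor directions, the hint_step and noise parameters, and the use of uniform noise are all hypothetical. The text does not say how large the adjustments were or how the weights feed into steering.

```python
import random

def adjust_weights(weights, crash_direction, hint_step=1.0, noise=0.1, rng=random):
    """Return updated sensor weights after a crash.

    weights         -- dict mapping a sensor direction to its weight (assumed layout)
    crash_direction -- direction of the sensor that pointed at the obstacle
    hint_step       -- assumed size of the "hint" increase
    noise           -- assumed half-width of the uniform random variation
    """
    updated = {}
    for direction, weight in weights.items():
        if direction == crash_direction:
            # The hint: raise the weight of the sensor that pointed at the crash.
            weight += hint_step
        # Random variation is added to every weight, hinted or not.
        weight += rng.uniform(-noise, noise)
        updated[direction] = weight
    return updated

# A blank player starts with all weights at zero and crashes straight ahead.
blank = {"left": 0.0, "ahead": 0.0, "right": 0.0}
after_crash = adjust_weights(blank, "ahead", rng=random.Random(0))
```

After one crash the "ahead" weight stands out from the others, while the noise keeps all weights drifting slightly from game to game.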

This explicit sensor-direction coupling creates a successful learning paradigm. In the beginning, the blank player promptly kills itself by crashing blindly into a nearby wall. After some 20 games, however, it is able to play a good game against a human or a different artificial Tron player.

This scenario, however, leaves us with the feeling that almost everything there was to learn was really known from the beginning. All the algorithm has done is adjust the parameters to an equilibrium value.

There might be a lesson here: Learning works really well when done in little steps. For a computer to learn a huge task we must set up the conditions for it to be done one small step at a time.

Suicide: Starting without any knowledge,
an artificial Tron player loses by
crashing into itself on the very
first move.

Learning as parameter adjustment. After
playing just eleven games, the blue player
has found a weakness in the man-made Avoider
(black) and wins for the first time.