Independently learning AI pong player

In this experiment a genetic algorithm is applied to evolve a neural network that plays pong. It isn't the most optimal implementation of the algorithm, but it successfully demonstrates the process of unsupervised learning. The neural network functionality is powered by synaptic.js.

Neural network-controlled agents playing pong. Fitness of the best performing agent in a population is shown in the graph.

How does it work?

A genetic algorithm mimics the evolution of species in nature. The algorithm loops through a number of steps:

  1. A generation of independent agents is bred
  2. Each agent has a neural network to control the movement
  3. Agents are evaluated by their performance in the game
  4. Best performing agents breed the next generation

Eventually the agents evolve as the best performing agents' behaviour is inherited through the process of cloning and mutation.
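As a rough sketch, one evolution step could look like the following. The population size, mutation rate, agent structure and helper logic are illustrative assumptions rather than the demo's actual code; it also assumes synaptic.js's toJSON/fromJSON serialization, where each connection object carries a weight.

```javascript
const POPULATION_SIZE = 5;
const MUTATION_RATE = 0.1;

// Breed the next generation from the best performing agent of the previous one.
function nextGeneration(agents) {
  // Evaluate: sort agents by the fitness they collected during the round
  agents.sort((a, b) => b.fitness - a.fitness);
  const bestJson = agents[0].network.toJSON();

  // Breed: clone the best network and randomly mutate each copy
  const offspring = [];
  for (let i = 0; i < POPULATION_SIZE; i++) {
    const json = JSON.parse(JSON.stringify(bestJson));
    for (const connection of json.connections) {
      if (Math.random() < MUTATION_RATE) {
        connection.weight += Math.random() - 0.5; // nudge the weight a little
      }
    }
    offspring.push({ network: synaptic.Network.fromJSON(json), fitness: 0 });
  }
  return offspring;
}
```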

Local maximum valleys

The development of the fitness doesn't necessarily follow a linear path, as one approach might have limited potential and thus reach a local maximum at most. An agent's strategy might be successful only under certain conditions, and therefore a promising family tree might suddenly go extinct.

There is likely a theoretical maximum fitness an agent may reach given any possible state of the game. Occasional breeding of agents with completely random neural networks enables escape from a potential local maximum valley by introducing a fresh start for the development.
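One possible way to implement that fresh start is to sometimes discard a bred clone and replace it with a brand new network. The probability and the 5-4-1 network shape (the hidden layer size in particular) are assumptions made for the example.

```javascript
const FRESH_AGENT_PROBABILITY = 0.05; // illustrative value

function maybeReplaceWithFreshAgent(cloneOfBest) {
  if (Math.random() < FRESH_AGENT_PROBABILITY) {
    // A completely new, randomly initialized network: a fresh start
    return new synaptic.Architect.Perceptron(5, 4, 1);
  }
  return cloneOfBest;
}
```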

About neural networks

Wikipedia:

An (artificial) neural network is a network of simple elements called neurons, which receive input, change their internal state (activation) according to that input, and produce output depending on the input and activation.

This simulation applies a type of feedforward network where the neurons are arranged in three layers: an input layer, a hidden layer and an output layer.

In general, neurons could be connected to each other in arbitrary ways to provide different behaviour. In this case an all-to-all connection structure is used, meaning that every neuron in the input layer is connected to every neuron in the hidden layer, and every neuron in the hidden layer is connected to every neuron in the output layer.
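With synaptic.js such a fully connected three-layer network can be created with the Architect.Perceptron helper. The input and output sizes follow from the sections below; the hidden layer size is an assumption.

```javascript
// 5 input neurons, a hidden layer (assumed here to have 4 neurons) and
// 1 output neuron, with all-to-all connections between consecutive layers.
const network = new synaptic.Architect.Perceptron(5, 4, 1);
```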

Agents have senses but they may not know what to do with them

The game is structured so that there's a single game loop method that is executed on every frame, approximately 60 times a second. The loop does two things: updating the game state (i.e. calculating new positions for the ball and the agents) and rendering the updated game state on the screen. During the update, each network is activated by feeding a normalized input to the network, which in return gives an output. The output is interpreted and applied to determine which way an agent should move next.
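In the browser such a loop is typically driven by requestAnimationFrame. The sketch below only shows the structure; the two helper functions are placeholders for the update and render work described above.

```javascript
function update() {
  // Move the ball, activate each agent's network and move the agents accordingly
}

function render() {
  // Draw the updated game state on the canvas
}

function gameLoop() {
  update();
  render();
  requestAnimationFrame(gameLoop); // schedule the next frame, roughly 60 times a second
}

requestAnimationFrame(gameLoop);
```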

Agents need enough information about the game state to be able to play the game. The following set of values is fed to the network's input layer:

  1. Agent's horizontal (x) position
  2. Ball's horizontal (x) position
  3. Ball's vertical (y) position
  4. Ball's horizontal (x) velocity
  5. Ball's vertical (y) velocity

The output neuron's value is evaluated and applied to steer the agent. Values below 0.5 are interpreted as a command to move left and values greater than 0.5 are interpreted as a command to move right.
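Putting those two pieces together, activating a network and steering an agent might look roughly like this. The input array follows the order listed above, and the agent speed is an illustrative constant.

```javascript
const AGENT_SPEED = 5; // pixels per frame, illustrative

function moveAgent(agent, input) {
  // activate() returns an array with one value per output neuron;
  // here there is a single output neuron, so the array has one value in 0..1.
  const [output] = agent.network.activate(input);

  if (output < 0.5) {
    agent.x -= AGENT_SPEED; // move left
  } else {
    agent.x += AGENT_SPEED; // move right
  }
}
```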

Activation of a feedforward network is simple math

The way a feedforward neural network determines the output from the input is the result of lots of multiplied sums. Each neuron has a bias value, which can be thought of as the importance of that neuron. Each connection between two neurons has a weight value which determines how important that connection is for the end result.
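For a single neuron this boils down to a weighted sum of its inputs plus the neuron's bias, squashed by an activation function. Synaptic.js uses the logistic sigmoid by default, which keeps the result between 0 and 1.

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Output of one neuron: weight each incoming value, add them up together
// with the bias, and squash the sum into the range 0..1.
function neuronOutput(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sigmoid(sum);
}
```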

Neural networks are often designed to perform best when input values are normalized, meaning that the values are between 0 and 1. This simulation uses normalized input except for the agent's and the ball's x-position. This is done to increase the significance of the obvious input values among the less important values, leading to faster early-stage development of the agents. One could think of this as giving the agents a brief lecture telling them to put extra focus on the horizontal position of the game elements.
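A sketch of how the input vector could be assembled under that scheme. The canvas height, the maximum ball speed and the property names are assumptions made for the example.

```javascript
const HEIGHT = 480;    // assumed canvas height in pixels
const MAX_SPEED = 10;  // assumed maximum ball speed in pixels per frame

function senses(agent, ball) {
  return [
    agent.x,             // intentionally left unnormalized
    ball.x,              // intentionally left unnormalized
    ball.y / HEIGHT,     // scaled to 0..1
    ball.vx / MAX_SPEED, // scaled to roughly -1..1
    ball.vy / MAX_SPEED,
  ];
}
```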