In this experiment a genetic algorithm is applied to evolve a neural network that plays a game of Pong. It isn't the most efficient implementation of the algorithm, but it successfully demonstrates the process of unsupervised learning. The neural network functionality is powered by synaptic.js.
A genetic algorithm mimics the evolution of species in nature. The algorithm loops through a number of steps:

- evaluate each agent's fitness
- select the best-performing agents
- produce a new generation by cloning and mutating the selected agents
Eventually the agents evolve, as the best-performing agents' behaviour is inherited through the process of cloning and mutation.
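As a rough illustration, a cloning-and-mutation step could look something like the sketch below; the function names, mutation rate and mutation amount are placeholders rather than the values used in this simulation.

```js
// Illustrative sketch only: nudge a fraction of a network's weights.
function mutate(weights, rate = 0.1, amount = 0.5) {
  return weights.map(w =>
    Math.random() < rate ? w + (Math.random() * 2 - 1) * amount : w
  );
}

// Build the next generation by cloning the best performers and
// mutating each clone slightly.
function nextGeneration(bestAgents, populationSize) {
  const offspring = [];
  while (offspring.length < populationSize) {
    const parent = bestAgents[offspring.length % bestAgents.length];
    offspring.push({ weights: mutate(parent.weights) });
  }
  return offspring;
}
```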
Fitness doesn't necessarily develop along a linear path, as one approach might have limited potential and reach only a local maximum. An agent's strategy might be successful only under certain conditions, so a promising family tree might suddenly go extinct.
There is likely a theoretical maximum fitness an agent may reach given any possible state of the game. Occasional breeding of agents with completely random neural networks enables escape from a potential local maximum by introducing a fresh start for development.
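Continuing the sketch above, the occasional injection of random agents could be expressed along these lines; the 10 % rate is an arbitrary example.

```js
// Illustrative sketch only: occasionally replace an offspring with a
// completely random network to allow escape from a local maximum.
function injectRandomAgents(offspring, randomRate = 0.1) {
  return offspring.map(agent =>
    Math.random() < randomRate
      ? { weights: agent.weights.map(() => Math.random() * 2 - 1) }
      : agent
  );
}
```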
Wikipedia:
> An (artificial) neural network is a network of simple elements called neurons, which receive input, change their internal state (activation) according to that input, and produce output depending on the input and activation.
This simulation uses a type of feedforward network where the neurons are organized in three layers:

- an input layer
- a hidden layer
- an output layer
In general, neurons can be connected to each other in arbitrary ways to produce different behaviour. In this case a fully connected (all-to-all) structure is used, meaning that every neuron in the input layer is connected to every neuron in the hidden layer, and every neuron in the hidden layer is connected to every neuron in the output layer.
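With synaptic.js such a fully connected network can be built with the `Architect.Perceptron` helper. The layer sizes below are only an example; the actual simulation may use different ones.

```js
const { Architect } = require('synaptic');

// A fully connected feedforward network: every input neuron connects to
// every hidden neuron, and every hidden neuron connects to the output.
// The sizes (6 inputs, 12 hidden neurons, 1 output) are illustrative.
const network = new Architect.Perceptron(6, 12, 1);
```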
The game is structured around a single game loop method that is executed on every frame, approximately 60 times a second. The loop does two things: it updates the game state (i.e. calculates new positions for the ball and the agents) and renders the updated game state on the screen.
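A minimal version of such a loop could look like this; `update` and `render` stand in for the simulation's own functions.

```js
// Sketch of the game loop: requestAnimationFrame calls it roughly
// 60 times a second in the browser.
function gameLoop() {
  update(); // move the ball, activate each agent's network, move the agents
  render(); // draw the updated game state on the canvas
  requestAnimationFrame(gameLoop);
}

requestAnimationFrame(gameLoop);
```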
In the update, each network is activated by feeding a normalized input to the network, which in turn produces an output. The output is interpreted and applied to determine which way an agent should move next.
Agents need enough information about the game state to be able to play the game. The following set of values is fed to the network's input layer:
The output neuron's value is evaluated and applied to steer the agent. Values below 0.5 are interpreted as a command to move left and values greater than 0.5 are interpreted as a command to move right.
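In code, activating a network and steering an agent could look roughly like this (synaptic.js's `activate` returns an array of output values; the 0.5 threshold follows the description above, the other names are illustrative):

```js
// inputs: the game-state vector described above; agent: the paddle to steer.
const [output] = network.activate(inputs);

if (output < 0.5) {
  agent.moveLeft();
} else {
  agent.moveRight();
}
```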
The way a feedforward neural network determines the output from the input is the result of many weighted sums. Each neuron has a bias value, which can be thought of as the importance of that neuron. Each connection between two neurons has a weight value, which determines how important that connection is for the end result.
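As a simplified illustration, a single neuron's output could be computed like this, using the logistic sigmoid as the activation function (the library may use a different one):

```js
// Weighted sum of the inputs plus the neuron's bias, squashed by an
// activation function (here the logistic sigmoid).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

function neuronOutput(inputs, weights, bias) {
  const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
  return sigmoid(sum);
}
```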
Neural networks are often designed to perform best when input values are normalized, meaning that the values are between 0 and 1. This simulation uses normalized input except for the agent's and the ball's x-position. This is done to increase the significance of obvious input values among the less important values, leading to faster early-stage development of the agents. One could consider that as giving a brief lecture to the agents to put extra focus on the horizontal position of the game elements.
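For example, the input vector could be assembled along these lines; the field names and the exact set of inputs are assumptions, the point is simply that the x-positions are left unscaled while other values are mapped to the 0–1 range.

```js
// Illustrative only: most inputs are normalized to 0..1, but the
// x-positions are deliberately left unscaled to emphasize them.
function buildInputs(agent, ball, field) {
  return [
    agent.x,               // intentionally not normalized
    ball.x,                // intentionally not normalized
    ball.y / field.height, // normalized to 0..1
  ];
}
```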