-
-
Notifications
You must be signed in to change notification settings - Fork 19
The single pole balancing experiment
The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.
The pole-balancing problem can be visualized as follows:
As you can see from the scheme above, the physical system has the following constraints:
- The pole must remain upright within ±r the pole failure angle
- The cart must remain within ±h of origin
- The controller must always exert a non-zero force Fx
Where r is a pole failure angle (±12 ̊ from 0 ̊) and h is a track limit (±2.4 meters from the track centre).
The simulation of the cart balancing terminated when either the following conditions were met: the pole inclines over the failure angle, or the cart position is outside of the track.
The experiment's goal is to devise a controller that can keep the pole balanced for a defined number of simulation steps. The controller always applies to the cart a force at full magnitude in either direction (bang-bang control). The successful controller should simulate a single-pole balancing for at least 500'000 time steps (10'000 simulated seconds).
To run this experiment with a population size equal to 150 organisms, execute the following commands:
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole
The above command will execute 100 trials of a single pole-balancing experiment within 100 generations with a population of 150 organisms. The executor will save the results of the experiment in the output directory.
The console output will look similar to the following:
Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620479404
Average
Trial duration: 221.247774ms
Epoch duration: 39.216614ms
Generations/trial: 10.9
Champion found in 83 trial run
Winner Nodes: 7
Winner Genes: 10
Winner Evals: 634
Diversity: 47
Complexity: 16
Age: 3
Fitness: 1.000000
Average among winners
Winner Nodes: 7.1
Winner Genes: 11.5
Winner Evals: 1557.2
Generations/trial: 10.9
Diversity: 59.940000
Complexity: 18.600000
Age: 3.570000
Fitness: 1.000000
Averages for all organisms evaluated during experiment
Diversity: 37.594982
Complexity: 18.064586
Age: 2.396373
Fitness: 0.171238
Efficiency score: 11.135725
>>> Start genome file: ./data/pole1startgenes
>>> Configuration file: ./data/pole1_150.neat
As you can see from the statistics above, the NEAT algorithm is extremely effective in finding a successful solver for pole balancing. It was able to find a solver in all trials compared to the XOR problem, where only about 90 percent of the trials were solved.
It is interesting to find how the population size affects the performance of the algorithm. To find this, we are going to run this experiment using a population of 1'000 organisms. Use the following command to run mentioned evaluation:
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_1000.neat -genome ./data/pole1startgenes -experiment cart_pole
The commands above produce the following console outputs:
Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620480184
Average
Trial duration: 296.815693ms
Epoch duration: 142.432241ms
Generations/trial: 2.5
Champion found in 23 trial run
Winner Nodes: 7
Winner Genes: 10
Winner Evals: 2543
Diversity: 40
Complexity: 16
Age: 2
Fitness: 1.000000
Average among winners
Winner Nodes: 7.0
Winner Genes: 10.3
Winner Evals: 1936.6
Generations/trial: 2.5
Diversity: 64.660000
Complexity: 17.170000
Age: 1.450000
Fitness: 1.000000
Averages for all organisms evaluated during experiment
Diversity: 26.504369
Complexity: 17.313629
Age: 1.120346
Fitness: 0.160989
Efficiency score: 11.480676
>>> Start genome file: ./data/pole1startgenes
>>> Configuration file: ./data/pole1_1000.neat
By comparing the number of generations per trial for both experiments (10.9 and 2.5), you can see that increasing population size effectively increases the algorithm's performance. Furthermore, with 1'000 organisms in the population of solvers, the solution is often found in the NULL, i.e., within the initial random population.
In both single pole-balancing configurations described above, the optimal genome of the winner organism consists of seven nodes and ten genes.
The seven network nodes have the following meaning:
- #1 is a bias
-
#2-5 are sensors receiving input system state in following order:
- cart's X position,
- cart's acceleration among X,
- pole's angle,
- pole's angular velocity.
-
#6 and #7 are output nodes signaling the action that should be applied to the system to balance a single pole at each simulation step, i.e., the force direction. The applied force direction depends on the relative strength of activations of both output neurons. If activation of the first output neuron (
6th node
) greater than activation of the second neuron (7th node
), then the positive force direction is applied. Otherwise, the negative force direction is applied.
The ten genes are precisely the number of links required to connect five input sensor nodes with two output neuron nodes (5x2).
Next, we will evaluate the algorithm against a more advanced variant of the pole balancing problem: Double-Pole Balancing.