Once again I am explaining the evolutionary logic in reinforcement learning. Here we investigate how that logic works.
This picture is from Intel AI. They created an AI system named CERL.
There are five steps.
Initialization is the first step of our evolutionary algorithm. We need an initial population of solutions. "Initial population of solutions" is indeed the right term, and we need a good initial population to reach the maximum, or a state close to it. Some initial states are key to reaching the physically maximal state. Deciding which initial states should be added and which should not is a separate issue.
The main thing is to find the starting elements of the game space. These are the possible dimensions of the game space, and every possibility is derived from these particles. You can think of these dimensions as particles in our space: everything, every material, is composed of these particles. Thus, if we can determine these particles, we can shape and structure our game space.
It is a good problem, though it is hard to find the existential state.
Yet in our physical world we can find particles with different methods. There could be an algorithm that searches for and finds the dimensions, working from the inner state to the outer state.
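To make the initialization step concrete, here is a minimal sketch in Python. It assumes each solution is a list of real numbers, one per dimension of the game space, sampled uniformly at random. The name init_population and its parameters are made up for illustration and are not taken from CERL.

```python
import random

def init_population(pop_size, n_dims, low=-1.0, high=1.0, seed=None):
    """Return pop_size candidate solutions, each a list of n_dims floats.

    Assumption: a solution is just a point in the game space, and we
    have no prior knowledge, so we sample every dimension uniformly.
    """
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_dims)]
            for _ in range(pop_size)]

# Ten candidate solutions in a hypothetical 4-dimensional game space.
population = init_population(pop_size=10, n_dims=4, seed=0)
```

Uniform sampling is only one possible choice. If we already know that some initial states are key, they can be appended to the returned list by hand.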
Selection is another phase of the systematic architecture. We should select the best, but the best for what? For what purpose? Among humans, selection can change with the environment and with the goal of that environment. There can be different combinations of environments, and environments themselves change, so selection can change accordingly.
For example, there can be a race on a winding road and a race on a straight road. Depending on the model, one car drives well on the winding road and badly on the straight one, while another car is the reverse.
There can be many phases in these steps, and we must optimize them. The selection mechanism is therefore the system that selects the best-performing agents from the population of agents.
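The road race example can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the car names, their (cornering, top_speed) scores, and the two fitness functions are invented here. The selection rule shown is simple truncation selection, which keeps the k agents with the highest fitness.

```python
def select_best(population, fitness, k):
    """Truncation selection: keep the k agents with the highest fitness."""
    return sorted(population, key=fitness, reverse=True)[:k]

# Hypothetical car models described by (cornering, top_speed) scores.
cars = {"agile": (0.9, 0.4), "fast": (0.3, 0.95), "balanced": (0.6, 0.6)}

def bendy_fitness(name):
    # Assumed weighting: a winding road rewards cornering more than speed.
    cornering, top_speed = cars[name]
    return 0.8 * cornering + 0.2 * top_speed

def straight_fitness(name):
    # Assumed weighting: a straight road rewards top speed more.
    cornering, top_speed = cars[name]
    return 0.2 * cornering + 0.8 * top_speed

best_on_bendy = select_best(list(cars), bendy_fitness, k=1)
best_on_straight = select_best(list(cars), straight_fitness, k=1)
```

The same population gives a different winner in each environment, which is why the question "best for what?" has to be answered before selection can be defined.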
Selection can be applied to a group of agents or models. We can alter some of their functions with random mutation actions, or we can cross over this group of models.
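Mutation and crossover can be sketched as follows. This assumes each model is a flat list of numbers (for example its weights). It uses Gaussian mutation and one-point crossover, which are two common operators and not necessarily the ones CERL uses. The function names and default values are illustrative.

```python
import random

def mutate(genome, rate=0.1, scale=0.1, rng=random):
    """Gaussian mutation: perturb each gene with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g
            for g in genome]

def crossover(parent_a, parent_b, rng=random):
    """One-point crossover: prefix from parent_a, suffix from parent_b.

    Assumes both parents have the same length, which is at least 2.
    """
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

rng = random.Random(0)
child = crossover([0.0] * 4, [1.0] * 4, rng)   # zeros, then ones
mutant = mutate(child, rate=0.5, rng=rng)      # some genes perturbed
```

Crossover mixes what two selected models already know, while mutation adds small random changes so the population can explore states that no parent has reached.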
We give importance to some dimensions in one game space and to other dimensions in another, to check physically how important each dimension is.
We must terminate the simulation when we judge mathematically that it has reached the best state available to us. A simulation has its own nature, so its behavior will tell us when to stop. We can measure this with the reward function and time series analysis: if the reward starts to descend, that is where we must stop.
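One simple way to turn this stopping rule into code is to compare the average reward of the most recent generations with the average of the generations just before them. The sketch below assumes we record one reward value per generation. The function should_stop and its window and min_gain parameters are hypothetical choices, and a real time series analysis could be more careful than this.

```python
def should_stop(rewards, window=5, min_gain=0.0):
    """Return True when the reward has stopped improving.

    Compares the mean of the last `window` rewards with the mean of
    the `window` rewards before them. If the gain is not larger than
    `min_gain`, the reward is flat or descending and we stop.
    """
    if len(rewards) < 2 * window:
        return False  # not enough history to judge yet
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous <= min_gain

rising = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
falling = [1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
```

With these two series, should_stop(rising) is False and should_stop(falling) is True. Averaging over a window keeps a single noisy generation from ending the simulation too early.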