ncu FY2012 Annual Report 14

Figure 13: The environment and the optimization results. Left: The environment used in the simulation. Center: The evolution of fitness. The horizontal axis is fitness and the vertical axis is the generation number. Lines show the average fitness and shaded areas show the standard deviation. Blue is the result of ExQ-learning (state-dependent discounting) and red is the result of Q-learning (constant discount factor). Right: The average discount function obtained after 50 evolutions. Colors indicate the magnitude of the discount factor at each state. The small cells show the colors of the corresponding states in the left panel (the Red and Green states).
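The caption contrasts Q-learning with a constant discount factor against ExQ-learning, which uses state-dependent discounting. The update rule for ExQ-learning is not given here, so the sketch below is only an assumption: it takes standard tabular Q-learning and replaces the constant discount factor with a per-state lookup gamma(s). The function name `q_update`, the discount table, and the states "A" and "B" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # One tabular Q-learning step. `gamma` maps a state to its discount factor
    # (an assumption for illustration); a constant function gives ordinary Q-learning.
    best_next = max(Q[(s_next, b)] for b in actions)
    target = r + gamma(s) * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

actions = ["left", "right"]

def constant_gamma(s):
    # Q-learning: the same discount factor in every state.
    return 0.9

def state_gamma(s):
    # Hypothetical state-dependent discounts: state "A" discounts more steeply.
    return {"A": 0.5}.get(s, 0.9)

Q1 = defaultdict(float, {("B", "left"): 1.0})
Q2 = defaultdict(float, {("B", "left"): 1.0})
print(q_update(Q1, "A", "right", 0.0, "B", actions, 0.5, constant_gamma))  # 0.45
print(q_update(Q2, "A", "right", 0.0, "B", actions, 0.5, state_gamma))     # 0.25
```

With identical experience, the only difference between the two updates is the discount applied in state "A", which is the quantity visualized per state in the right panel.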

Date: 04 March 2024