ncu FY2012 Annual Report 14
![ncu FY2012 Annual Report 14](/sites/default/files/styles/embed_lg_1x/public/2024-03/ncu_fy2012yoshida2012a.png?itok=hXq9zrL9)
Figure 13: Environment and optimization results. Left: The environment used in the simulation. Center: The evolution of the fitness. The horizontal axis is the fitness and the vertical axis is the number of generations. The lines show the average fitness and the shaded areas show the standard deviation. Blue shows the result of ExQ-learning (state-dependent discounting) and red shows that of Q-learning (constant discount factor). Right: The average of the discount functions obtained after 50 evolutions. The colors in the figure indicate the magnitude of the discount factor at the corresponding state. The colors of the small cells match the colors of the corresponding states in the left panel (red and green states).
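The difference between the two learners compared in the caption can be illustrated with a minimal sketch. It is not the report's ExQ-learning implementation: it only assumes a standard tabular Q-learning update in which the constant discount factor may be replaced by a per-state table. The function name `q_update`, the learning rate `alpha`, and the choice to look up the discount at the current state `s` are all assumptions made for illustration.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, gamma, alpha=0.1):
    """Apply one tabular Q-learning update in place and return Q.

    gamma is either a float (constant discount factor) or a 1-D array
    indexed by state (state-dependent discounting). Looking the discount
    up at the current state s is an assumption of this sketch.
    """
    # Pick the discount: scalar as-is, or the entry for state s.
    g = gamma[s] if np.ndim(gamma) > 0 else gamma
    # Standard one-step temporal-difference target.
    td_target = r + g * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Passing a float reproduces ordinary Q-learning, while passing an array such as `np.array([0.9, 0.5, 0.1])` gives each state its own discount factor, which is the kind of quantity plotted in the right panel.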
Copyright OIST (Okinawa Institute of Science and Technology Graduate University, 沖縄科学技術大学院大学). Creative Commons Attribution 4.0 International License (CC BY 4.0).