ncu FY2012 Annual Report 12
![ncu FY2012 Annual Report 12](/sites/default/files/styles/embed_lg_1x/public/2024-03/ncu_fy2012stefan2012a.png?itok=nDvNXr49)
Figure 11: Average reward, computed over every 100 episodes and averaged over 20 simulation runs, for scaled FERL (left panel), FERL (middle panel), and NNRL (right panel). Line colors indicate the setting of the parameter of the exploration strategy (red: 0.01, green: 0.001, blue: 0.0005), and line types indicate the setting of the initial parameter of the exploration strategy (solid: 0.5, dashed: 1, dotted: 2).
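The averaging described in the caption could be computed roughly as follows. This is a minimal sketch, not the code used for the report: it assumes the per-episode rewards are stored in an array of shape (runs, episodes), and the function name `block_average` and its `block` argument are hypothetical.

```python
import numpy as np


def block_average(rewards, block=100):
    """Average reward per block of `block` episodes, pooled over all runs.

    `rewards` is assumed to have shape (n_runs, n_episodes); this layout is
    an assumption for illustration, not taken from the report.
    """
    rewards = np.asarray(rewards, dtype=float)
    n_runs, n_episodes = rewards.shape
    n_blocks = n_episodes // block
    # Drop trailing episodes that do not fill a complete block.
    trimmed = rewards[:, : n_blocks * block]
    # Reshape to (runs, blocks, episodes-per-block), then average over
    # the run axis and the within-block episode axis.
    return trimmed.reshape(n_runs, n_blocks, block).mean(axis=(0, 2))
```

With 20 runs and the default block size, each returned value would correspond to one plotted point on a curve, under the assumption that each point pools 100 episodes across all runs.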
Date: 04 March 2024
Copyright OIST (Okinawa Institute of Science and Technology Graduate University, 沖縄科学技術大学院大学). Creative Commons Attribution 4.0 International License (CC BY 4.0).