Title: | 應用動態規劃與隨機近似於加強式學習控制系統 Dynamic Programming and Stochastic Approximation as Applied to Reinforcement Learning Control Systems |
Authors: | 孟琬瑜 Meng, Wan-Yua; 周志成 Jou, Chi-Cheng; Institute of Electrical and Control Engineering (電控工程研究所) |
Keywords: | Reinforcement learning; overall performance index |
Issue Date: | 1994 |
Abstract: | This thesis focuses on nonlinear learning control design using two reinforcement learning schemes, the tangent method and the secant method. The control task is formulated as a sequential optimization problem. The proposed on-line learning algorithms handle systems with unknown nonlinear dynamics operating in a stochastic environment, with the overall performance index defined over an infinite time horizon. The algorithms are direct design methods that combine techniques from dynamic programming and stochastic approximation. They are complete and general enough that the controller can be built from various computing models. We evaluate the effectiveness and efficiency of the two schemes on a stabilization problem and find that the secant method outperforms the tangent method. Other simulation results demonstrate that reinforcement learning is an effective approach to control problems with unknown dynamics, delayed reinforcement signals, and long-term performance objectives. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT833327021 http://hdl.handle.net/11536/59865 |
Appears in Collections: | Thesis |