Full metadata record
DC Field | Value | Language
dc.contributor.author | 孟琬瑜 | en_US
dc.contributor.author | Meng, Wan-Yua | en_US
dc.contributor.author | 周志成 | en_US
dc.contributor.author | Jou, Chi-Cheng | en_US
dc.date.accessioned | 2014-12-12T02:14:25Z | -
dc.date.available | 2014-12-12T02:14:25Z | -
dc.date.issued | 1994 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT833327021 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/59865 | -
dc.description.abstract | This thesis focuses on applying two reinforcement learning methods, the tangent method and the secant method, to nonlinear learning control design. The control task is cast as a sequential optimization problem. The proposed on-line learning algorithms can handle systems with unknown dynamics operating in stochastic environments, with the overall performance index expressed over an infinite time horizon. The two algorithms are direct design methods that combine techniques from dynamic programming and stochastic approximation; they are complete and general, so the controller can be constituted by various computing models. We evaluate the effectiveness and efficiency of the two methods on a stabilization problem and find that the secant method outperforms the tangent method. Further simulation results show that reinforcement learning is an effective approach for control problems with unknown dynamics, delayed reinforcement signals, and long-term performance. | zh_TW
dc.description.abstract | This thesis is focused on nonlinear learning control design using two reinforcement learning schemes, the tangent and secant methods. The control task is formulated as a sequential optimization problem. The proposed on-line learning algorithms treat systems with unknown nonlinear dynamics operating in a stochastic environment, and the overall performance index is formulated over an infinite time horizon. The algorithms are direct methods and emerge as a synthesis of techniques from dynamic programming and stochastic approximation. They are complete and general enough that the controller can be constituted by various computing models. We assess the effectiveness and efficiency of the two schemes on a stabilization problem; the results suggest that the secant method outperforms the tangent method. Further simulation results demonstrate that reinforcement learning is an effective alternative for control problems with unknown dynamics, delayed rewards, and long-term performance. | en_US
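The abstracts describe on-line algorithms built from a synthesis of dynamic programming and stochastic approximation. The thesis's tangent and secant schemes are not reproduced here; purely as a minimal sketch of the stochastic-approximation ingredient, the Python example below tunes a controller parameter vector from noisy performance measurements using Kiefer-Wolfowitz finite differences with Robbins-Monro decaying gains. The cost function, gain sequences, and all identifiers are hypothetical stand-ins, not the thesis's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_cost(theta):
    # Hypothetical noisy performance index J(theta) = E[f(theta, xi)];
    # the quadratic form and the noise level are purely illustrative.
    return float(np.sum((theta - 1.0) ** 2) + rng.normal(scale=0.1))

def kiefer_wolfowitz(theta, iters=2000):
    # Estimate the gradient of J by central finite differences of noisy
    # cost samples, then take a Robbins-Monro step with decaying step
    # sizes a_k and shrinking perturbation half-widths c_k.
    d = theta.size
    for k in range(1, iters + 1):
        a_k = 0.5 / k            # sum a_k diverges
        c_k = 0.5 / k ** 0.25    # c_k -> 0 and sum (a_k / c_k)^2 < inf
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = c_k
            grad[i] = (noisy_cost(theta + e) - noisy_cost(theta - e)) / (2.0 * c_k)
        theta = theta - a_k * grad
    return theta

print(kiefer_wolfowitz(np.zeros(3)))  # drifts toward the minimizer (1, 1, 1)
```

The gain sequences are chosen so the standard stochastic-approximation convergence conditions hold; despite each cost evaluation being noisy, the averaged updates converge toward the minimizer.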
dc.language.iso | zh_TW | en_US
dc.subject | 加強式學習 (reinforcement learning) | zh_TW
dc.subject | 總效益指標 (overall performance index) | zh_TW
dc.title | 應用動態規劃與隨機近似於加強式學習控制系統 (Dynamic programming and stochastic approximation applied to reinforcement learning control systems) | zh_TW
dc.title | Dynamic Programming And Stochastic Approximation As Applied To Reinforcement Learning Control Systems | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電控工程研究所 (Institute of Electrical and Control Engineering) | zh_TW
Appears in Collections: Thesis