Full metadata record
DC Field | Value | Language
dc.contributor.author | 孟琬瑜 | en_US
dc.contributor.author | Meng, Wan-Yua | en_US
dc.contributor.author | 周志成 | en_US
dc.contributor.author | Jou, Chi-Cheng | en_US
dc.date.accessioned | 2014-12-12T02:14:25Z | -
dc.date.available | 2014-12-12T02:14:25Z | -
dc.date.issued | 1994 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT833327021 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/59865 | -
dc.description.abstract | This thesis focuses on applying two reinforcement learning methods, the tangent method and the secant method, to nonlinear learning control design. The control task is cast as a sequential optimization problem. The proposed on-line learning algorithms can handle systems with unknown dynamics operating in stochastic environments, with the overall performance index expressed over an infinite time horizon. The two algorithms are direct design methods that combine techniques from dynamic programming and stochastic approximation; they are complete and general, so the controller can be constituted by various computing models. We evaluate the effectiveness and efficiency of the two methods on a stabilization problem and find that the secant method outperforms the tangent method. Further simulation results show that reinforcement learning is an effective approach for control problems with unknown dynamics, delayed reinforcement signals, and long-term performance. | zh_TW
dc.description.abstract | This thesis is focused on nonlinear learning control design using two reinforcement learning schemes, the tangent and secant methods. The control task is formulated as a sequential optimization problem. The proposed on-line learning algorithms treat systems with unknown nonlinear dynamics operating in a stochastic environment, and the overall performance index is formulated over an infinite time horizon. The algorithms are direct methods and emerge as a synthesis of techniques from dynamic programming and stochastic approximation. They are complete and general enough that the controller can be constituted by various computing models. We assess the effectiveness and efficiency of the two schemes on a stabilization problem; the results suggest that the secant method outperforms the tangent method. Further simulation results demonstrate that reinforcement learning is an effective alternative for control problems with unknown dynamics, delayed rewards, and long-term performance. | en_US
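The abstracts describe on-line algorithms built from a synthesis of dynamic programming and stochastic approximation. The thesis's tangent and secant schemes are not reproduced here; purely as a minimal sketch of the stochastic-approximation ingredient, the Python example below tunes a controller parameter vector from noisy performance measurements using Kiefer-Wolfowitz finite differences with Robbins-Monro decaying gains. The cost function, gain sequences, and all identifiers are hypothetical stand-ins, not the thesis's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_cost(theta):
    # Hypothetical noisy performance index J(theta) = E[f(theta, xi)];
    # the quadratic form and the noise level are purely illustrative.
    return float(np.sum((theta - 1.0) ** 2) + rng.normal(scale=0.1))

def kiefer_wolfowitz(theta, iters=2000):
    # Estimate the gradient of J by central finite differences of noisy
    # cost samples, then take a Robbins-Monro step with decaying step
    # sizes a_k and shrinking perturbation half-widths c_k.
    d = theta.size
    for k in range(1, iters + 1):
        a_k = 0.5 / k            # sum a_k diverges
        c_k = 0.5 / k ** 0.25    # c_k -> 0 and sum (a_k / c_k)^2 < inf
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = c_k
            grad[i] = (noisy_cost(theta + e) - noisy_cost(theta - e)) / (2.0 * c_k)
        theta = theta - a_k * grad
    return theta

print(kiefer_wolfowitz(np.zeros(3)))  # drifts toward the minimizer (1, 1, 1)
```

The gain sequences are chosen so the standard stochastic-approximation convergence conditions hold; despite each cost evaluation being noisy, the averaged updates converge toward the minimizer.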
dc.language.iso | zh_TW | en_US
dc.subject | 加強式學習 (reinforcement learning) | zh_TW
dc.subject | 總效益指標 (overall performance index) | zh_TW
dc.title | 應用動態規劃與隨機近似於加強式學習控制系統 (Dynamic programming and stochastic approximation applied to reinforcement learning control systems) | zh_TW
dc.title | Dynamic Programming And Stochastic Approximation As Applied To Reinforcement Learning Control Systems | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電控工程研究所 (Institute of Electrical and Control Engineering) | zh_TW
Appears in Collections: Thesis