Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm

Title:	基於安全性增強式學習之循序擾動學習演算法 Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
Authors:	何長安 Ho, Chang-An; 林昇甫 Lin, Sheng-Fuu; Institute of Electrical and Control Engineering
Keywords:	safe reinforcement learning; weight perturbation; sequential search
Issue Date:	2008
Abstract:	This thesis proposes a sequential weight-perturbation learning architecture within a safe reinforcement learning framework (SRL-SP). Following the idea of sequential search, a perturbation is applied to every weight of the neural network in turn. Once the perturbations have been applied, the network is evaluated before and after perturbation, and the comparison determines how the weights are updated. This avoids the tendency of conventional perturbation learning algorithms to fall into local solutions or to oscillate around a solution in the solution space, both of which slow learning. In addition, within the reinforcement learning structure, the energy of the plant is used, following Lyapunov design methods, to define the learning objective and a set of goal states. This design greatly reduces the time that conventional reinforcement learning spends over-searching the solution space for better solutions, so the plant's state can be driven into the goal-state set quickly. In simulation, an n-mass inverted pendulum model is used to emulate a humanoid robot, and the results indicate that the proposed learning algorithm learns more effectively.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079612518 http://hdl.handle.net/11536/41837
Appears in Collections:	Thesis
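
The abstract describes perturbing each network weight in turn and comparing the network's evaluation before and after the perturbation. A minimal sketch of that idea follows. It is not the thesis's implementation: the function name `sequential_perturbation_sweep`, the fixed step size `delta`, the choice to try both signs, and the keep-only-if-better rule are all assumptions made for illustration, and the toy score function stands in for the reinforcement signal.

```python
def sequential_perturbation_sweep(weights, evaluate, delta=0.1):
    """Visit the weights one at a time; keep a perturbation only if it scores higher.

    Assumptions (not from the thesis): a fixed step `delta`, both signs are
    tried, and `evaluate` returns a scalar where larger means better.
    """
    weights = list(weights)
    best = evaluate(weights)              # evaluation before perturbation
    for i in range(len(weights)):
        for step in (delta, -delta):
            candidate = weights[:]
            candidate[i] += step          # perturb a single weight
            score = evaluate(candidate)   # evaluation after perturbation
            if score > best:              # accept only an improvement
                weights, best = candidate, score
                break
    return weights, best


# Toy stand-in for the evaluation: negative squared distance to a target vector.
target = [1.0, -2.0, 0.5]

def toy_score(w):
    return -sum((a - b) ** 2 for a, b in zip(w, target))

w = [0.0, 0.0, 0.0]
for _ in range(50):
    w, s = sequential_perturbation_sweep(w, toy_score, delta=0.1)
```

Because a perturbation is accepted only when the score improves, the score never decreases from one sweep to the next in this sketch. Whether the thesis uses the same acceptance rule is not stated in the abstract.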