Title: Temporal Difference and GA-based Reinforcement Learning System
Authors: Jou Chong-Ping; Lin Chin-Teng
Institute of Electrical and Control Engineering
Keywords: Genetic algorithm; Temporal difference method; Neural fuzzy controller; Chaotic system
Issue Date: 1999
Abstract: This thesis proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement learning method (TDGAR) for solving various reinforcement learning problems. The TDGAR learning scheme is a new hybrid GA that integrates the TD prediction method and the GA to fulfill the reinforcement learning task. Structurally, the TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network that helps the learning of the other network, the action network, which determines the outputs (actions) of the TDGAR learning system; the action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA, so that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external reinforcement feedback from the environment. This enables the GA to proceed to new generations without waiting for the arrival of the external reinforcement signal, which usually accelerates GA learning, since in reinforcement learning problems a reinforcement signal may become available only long after a sequence of actions has occurred. Computer simulations on control of the cart-pole balancing system, a magnetic bearing system, and chaotic systems have been conducted to illustrate the performance and applicability of the proposed learning scheme.
Table of Contents:
1.1 Motivation
1.2 Literature Survey
1.2.1 Reinforcement Learning by Using Neural Networks or Neural Fuzzy Networks
1.2.2 Reinforcement Learning by Using GAs
1.2.3 Related Works
1.3 Thesis Outline
2 Genetic Algorithms and Temporal Difference Methods
2.1 Basics of Genetic Algorithms
2.2 Hybrid Genetic Algorithms
2.3 Temporal Difference Methods
3 TD and GA-based Reinforcement Learning System
3.1 Structure of the TDGAR Learning System
3.1.1 The Critic Network
3.1.2 The Action Network
3.2 Learning Algorithm of the TDGAR Learning System
3.2.1 Learning Algorithm for the Critic Network
3.2.2 Learning Algorithm for the Action Network
4 Control of the Cart-Pole System
4.1 Description of the Cart-Pole System
4.2 Results and Discussion
5 Control of a Magnetic Bearing System
5.1 Description of the Active Magnetic Bearing System
5.2 Simulation Results
6 Controlling Chaos
6.1 Using Small Perturbations to Control Chaos
6.2 Controlling of Henon Map
6.3 Controlling of Logistic Map
7 Conclusions
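The abstract describes the critic producing an internal reinforcement signal through TD prediction, with the GA treating that signal as the fitness of each candidate action network. The following Python sketch illustrates only that interaction; the toy environment, the linear critic, the single-hidden-layer action network, the tournament-selection GA, and all hyperparameters are illustrative assumptions and not the formulation used in the thesis.

import numpy as np

# Minimal sketch of the TDGAR idea in the abstract (details are assumptions):
# a critic estimates state values with TD(0); its TD error serves as the
# internal reinforcement signal, which the GA uses as the fitness of each
# candidate action network (chromosome).

rng = np.random.default_rng(0)

STATE_DIM, HIDDEN, N_ACTIONS = 4, 8, 2
GAMMA, CRITIC_LR = 0.95, 0.05


def act(weights, state):
    """Action network: one hidden layer; returns a discrete action."""
    w1 = weights[: STATE_DIM * HIDDEN].reshape(STATE_DIM, HIDDEN)
    w2 = weights[STATE_DIM * HIDDEN :].reshape(HIDDEN, N_ACTIONS)
    h = np.tanh(state @ w1)
    return int(np.argmax(h @ w2))


def rollout(weights, critic_w, steps=50):
    """Run one episode in a toy environment; accumulate the internal
    reinforcement (TD error) produced by the critic as the GA fitness."""
    state = rng.normal(size=STATE_DIM)
    fitness = 0.0
    for _ in range(steps):
        a = act(weights, state)
        # Toy dynamics and reward stand in for cart-pole etc. (assumption).
        next_state = 0.9 * state + 0.1 * (a - 0.5) + 0.05 * rng.normal(size=STATE_DIM)
        external_r = -float(np.sum(next_state ** 2))          # external reinforcement
        v, v_next = state @ critic_w, next_state @ critic_w   # linear critic (assumption)
        td_error = external_r + GAMMA * v_next - v            # internal reinforcement
        critic_w += CRITIC_LR * td_error * state               # TD(0) update of the critic
        fitness += td_error
        state = next_state
    return fitness


def evolve(population, fitnesses, mutation=0.1):
    """One GA generation: tournament selection plus Gaussian mutation (assumption)."""
    new_population = []
    for _ in range(len(population)):
        i, j = rng.integers(len(population), size=2)
        parent = population[i] if fitnesses[i] > fitnesses[j] else population[j]
        new_population.append(parent + mutation * rng.normal(size=parent.shape))
    return new_population


n_weights = STATE_DIM * HIDDEN + HIDDEN * N_ACTIONS
population = [rng.normal(scale=0.5, size=n_weights) for _ in range(20)]
critic_w = np.zeros(STATE_DIM)

for generation in range(10):
    fitnesses = [rollout(w, critic_w) for w in population]
    print(f"generation {generation}: best internal-reinforcement fitness = {max(fitnesses):.3f}")
    population = evolve(population, fitnesses)

Because the fitness here is the accumulated TD error rather than the raw external reward, each chromosome can be scored in every generation even when the external reinforcement signal is sparse or delayed, which is the acceleration argument made in the abstract.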
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880591001
http://hdl.handle.net/11536/66230
Appears in Collections: Thesis