Title: Design of Deep Reinforcement Learning for Playing Go (運用於圍棋之深度強化式學習設計)
Authors: Wu, Hung-Chun (吳宏君)
Wu, I-Chen (吳毅成)
Institute of Computer Science and Engineering
Keywords: deep learning; deep reinforcement learning; convolutional neural network; Go; AI
Issue Date: 2017
Abstract: Artificial intelligence for Go had been under development for decades, yet programs could hardly compete with top professional players. With the recent breakthrough of deep learning, machines' ability in pattern recognition and classification improved dramatically, which finally allowed Go programs to reach professional strength. AlphaGo [Huang et al. 2016] combines deep learning models with reinforcement learning to teach the program to evaluate game positions, overcoming the long-standing difficulty of designing a value model for a game as complex as Go and dramatically boosting playing strength. In this thesis, we design a deep reinforcement learning (DRL) framework in which a policy network is trained for playing strength using policy-gradient updates. We also increase the depth and width of the model and incorporate residual networks to reach higher strength. The resulting network beats Pachi, an MCTS-based program, with a 92.20% win rate, the highest win rate reported to date.
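The two techniques named in the abstract, residual blocks in a convolutional policy network and policy-gradient (REINFORCE-style) training, can be sketched as follows. This is a minimal illustrative sketch in Python/PyTorch, not the thesis code: the number of input feature planes, channel width, block count, and the +1/-1 reward convention are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19
PLANES = 17          # assumed number of input feature planes

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection: out = relu(F(x) + x)."""
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return F.relu(self.c2(F.relu(self.c1(x))) + x)

class PolicyNet(nn.Module):
    """Residual convolutional policy network over 19x19 Go positions."""
    def __init__(self, ch=128, blocks=6):   # width/depth are illustrative
        super().__init__()
        self.stem = nn.Conv2d(PLANES, ch, 3, padding=1)
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.head = nn.Conv2d(ch, 1, 1)      # one logit per board point
    def forward(self, x):
        h = self.res(F.relu(self.stem(x)))
        return self.head(h).flatten(1)       # (batch, 361) move logits

def policy_gradient_step(net, opt, states, moves, rewards):
    """One REINFORCE update over a batch of self-play positions.

    states:  (N, PLANES, 19, 19) board features
    moves:   (N,) indices of moves actually played
    rewards: (N,) +1 for positions from winning games, -1 for losses
    """
    logp = F.log_softmax(net(states), dim=1)
    chosen = logp.gather(1, moves.unsqueeze(1)).squeeze(1)
    loss = -(rewards * chosen).mean()        # ascend E[reward * log pi(a|s)]
    opt.zero_grad()
    loss.backward()
    opt.step()

if __name__ == "__main__":
    # Hypothetical batch standing in for self-play data.
    net = PolicyNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    states = torch.zeros(8, PLANES, BOARD, BOARD)
    moves = torch.randint(0, BOARD * BOARD, (8,))
    rewards = torch.randint(0, 2, (8,)).float() * 2 - 1
    policy_gradient_step(net, opt, states, moves, rewards)

The update weights each played move's log-probability by the game outcome, so moves from won games are reinforced and moves from lost games are suppressed, which matches the strength-oriented policy-gradient training the abstract describes.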
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356169
http://hdl.handle.net/11536/142814
Appears in Collections: Thesis