Design of Deep Reinforcement Learning for Playing Go

Full metadata record

DC field	Value	Language
dc.contributor.author	吳宏君	zh_TW
dc.contributor.author	吳毅成	zh_TW
dc.contributor.author	Wu, Hung-Chun	en_US
dc.contributor.author	Wu, I-Chen	en_US
dc.date.accessioned	2018-01-24T07:42:42Z	-
dc.date.available	2018-01-24T07:42:42Z	-
dc.date.issued	2017	en_US
dc.identifier.uri	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356169	en_US
dc.identifier.uri	http://hdl.handle.net/11536/142814	-
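dc.description.abstract	Although artificial intelligence for Go programs had been developed for many years, it still could not reach the level of top human players. In recent years, advances in deep learning have brought breakthrough improvements in machine pattern recognition and classification of images, which has also allowed Go programs to reach professional strength. AlphaGo combines deep learning models with the concept of reinforcement learning so that a Go AI learns to judge the value of each board position. This overcame the earlier problem that Go's high complexity made it difficult to design a model for estimating position values, and it greatly improved program strength. In this thesis, we design a deep reinforcement learning framework that uses policy gradient updates to train a policy network aimed at playing strength. By increasing the depth and width of the model and incorporating residual networks, we also achieve higher playing strength.	zh_TW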
dc.description.abstract	Artificial intelligence for Go has been developed for many years, yet in the past it could hardly compete with professional players. Recently, breakthroughs in deep learning have significantly improved machines' ability to classify and recognize patterns, making Go AI programs competitive with professional players. AlphaGo [Huang et al. 2016] combines deep learning and reinforcement learning to teach the program to evaluate game positions, which dramatically boosts its strength. In this thesis, we propose a deep reinforcement learning (DRL) framework in which the policy network is trained with policy gradient. We also apply residual networks to make the model deeper. The resulting network beats Pachi, an MCTS-based program, with a 92.20% win rate, the highest win rate ever reported.	en_US
dc.language.iso	en_US	en_US
dc.subject	deep learning	zh_TW
dc.subject	deep reinforcement learning	zh_TW
dc.subject	Go	zh_TW
dc.subject	deep learning	en_US
dc.subject	deep reinforcement learning	en_US
dc.subject	convolutional neural network	en_US
dc.subject	Go	en_US
dc.subject	AI	en_US
dc.title	Design of Deep Reinforcement Learning for Playing Go	zh_TW
dc.title	Design of Deep Reinforcement Learning for Playing Go	en_US
dc.type	Thesis	en_US
dc.contributor.department	Institute of Computer Science and Engineering	zh_TW
Appears in Collections:	Theses
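The abstract states that the policy network is trained with policy gradient, using game results rather than expert moves as the learning signal. The following is a minimal sketch of a REINFORCE-style update. It assumes a linear softmax policy over hypothetical per-move feature vectors in place of the thesis's deep convolutional/residual network, and it treats the game outcome z = +1 (win) or -1 (loss) as the reward. The names `softmax` and `reinforce_update` are illustrative and not taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of move scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(weights, episode, outcome, lr=0.01):
    """One REINFORCE step for a hypothetical linear softmax move policy.

    weights : one float per feature (stand-in for network parameters).
    episode : list of (move_features, chosen) pairs; move_features holds one
              feature vector per legal move, chosen is the index played.
    outcome : game result z, +1 for a win and -1 for a loss.
    """
    grad = [0.0] * len(weights)
    for move_features, chosen in episode:
        # Score every legal move and turn the scores into probabilities.
        logits = [sum(w * f for w, f in zip(weights, feats)) for feats in move_features]
        probs = softmax(logits)
        # For a linear softmax policy, grad log pi(a|s) = phi(s,a) - E_pi[phi(s,b)].
        expected = [sum(p * feats[k] for p, feats in zip(probs, move_features))
                    for k in range(len(weights))]
        for k in range(len(weights)):
            grad[k] += move_features[chosen][k] - expected[k]
    # Gradient ascent on z * sum_t log pi(a_t|s_t).
    return [w + lr * outcome * g for w, g in zip(weights, grad)]
```

Because z multiplies the accumulated gradient, moves played in won games become more likely and moves played in lost games become less likely. A practical implementation would replace the linear scorer with the policy network and would typically subtract a baseline from z to reduce the variance of the update.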