A General Approach to Combining MCTS with DCNN

Title:	A General Approach to Combining MCTS with DCNN (original Chinese title: 蒙地卡羅樹搜尋與深度卷積類神經網路之一般化結合)
Authors:	Lan, Li-Cheng; Wu, I-Chen; Chen, Rong-Jaye (藍立呈; 吳毅成; 陳榮傑), Institute of Network Engineering
Keywords:	Monte Carlo Tree Search; neural networks; deep convolutional neural networks; MCTS; DCNN; AlphaGo; APV-MCTS; GAPV-MCTS
Publication date:	2017
Abstract:	Asynchronous Policy and Value MCTS (APV-MCTS), proposed by DeepMind for AlphaGo, is a search algorithm that asynchronously combines Monte Carlo Tree Search (MCTS) with Deep Convolutional Neural Networks (DCNNs). Using APV-MCTS together with its trained DCNNs, AlphaGo became the first Go program to defeat professional human Go players. This thesis examines the characteristics and issues of APV-MCTS and generalizes it into General APV-MCTS (GAPV-MCTS), so that the approach can be applied to a wider range of games. The main experimental testbed is NoGo, a variant of Go. After its parameters are tuned, GAPV-MCTS gains about 220 Elo (a 77% winning rate) over APV-MCTS when both use the same DCNNs.
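The abstract reports the improvement in two forms: an Elo gain and a winning rate. The snippet below converts between the two. It assumes the standard logistic Elo model, where the expected score for a rating difference D is E = 1 / (1 + 10^(-D/400)). The abstract does not say which rating model was used, so treat this only as a sketch. The function names `elo_to_win_rate` and `win_rate_to_elo` are hypothetical helpers introduced here for illustration.

```python
import math


def elo_to_win_rate(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def win_rate_to_elo(win_rate: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1.

    This is the inverse of elo_to_win_rate under the same assumed logistic model.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # Convert the two figures quoted in the abstract into each other's units.
    print(f"220 Elo -> expected score {elo_to_win_rate(220):.3f}")
    print(f"77% winning rate -> {win_rate_to_elo(0.77):.1f} Elo")
```

Under this model, a 220 Elo gap corresponds to an expected score of about 0.78. A 77% winning rate corresponds to a gap of about 210 Elo. The two figures in the abstract are therefore roughly consistent with each other.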
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070456513 http://hdl.handle.net/11536/142888
Appears in collections:	Theses