Title: A General Approach to Combining MCTS with DCNN
Original title (Chinese): 蒙地卡羅樹搜尋與深度卷積類神經網路之一般化結合
Authors: Lan, Li-Cheng (藍立呈)
Wu, I-Chen (吳毅成)
Chen, Rong-Jaye (陳榮傑)
Institute of Network Engineering
Keywords: Monte Carlo tree search; neural network; deep convolutional neural network; MCTS; DCNN; AlphaGo; APV-MCTS; GAPV-MCTS
Issue Date: 2017
Abstract: The Asynchronous Policy and Value MCTS algorithm (APV-MCTS), proposed by DeepMind, is the search algorithm used in AlphaGo. It combines Monte Carlo Tree Search (MCTS) with Deep Convolutional Neural Networks (DCNN) asynchronously. With APV-MCTS and its trained DCNNs, AlphaGo became the first Go AI program to defeat professional human Go players. This thesis examines the properties of APV-MCTS and generalizes it into General APV-MCTS (GAPV-MCTS), so that the approach applies to a wider range of games. We use NoGo (a variant of Go) as the main experimental subject. After its parameters are tuned, GAPV-MCTS outperforms APV-MCTS by about 220 Elo (a 77% win rate) when both use the same DCNNs.
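The abstract pairs an Elo gain with a win rate, and the two figures can be related with the standard logistic Elo model on a 400-point scale. The sketch below is only an illustration under that assumption. The thesis may have computed its rating differently, and the function names `elo_to_win_rate` and `win_rate_to_elo` are hypothetical, not taken from the thesis.

```python
import math


def elo_to_win_rate(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model.

    Assumes the conventional 400-point scale; this is an assumption,
    not a detail stated in the thesis record.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def win_rate_to_elo(win_rate: float) -> float:
    """Inverse of elo_to_win_rate; defined only for rates strictly in (0, 1)."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # Figures quoted in the abstract: about 220 Elo, 77% win rate.
    print(f"220 Elo -> win rate {elo_to_win_rate(220.0):.3f}")
    print(f"77% win rate -> {win_rate_to_elo(0.77):.1f} Elo")
```

Under this model, a 220-point gap corresponds to a win rate of roughly 0.78, and a 77% win rate corresponds to roughly 210 Elo, which is consistent with the abstract describing the gain as approximate.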
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070456513
http://hdl.handle.net/11536/142888
Appears in Collections: Thesis