標題: 啟發式多臂老虎機之最佳解辨識
Best Arm Identification for Heuristic-Based Multi-Armed Bandits
作者: 葉騉豪
吳毅成
Yeh, Kun-Hao
資訊科學與工程研究所
關鍵字: 多臂老虎機;啟發;heuristic;multi-armed bandits
公開日期: 2016
摘要: 這篇論文提出多臂老虎機的一種變形問題,名為啟發式多臂老虎機。本篇論文假設啟發之存在且能用於幫助最佳解之辨識。我們針對此問題提出了一個推薦最佳解之模型。根據此模型,我們更提出動態的資源投資演算法並延伸成採取最小最大值更新之樹狀搜尋演算法。此二演算法皆無需任何參數且能在任意時間停止並推薦最佳解。我們將此二演算法與現有演算法在本篇提出的問題以及隨機生成之遊戲樹上做實驗。結果顯示本篇提出之方法在有啟發評估的幫助下,辨識出最佳解的表現比現有的演算法更好。
This paper first presents a variant of the Multi-Armed Bandit (MAB) problem, called heuristic-based MAB (H-MAB) problem, where heuristics are available to help identify the best arm. We then propose a recommendation model for H-MAB. Based on H-MAB and the model, we propose a dynamic budget allocation algorithm for best arm identification and extend it to a tree search algorithm with minimax update. Both algorithms are anytime and parameter free. We corroborate the proposed model and algorithms with experiments on both H-MABs and random game trees, and demonstrate that these algorithms outperform state-of-the-art algorithms in the aspect of the probability of identifying the best arms.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356038
http://hdl.handle.net/11536/139410
Appears in Collections:Thesis