Full metadata record
DC Field: Value (Language)
dc.contributor.author: Chen, Po-An (en_US)
dc.contributor.author: Lu, Chi-Jen (en_US)
dc.date.accessioned: 2019-05-02T00:26:47Z
dc.date.available: 2019-05-02T00:26:47Z
dc.date.issued: 2015-01-01 (en_US)
dc.identifier.isbn: 978-1-4503-3413-6 (en_US)
dc.identifier.uri: http://hdl.handle.net/11536/151714
dc.description.abstract: Almost all convergence results for repeated games in which each player adopts a specific "no-regret" learning algorithm, such as multiplicative updates or the more general mirror-descent algorithms, are known only in the more generous full-information model, in which each player is assumed to observe the costs of all possible choices, even the unchosen ones, at each time step. This assumption may in general be too strong; a more realistic one is captured by the bandit model, in which each player at each time step knows only the cost of her currently chosen path, and not that of any unchosen one. Can convergence still be achieved in this more challenging bandit model? We answer this question positively: while existing bandit algorithms do not seem to work here, we develop a new family of bandit algorithms, based on the mirror-descent algorithm, that achieve such a convergence guarantee in atomic congestion games. (en_US)
dc.language.iso: en_US (en_US)
dc.subject: Mirror-descent algorithm (en_US)
dc.subject: No-regret dynamics (en_US)
dc.subject: Convergence (en_US)
dc.title: Playing Congestion Games with Bandit Feedbacks (en_US)
dc.type: Proceedings Paper (en_US)
dc.identifier.journal: PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15) (en_US)
dc.citation.spage: 1721 (en_US)
dc.citation.epage: 1722 (en_US)
dc.contributor.department: Published under the name of NCTU (zh_TW)
dc.contributor.department: National Chiao Tung University (en_US)
dc.identifier.wosnumber: WOS:000461455000213 (en_US)
dc.citation.woscount: 0 (en_US)
Appears in Collections: Conference Papers
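
The abstract above centers on a concrete algorithmic idea: running a mirror-descent update when each player observes only the cost of her own chosen action. As an illustration only, the following is a minimal Python sketch of one generic update of that kind, using entropic mirror descent (multiplicative weights) with an importance-weighted cost estimate and uniform exploration in the style of EXP3. The function name, the cost_of callable, and the parameters eta and gamma are hypothetical; the paper's actual family of bandit algorithms with the stated convergence guarantee is more refined than this sketch.

import numpy as np

def bandit_mirror_descent_round(x, cost_of, t, eta=0.05, gamma=0.05, rng=None):
    # One round of a generic bandit mirror-descent update for a single player.
    # x: current mixed strategy over the player's actions (nonnegative, sums to 1).
    # cost_of(a, t): hypothetical callable revealing only the chosen action's cost in [0, 1].
    if rng is None:
        rng = np.random.default_rng()
    n = len(x)
    # Mix in uniform exploration so every action keeps positive probability.
    p = (1.0 - gamma) * x + gamma / n
    a = rng.choice(n, p=p)
    # Importance-weighted estimate of the full cost vector from the single observed cost.
    c_hat = np.zeros(n)
    c_hat[a] = cost_of(a, t) / p[a]
    # Entropic mirror-descent (multiplicative-weights) step on the estimated costs.
    x_new = x * np.exp(-eta * c_hat)
    return x_new / x_new.sum()

# Toy usage: two actions where action 0 is always cheaper; the strategy should
# concentrate most of its probability mass on action 0 over time.
x = np.array([0.5, 0.5])
for t in range(2000):
    x = bandit_mirror_descent_round(x, lambda a, t: 0.2 if a == 0 else 0.8, t)
print(x)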