Fermat's Library | DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker annotated/explained version.
exceedsexperthumanplayersinmanygames,e.g.,backgammon(1),checkers(2),chess(3),
Jeopardy!(4),Atarivideogames(5),andgo(6).Thesesuccessesallinvolvegameswith
informationsymmetry,whereallplayershaveidenticalinformationaboutthecurrentstateof
thegame.This propertyofperfect informationisalso attheheart ofthealgorithms thatenabled
these successes, e.g., local search during play (7,8).
Thefounderofmoderngametheoryandcomputingpioneer,vonNeumann,envisioned
reasoning in games without perfect information.“Real life is not like that.Real life consists of
bluffing,oflittletacticsofdeception,ofaskingyourselfwhatistheothermangoingtothink
Imeantodo.Andthatiswhatgamesareaboutinmytheory.”(9)Onegamethatfascinated
vonNeumannwaspoker,whereplayersaredealtprivatecardsandtaketurnsmakingbetsor
bluffing on holdingthe strongest hand, calling opponents’ bets, or folding and giving up onthe
handandthebetsalreadyaddedtothepot.Pokerisagameofimperfectinformation,where
players’ private cards give them asymmetric information about the state of game.
Heads-upno-limitTexashold’em(HUNL)isatwo-playerversionofpokerinwhichtwo
cardsareinitiallydealtface-downtoeachplayer,andadditionalcardsaredealtface-upin
threesubsequentrounds.Nolimitisplacedon thesizeofthebetsalthoughthereisan overall
limittothetotalamountwageredineachgame(10).AItechniqueshavepreviouslyshown
success inthe simpler gameof heads-uplimit Texas hold’em, whereall betsare of afixed size
resultinginjustunder10
14
decisionpoints(11).Bycomparison,computershaveexceeded
experthumanperformanceingo(6),aperfectinformationgamewithapproximately10
170
decision points (12).The imperfect information game HUNLis comparable insize to go,with
the number of decision points exceeding 10
160
(13).
Imperfectinformationgamesrequiremorecomplexreasoningthansimilarlysizedperfect
informationgames.Thecorrectdecisionataparticularmomentdependsupontheprobability
distributionoverprivateinformationthattheopponentholds,whichisrevealedthroughtheir
pastactions.However,howouropponent’sactionsrevealthat informationdependsupontheir
knowledgeofourprivateinformationandhowouractionsrevealit.Thiskindofrecursive
reasoningiswhyonecannoteasilyreasonaboutgamesituationsinisolation,whichisatthe
heartofheuristicsearchmethodsforperfectinformationgames.CompetitiveAIapproaches
in imperfect information games typically reason about the entiregame and produce a complete
strategypriortoplay(14–16).Counterfactualregretminimization(CFR)(14,17,18)isonesuch
techniquethatusesself-playtodorecursivereasoningthroughadaptingitsstrategyagainstitself
oversuccessiveiterations.Ifthe gameistoolarge tobesolved directly,thecommonresponse
istosolve asmaller,abstractedgame.Toplaytheoriginalgame,onetranslatessituationsand
actions from the original game to the abstract game.
AlthoughthisapproachmakesitfeasibleforprogramstoreasoninagamelikeHUNL,it
doessobysqueezingHUNL’s10
160
situationsdowntotheorderof10
14
abstractsituations.
Likelyasaresultofthislossofinformation,suchprogramsarebehindexperthumanplay.In
2015, thecomputer programClaudico lost toa teamof professional pokerplayers by amargin
of91mbb/g(19),whichisa“hugemarginofvictory”(20).Furthermore,ithasbeenrecently
shownthatabstraction-basedprogramsfromtheAnnualComputerPokerCompetitionhave
2
from Poker
Post a Comment