Header Ads

Fermat's Library | DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker annotated/explained version.

exceedsexperthumanplayersinmanygames,e.g.,backgammon(1),checkers(2),chess(3),

Jeopardy!(4),Atarivideogames(5),andgo(6).Thesesuccessesallinvolvegameswith

informationsymmetry,whereallplayershaveidenticalinformationaboutthecurrentstateof

thegame.This propertyofperfect informationisalso attheheart ofthealgorithms thatenabled

these successes, e.g., local search during play (7,8).

Thefounderofmoderngametheoryandcomputingpioneer,vonNeumann,envisioned

reasoning in games without perfect information.“Real life is not like that.Real life consists of

bluffing,oflittletacticsofdeception,ofaskingyourselfwhatistheothermangoingtothink

Imeantodo.Andthatiswhatgamesareaboutinmytheory.”(9)Onegamethatfascinated

vonNeumannwaspoker,whereplayersaredealtprivatecardsandtaketurnsmakingbetsor

bluffing on holdingthe strongest hand, calling opponents’ bets, or folding and giving up onthe

handandthebetsalreadyaddedtothepot.Pokerisagameofimperfectinformation,where

players’ private cards give them asymmetric information about the state of game.

Heads-upno-limitTexashold’em(HUNL)isatwo-playerversionofpokerinwhichtwo

cardsareinitiallydealtface-downtoeachplayer,andadditionalcardsaredealtface-upin

threesubsequentrounds.Nolimitisplacedon thesizeofthebetsalthoughthereisan overall

limittothetotalamountwageredineachgame(10).AItechniqueshavepreviouslyshown

success inthe simpler gameof heads-uplimit Texas hold’em, whereall betsare of afixed size

resultinginjustunder10

14

decisionpoints(11).Bycomparison,computershaveexceeded

experthumanperformanceingo(6),aperfectinformationgamewithapproximately10

170

decision points (12).The imperfect information game HUNLis comparable insize to go,with

the number of decision points exceeding 10

160

(13).

Imperfectinformationgamesrequiremorecomplexreasoningthansimilarlysizedperfect

informationgames.Thecorrectdecisionataparticularmomentdependsupontheprobability

distributionoverprivateinformationthattheopponentholds,whichisrevealedthroughtheir

pastactions.However,howouropponent’sactionsrevealthat informationdependsupontheir

knowledgeofourprivateinformationandhowouractionsrevealit.Thiskindofrecursive

reasoningiswhyonecannoteasilyreasonaboutgamesituationsinisolation,whichisatthe

heartofheuristicsearchmethodsforperfectinformationgames.CompetitiveAIapproaches

in imperfect information games typically reason about the entiregame and produce a complete

strategypriortoplay(14–16).Counterfactualregretminimization(CFR)(14,17,18)isonesuch

techniquethatusesself-playtodorecursivereasoningthroughadaptingitsstrategyagainstitself

oversuccessiveiterations.Ifthe gameistoolarge tobesolved directly,thecommonresponse

istosolve asmaller,abstractedgame.Toplaytheoriginalgame,onetranslatessituationsand

actions from the original game to the abstract game.

AlthoughthisapproachmakesitfeasibleforprogramstoreasoninagamelikeHUNL,it

doessobysqueezingHUNL’s10

160

situationsdowntotheorderof10

14

abstractsituations.

Likelyasaresultofthislossofinformation,suchprogramsarebehindexperthumanplay.In

2015, thecomputer programClaudico lost toa teamof professional pokerplayers by amargin

of91mbb/g(19),whichisa“hugemarginofvictory”(20).Furthermore,ithasbeenrecently

shownthatabstraction-basedprogramsfromtheAnnualComputerPokerCompetitionhave

2

Let's block ads! (Why?)



from Poker

No comments

Powered by Blogger.