AI outscores pros at six-player Texas hold’em poker
US scientists and a research team from Facebook have developed Pluribus, an artificial intelligence that outperforms professional players at six-player no-limit Texas hold’em poker.
Scientists from Carnegie Mellon University in Pittsburgh and Facebook AI Research evaluated Pluribus in two test scenarios. In one, a single copy of the artificial intelligence played against five professional players; in the other, a single professional played against five copies of the AI, each acting independently. The participating professionals have each won more than a million dollars playing poker. A total of 10,000 hands were played over 12 days during the test.
Performance was measured in milli big blinds per game, or mbb/game: thousandths of a big blind won on average per hand, which is equivalent to the number of big blinds won per thousand hands. A big blind is the forced bet the second player puts into the pot after the first player’s small blind. In the variant where Pluribus played against five professionals, it achieved an average of 48 mbb/game, which the researchers describe as a very high win rate, especially against professional players.
Pluribus’ achievements in the ‘5 humans + 1 AI’ experiment
Performance was also consistently high throughout the 10,000 hands played, suggesting that the human opponents struggled to spot weaknesses in the AI’s strategy. In the scenario with five copies of Pluribus against a single professional player, the AI won an average of 32 mbb/game over the ten thousand hands.
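As a rough illustration of the metric, the sketch below converts a total number of big blinds won over a session into mbb/game; the function name and the example figures are illustrative and not taken from the paper beyond the reported 48 mbb/game score.

```python
# Minimal sketch of the mbb/game win-rate metric (illustrative only;
# the function name is hypothetical, not from the Pluribus paper).

def mbb_per_game(big_blinds_won: float, hands_played: int) -> float:
    """Milli big blinds per game: thousandths of a big blind won per hand."""
    return 1000 * big_blinds_won / hands_played

# Winning 480 big blinds over 10,000 hands corresponds to 48 mbb/game,
# the score Pluribus reached in the '5 humans + 1 AI' experiment.
print(mbb_per_game(480, 10_000))  # 48.0
```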
In Texas hold’em, players form the best five-card hand from their two hole cards and the five community cards on the table, as illustrated below. Poker has been a research subject in artificial intelligence for years because players have to develop strategies under hidden information. Until now, progress had mostly been limited to the two-player variant of the game.
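A small, hypothetical illustration of that hand-building rule (not code from the research): a player effectively chooses among the 21 five-card combinations that can be drawn from the seven available cards.

```python
from itertools import combinations

# Hypothetical example: two hole cards plus five community cards,
# from which the best five-card hand must be chosen.
hole_cards = ["As", "Kd"]
community = ["Ah", "7c", "7d", "2s", "Qh"]

candidate_hands = list(combinations(hole_cards + community, 5))
print(len(candidate_hands))  # 21 five-card combinations to evaluate
```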
The researchers trained Pluribus using the Monte Carlo counterfactual regret minimization algorithm. CFR is an iterative self-play algorithm: it plays matches against copies of itself and gradually improves by beating its previous versions. The Monte Carlo variant samples actions during the game instead of traversing the entire ‘game tree’ on every iteration. The algorithm can evaluate what would have happened if other actions had been chosen and adjusts its strategy based on the degree of ‘regret’ for not having taken those routes.
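To make the regret idea concrete, below is a minimal sketch of the regret-matching update that underlies CFR, applied to rock-paper-scissors against a fixed opponent rather than to poker. It is an illustrative toy under assumed names, not the Pluribus implementation.

```python
import random

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a][b]: payoff for playing action a against action b
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to its positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # uniform if no positive regret yet

def train(iterations=100_000):
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    opponent = [0.4, 0.3, 0.3]  # a fixed, slightly exploitable opponent

    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]

        my_action = random.choices(range(ACTIONS), weights=strategy)[0]
        opp_action = random.choices(range(ACTIONS), weights=opponent)[0]

        # Regret: how much better each alternative action would have done
        # compared with the action actually taken in this sampled game.
        actual = PAYOFF[my_action][opp_action]
        for a in range(ACTIONS):
            regrets[a] += PAYOFF[a][opp_action] - actual

    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # the average strategy

if __name__ == "__main__":
    # Against this opponent, the average strategy drifts towards 'paper',
    # the action with the highest expected payoff.
    print(train())
```

Pluribus applies the same principle to the vastly larger game tree of six-player poker, sampling play-outs instead of enumerating every possible continuation.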
The blueprint strategy on which Pluribus builds was computed in eight days on a 64-core server using less than 512GB of RAM. The researchers could have computed a more elaborate base strategy for higher performance, but they aimed to keep a compressed form of the blueprint small enough to run on a system with up to 128GB of RAM during live play.
The research team tells Technology Review that it will not release Pluribus, because the AI could be misused to defraud online poker services. The technique can be used for AI research into other multiplayer games, but in the long run it could also be applied in practice, for example to improve self-driving cars or for defense purposes. The researchers published their work in Science under the title ‘Superhuman AI for multiplayer poker’.