From the abstract: The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials.
The dilemma is whether he should keep pulling the arm that has paid out best so far (exploitation) or try an arm he knows little about in case it pays more (exploration).
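
To make the trade-off concrete, here is a minimal sketch of one simple strategy, epsilon-greedy. It is not necessarily the method the quoted paper studies. The Bernoulli (win/lose) payouts, the function name `epsilon_greedy`, the default `epsilon=0.1`, and the payout probabilities in the usage note are all assumptions made for illustration.

```python
import random


def epsilon_greedy(payout_probs, trials, epsilon=0.1, seed=0):
    """Simulate a gambler on K arms with assumed Bernoulli payouts.

    payout_probs: hypothetical win probability of each arm (unknown to the gambler).
    Returns (total_reward, pulls_per_arm, estimated_mean_per_arm).
    """
    rng = random.Random(seed)   # seeded so runs are repeatable
    k = len(payout_probs)
    pulls = [0] * k             # how often each arm was pulled
    means = [0.0] * k           # running average payout observed per arm
    total = 0.0

    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: try a random arm, even one that looks poor so far.
            arm = rng.randrange(k)
        else:
            # Exploit: pull the arm with the best observed average so far.
            arm = max(range(k), key=lambda a: means[a])

        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running average for this arm.
        means[arm] += (reward - means[arm]) / pulls[arm]
        total += reward

    return total, pulls, means
```

For example, `epsilon_greedy([0.2, 0.5, 0.8], trials=5000)` returns the total reward, the number of pulls per arm, and the gambler's estimate of each arm's payout. A larger `epsilon` spends more trials learning about uncertain arms; a smaller one commits sooner to the arm that currently looks best.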