Here is another paper that looks promising: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109....

From the abstract: The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials.

The problem is: should he continue to pull the arm that has given the best payout until now, or try another arm he doesn't yet know much about to see if it has a higher payout?
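One common way to make this trade-off concrete is an epsilon-greedy rule: most of the time pull the arm with the best observed average, and occasionally pull a random arm to keep learning about the others. The sketch below is only an illustration of the dilemma, not necessarily the method the linked paper proposes. The function name `epsilon_greedy`, the Bernoulli (win/lose) payout model, and the parameter values are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_probs, trials=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy gambler on a K-armed Bernoulli bandit.

    true_probs: hypothetical win probability of each arm (unknown to the gambler).
    Returns (pulls per arm, estimated mean payout per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    pulls = [0] * k      # how many times each arm has been pulled
    means = [0.0] * k    # running average payout observed for each arm
    total = 0.0

    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: try an arm at random, even one we know little about.
            arm = rng.randrange(k)
        else:
            # Exploit: pull the arm with the best observed average so far.
            arm = max(range(k), key=lambda a: means[a])

        # Assumed payout model: reward 1 with probability true_probs[arm], else 0.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0

        pulls[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / pulls[arm]
        total += reward

    return pulls, means, total


if __name__ == "__main__":
    pulls, means, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("estimated payouts:", [round(m, 3) for m in means])
    print("total reward:", total)
```

With `epsilon=0` the gambler never explores and can get stuck on the first arm that happens to pay out; with `epsilon=1` he never uses what he has learned. Values in between are the simplest way to balance the two.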