Week 8: Simple bandit
What you see
The example shows the simple bandit algorithm applied to a 10-armed problem with binary rewards. In step \(k\), when you select an arm \(a_k\), the environment gives you a reward of \(r_k=1\) with probability \(q^*_{a_k}\) and a reward of \(r_k=0\) otherwise.
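To make the reward model concrete, here is a minimal sketch of such an environment. The class name `BernoulliBandit`, the `pull` method, and the choice to draw the success probabilities uniformly at random are assumptions made for illustration; they are not taken from the course code.

```python
import random


class BernoulliBandit:
    """Hypothetical n-armed environment with binary (0/1) rewards."""

    def __init__(self, n_arms=10, seed=None):
        self.rng = random.Random(seed)
        # Success probability of each arm. Drawing them uniformly from
        # [0, 1) is an assumption; the example may choose them differently.
        self.q_star = [self.rng.random() for _ in range(n_arms)]

    def pull(self, arm):
        # Reward 1 with probability q_star[arm], otherwise reward 0.
        return 1 if self.rng.random() < self.q_star[arm] else 0
```

Calling `env.pull(a)` many times for a fixed arm `a` gives an average reward that approaches `env.q_star[a]`, which is exactly the quantity a bandit algorithm has to estimate.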
The height of each bar shows the current average reward for that arm, and the number shows how often the arm has been tried. You can show the actual average reward \(q^*_a\) of each arm \(a\) by pressing q (although this is cheating: if we knew these values, there would be no point in running a bandit algorithm).
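The two quantities in the plot, the average reward per arm and the number of times each arm was tried, can be maintained with an incremental update. The sketch below assumes that "simple bandit" refers to the epsilon-greedy method with sample averages (as in Sutton and Barto); the function name `simple_bandit`, the value `epsilon=0.1`, and the tie-breaking rule are illustrative assumptions rather than the course implementation.

```python
import random


def simple_bandit(q_star, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit on arms with success probabilities q_star."""
    rng = random.Random(seed)
    n_arms = len(q_star)
    Q = [0.0] * n_arms  # current average reward per arm (the bar heights)
    N = [0] * n_arms    # how often each arm has been tried (the numbers)

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            a = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current average
            # (ties go to the lowest index, an arbitrary choice).
            a = max(range(n_arms), key=lambda i: Q[i])

        # Binary reward: 1 with probability q_star[a], otherwise 0.
        r = 1 if rng.random() < q_star[a] else 0

        # Incremental sample-average update.
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]

    return Q, N
```

For example, `Q, N = simple_bandit([0.2, 0.5, 0.8])` returns the estimated averages and the visit counts after 1000 steps. The update `Q[a] += (r - Q[a]) / N[a]` keeps `Q[a]` equal to the mean of all rewards seen so far for arm `a` without storing the individual rewards.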
How it works
todo.