I'm still struggling to find a method that improves contact rates in the refusal conversion process for the experiment with call scheduling. As a reminder, the experimental method improves contact rates for calls prior to a refusal, but calls made after a first refusal have lower contact rates in the experimental group than calls to comparable cases in the control group. Ouch.
I already tried calling households at times other than the time at which the first refusal was taken. The hypothesis was that people were screening us out and that calling at a different time might lead to someone else in the household picking up the phone. But that didn't work.
In looking at the data for a reason this is happening, I noticed that the control group seemed to be "exploring" better than the experimental group. The figures below demonstrate this. The upper figure covers calls prior to a refusal and shows, by call number, the average number of windows that have been called so far. The green line is the experimental method. It looks like it moves ahead of the control group early on (where most of the calls are).
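For anyone who wants to reproduce this kind of summary, here is a minimal sketch of how the quantity in the figures could be computed. It assumes each case's call history is available as an ordered list of window labels, and that the average at each call number is taken over the cases that reached that call. The function name and the window labels are hypothetical, not taken from the actual analysis.

```python
from collections import defaultdict


def mean_windows_by_call_number(call_histories):
    """Average number of distinct windows tried so far, by call number.

    call_histories: one list per case, giving the window label of each
    call in the order the calls were made (labels are hypothetical).
    Returns a dict mapping 1-based call number to the mean count of
    distinct windows among cases that received at least that many calls.
    """
    totals = defaultdict(int)  # sum of distinct-window counts per call number
    counts = defaultdict(int)  # number of cases that reached each call number
    for history in call_histories:
        seen = set()
        for call_number, window in enumerate(history, start=1):
            seen.add(window)
            totals[call_number] += len(seen)
            counts[call_number] += 1
    return {n: totals[n] / counts[n] for n in sorted(totals)}


# Two toy cases: one that repeats a window, one that switches right away.
toy = [["evening", "evening", "daytime"], ["daytime", "evening"]]
print(mean_windows_by_call_number(toy))
```

Computing this separately for each experimental arm, and separately for calls before and after the first refusal, would give the lines being compared.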
A central problem in reinforcement learning is the tension between exploration and exploitation. Exploration strategies try more of the available actions in order to learn what the reward might be for actions that have been tried only a little. Exploitation strategies pick the action that has the greatest estimated reward among those actions for which the reward is known. It might be that in the process of refusal conversions, we don't really know very much (i.e. the model predictions are very imprecise) and we should explore more of the available actions. I've made a change to the algorithm to see if it can increase exploration in the experimental method.
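To make the trade-off concrete, here is a minimal sketch of epsilon-greedy selection, one standard way of mixing exploration and exploitation. It is only an illustration and not necessarily the change I made to the algorithm. The function name, the idea of one estimated contact probability per call window, and the example numbers are all assumptions.

```python
import random


def choose_window(estimates, epsilon, rng=random):
    """Pick the index of a call window with epsilon-greedy selection.

    estimates: estimated contact probability for each window
               (hypothetical model output, one value per window).
    epsilon:   probability of exploring, between 0 and 1.
    rng:       source of randomness; defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: pick any window uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: pick the window with the highest estimated contact rate.
    return max(range(len(estimates)), key=lambda i: estimates[i])


# Made-up estimates for three windows.
estimates = [0.10, 0.50, 0.20]
print(choose_window(estimates, epsilon=0.0))  # always exploits
print(choose_window(estimates, epsilon=0.3))  # explores about 30% of the time
```

With epsilon set to 0 this is pure exploitation. Raising epsilon after a refusal, when the estimates are least trustworthy, would be one way to push the experimental method toward trying more windows.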