Saturday, February 25, 2012

Response Rates as a Reward Function

I recently saw a presentation by Melanie Calinescu and Barry Schouten on adaptive survey design. They have been using optimization techniques to design mixed-mode surveys. In the optimization problems, they seek to maximize a measure of sample balance (the R-Indicator) for a fixed cost by using different allocations to the modes for different subgroups in the population (for example, <35 years of age and 35+). The modes in their example are web and face-to-face. In their example, the older group is more responsive in both modes, so its members are allocated to web at higher rates. You can read their paper here to see the very interesting setup and results.

In the presentation, they showed what happens when you use the response rate as the thing that you are seeking to maximize. In some of the lower budgets, the optimal allocation was to simply ignore the younger group. You could not get a higher response rate by doing anything other than using all your resources on the older group. Once you had taken all of the relatively easy interviews with the older group, you might try to get some easy interviews with the younger group.
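To make that result concrete, here is a minimal sketch in Python. It is not the Calinescu and Schouten optimization: it assumes a single mode, two groups of equal size, and made-up propensities and costs, and it uses a simple greedy rule that spends the budget on the group with the highest response propensity per unit cost. The function names and all numbers are hypothetical. The R-indicator is computed as 1 - 2 * S(rho), where S(rho) is the standard deviation of the response propensities, here taken to be the chance of being attempted times the chance of responding when attempted.

```python
from math import sqrt

# Hypothetical groups: sizes, response propensities, and per-case costs are
# invented for illustration only.
GROUPS = {
    "young": {"size": 500, "propensity": 0.3, "cost": 10.0},
    "old": {"size": 500, "propensity": 0.6, "cost": 10.0},
}


def allocate_for_response_rate(groups, budget):
    """Greedy allocation: spend on the group with the most expected
    interviews per unit cost until its cases or the budget run out."""
    attempted = {}
    remaining = budget
    ranked = sorted(
        groups.items(),
        key=lambda kv: kv[1]["propensity"] / kv[1]["cost"],
        reverse=True,
    )
    for name, g in ranked:
        n = min(g["size"], remaining / g["cost"])
        attempted[name] = n
        remaining -= n * g["cost"]
    return attempted


def response_rate(groups, attempted):
    """Expected interviews divided by the total sample size."""
    total = sum(g["size"] for g in groups.values())
    expected = sum(attempted[k] * g["propensity"] for k, g in groups.items())
    return expected / total


def r_indicator(groups, attempted):
    """R = 1 - 2 * S(rho), with rho the effective propensity per group."""
    total = sum(g["size"] for g in groups.values())
    rho = {
        k: attempted[k] / g["size"] * g["propensity"]
        for k, g in groups.items()
    }
    mean = sum(g["size"] * rho[k] for k, g in groups.items()) / total
    var = sum(g["size"] * (rho[k] - mean) ** 2 for k, g in groups.items()) / total
    return 1 - 2 * sqrt(var)


# Low budget: enough for 400 attempts in total.
greedy = allocate_for_response_rate(GROUPS, budget=4000)
# Same cost, but split evenly across the two groups.
balanced = {"young": 200, "old": 200}

for label, plan in (("greedy", greedy), ("balanced", balanced)):
    print(label, plan, response_rate(GROUPS, plan), r_indicator(GROUPS, plan))
```

With these invented numbers, the greedy plan attempts only the older group and reaches a response rate of about 0.24 with an R-indicator of about 0.52. The even split costs the same, gets a lower response rate of about 0.18, and an R-indicator of about 0.88. Raising the budget to 6000 exhausts the older group first and only then moves on to the younger group.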

I thought that was an interesting result. It showed that allowing the response rate to guide data collection can be harmful. Fortunately, it seems that no one would actually carry out such a design. Still, it does make me wonder what harmful effects the response rate may have on data collection practices.

Friday, February 17, 2012

Call Record Problems

A couple of years ago I did an experiment where I recommended times to call sampled units in a face-to-face survey based on an area probability cluster sample. The recommendations were based on estimates from multi-level logistic regression models. The interviewers ignored the recommendations.

In meetings with the interviewers, several said that they didn't follow the recommendations since they call every case on every trip to an area segment. The call records certainly didn't reflect that claim. But it got me thinking that maybe the call records don't reflect everything that happens.

Biemer, Chen, and Wang (2011) reported a survey of interviewers in which the interviewers said that they do not always create a call record for a call. Sometimes they would not report a call in order to keep a case alive (since the number of calls on any case was limited) or because they just drove by the sampled unit and saw that no one was home. Biemer, Chen, and Wang also show that this selective reporting of calls can damage nonresponse adjustments that use the number of calls, making bias worse.

It seems like there are two options: 1) understand the process that generates the call records and how errors occur (this might allow us to adjust for the errors); 2) improve the process to remove those errors. Either way, it seems like option 1 is the first step.

Friday, February 10, 2012

Are we ready for a new reward function?

I've been thinking about the harmful effects of using the response rate as a data quality indicator. It has been a key -- if not THE key -- indicator of data quality for a while. One of the big unknowns is the extent to which the pervasive use of the response rate as a data quality indicator has malformed the design of surveys. In other words, has the pursuit of high response rates led to undesirable effects?

It is easy to say that we should be more focused on bias, but harder to do. Generally, we don't know the bias due to nonresponse. So if we are going to do something to reduce bias, we need a "proxy" indicator. For example, we could impute values to estimate the bias. This requires that the bias of an unweighted mean be related to things that we observe and that we specify the right model.
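As a rough illustration of the imputation idea, the sketch below assumes we know an age group for every sampled case, observe the survey variable only for respondents, and impute each nonrespondent with the respondent mean of their group. The data, the function name, and the choice of group-mean imputation are all hypothetical. The resulting number is only a proxy: it is a reasonable estimate of the bias only if the imputation model is right.

```python
from collections import defaultdict


def estimated_nonresponse_bias(sample):
    """Proxy for the bias of the unweighted respondent mean.

    `sample` is a list of (group, y) pairs, with y set to None for
    nonrespondents. Nonrespondents are imputed with the respondent mean
    of their group, so every group needs at least one respondent.
    """
    observed = defaultdict(list)
    for group, y in sample:
        if y is not None:
            observed[group].append(y)
    group_mean = {g: sum(v) / len(v) for g, v in observed.items()}

    respondents = [y for _, y in sample if y is not None]
    respondent_mean = sum(respondents) / len(respondents)

    # Full-sample mean after filling in the imputed values.
    completed = [y if y is not None else group_mean[g] for g, y in sample]
    full_mean = sum(completed) / len(completed)

    return respondent_mean - full_mean


# Made-up sample: the older group responds more often and has lower y.
sample = [
    ("old", 10), ("old", 12), ("old", 14), ("old", None),
    ("young", 20), ("young", 22),
    ("young", None), ("young", None), ("young", None),
]

print(estimated_nonresponse_bias(sample))
```

In this invented sample the respondent mean is 15.6 and the mean after imputation is 17.0, so the proxy suggests a bias of about -1.4. If the nonrespondents in a group actually differ from the respondents in that group, the proxy is wrong.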

No matter which indicator we select, we need some sort of assumptions to motivate this "proxy" indicator. Those assumptions could be wrong. When we are wrong, do we do more damage than good? On any particular survey, this could be the case. That is, following the "proxy" indicator could actually lead to a worse bias.

At the moment, we need more research to see if we can find indicators that, on average, actually reduce nonresponse bias across many surveys when used to guide data collection. This probably means gold standard studies that are pursued solely for methodological purposes. If we don't do that research, and start tuning our data collection practices to other indicators, we may actually run the risk of "throwing the baby out with the bath water" and developing methods that actually increase biases.