Friday, April 27, 2012

Quasi-Experiments and Nonresponse

In my last post, I talked about using future data collection as a quasi-experimental validation of hypotheses about nonresponse. I thought I'd follow up on that a bit more.

We often have this controversy when discussing nonresponse bias: if I can adjust for some variable, then why do I need to bother making sure I get good response rates across the range of values that variable can take on? Just adjust for it.

That view relies on some assumptions. We assume that no matter what response rate I end up at, the same model applies. In other words, the missing data depend only on that variable at every response rate I could choose (Missing at Random). But the missing data might depend only on that variable at some response rates and not at others.
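One way to write that assumption down, as a sketch in notation that is mine rather than standard for this blog: let X be the adjustment variable, Y the survey variable, and R_r the indicator that a case has responded by the time data collection stops at response rate r.

```latex
\Pr(R_r = 1 \mid X, Y) = \Pr(R_r = 1 \mid X) \quad \text{for every stopping point } r
```

The "just adjust for it" view needs this to hold at whatever r we happen to stop at, not only at one particular r.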

In most situations, we're going to make some assumptions about the missingness for adjustment purposes. We can't test those assumptions. So no one can ever prove you wrong.

I like the idea that we have a hypothesis at an interim point in the data collection. We might make this hypothesis very specific by predicting values for the missing cases. Then we add some additional interviews and compare our predictions for those cases to the newly observed data. Does this confirm our hypothesis? Do we make new predictions for the remaining cases now that we have some additional data? In this setup, we can at least partially check our assumptions as we go.
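Here is a minimal sketch of that check, assuming a single categorical adjustment variable and using the interim group means as the predictions for the missing cases. The data, group labels, and function names are all made up for illustration; a real application would use whatever adjustment model is actually in play.

```python
from statistics import mean

# Hypothetical interim data: (x, y) pairs for respondents so far, where x is
# the adjustment variable (a group label) and y is the survey outcome.
interim_respondents = [("a", 10.0), ("a", 12.0), ("b", 20.0), ("b", 22.0)]

# Hypothetical later respondents: cases that were still missing at the
# interim point and were interviewed afterwards.
later_respondents = [("a", 11.0), ("b", 27.0), ("b", 25.0)]


def group_means(cases):
    """Mean outcome within each level of the adjustment variable."""
    by_group = {}
    for x, y in cases:
        by_group.setdefault(x, []).append(y)
    return {x: mean(ys) for x, ys in by_group.items()}


def prediction_errors(interim, later):
    """Mean (observed - predicted) per group for the newly interviewed cases.

    The prediction for each missing case is the interim mean of its group,
    which is what a model depending only on x would impute.
    Assumes every group seen later was also seen at the interim point.
    """
    predicted = group_means(interim)
    errors = {}
    for x, y in later:
        errors.setdefault(x, []).append(y - predicted[x])
    return {x: mean(e) for x, e in errors.items()}


errors = prediction_errors(interim_respondents, later_respondents)
print(errors)
```

With these invented numbers, the later cases in group "a" match the interim prediction, while those in group "b" come in higher than predicted. That kind of gap is the signal that the interim hypothesis may not hold for the remaining cases.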

Friday, April 20, 2012

Constellation of views

I'm spending time looking at patterns in the nonresponse to a large survey we recently completed. I'm looking at the problem from a number of different angles, and going over the details this way is really very useful. This is reinforcing a couple of things that I've been saying:

1. We need multi-faceted views of the problem to replace reliance on a key statistic (i.e. the response rate).
2. We need to make a leap beyond the data with reasonable assumptions.

Given the uncertainty about the nonresponse bias, multi-faceted views can't give us much more than a better sense of the risks. With reasonable assumptions, we should be OK.

We will be repeating this survey, so this information may help with future waves. We can use it to guide interventions into the data collection strategy. We might even think of this as quasi-experimental validation of our hypotheses about the nonresponse to prior waves.


Friday, April 13, 2012

Signal and Noise in Feedback

Still on this theme of feedback. It would be nice if we got very clear signals from sampled units about why they don't want to do our surveys. However, this isn't usually the case. It seems that our best models are still only weakly predictive of when someone will respond.

Part of this could be that we don't have the 'right' data. We could improve our paradata and build better models.

Another part might never be captured. This is the situational part. Sampled persons might not be able to say why they refuse a survey on one day and agree to do another on another day. The decision is highly sensitive to small differences in the environment that we may never be able to capture in our data.

If that is the case, then the signal we can pick up for tailoring purposes is going to be weak. The good news is that it seems like we still haven't hit the limit of our ability to tailor. Onward!

Friday, April 6, 2012

A Twist on Feedback

In my last post, I talked about thinking about data collected between attempts or waves as "feedback" from sampled units. I suggested that maybe the protocol could be tailored to this feedback.

Another way to express this is to say that we want to increase everyone's probabilities of response by tailoring to their feedback. Of course, we might also make the problem more complex by "tailoring" the tailoring. That is, we may want to raise the response probabilities of some individuals more than those of other individuals. If so, might we consider a technique that is more likely to succeed in that subset? I'm thinking of this as a decision problem.

For example, assume we can increase response probabilities by about 0.1 on average with tailoring. But we notice that two different techniques have roughly this effect.

1) The first technique increases everyone by 0.1.
2) The second technique increases a particular subgroup (say half the population) by 0.15 and everyone else by 0.

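We might prefer the latter if it reduces some other indicator for the risk of nonresponse bias more than the former. A decision based only on the response rate would definitely prefer the former, since it raises the average response probability by 0.1 rather than 0.075.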
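To make the comparison concrete, here is a small sketch of the two techniques. The baseline probabilities, the subgroup names, and the choice of "gap between subgroup response probabilities" as the other indicator are all assumptions of mine for illustration, as is the idea that the targeted subgroup is the one with the lower baseline.

```python
# Hypothetical baseline response probabilities for two equal-sized subgroups.
baseline = {"low": 0.3, "high": 0.5}
share = {"low": 0.5, "high": 0.5}

# Technique 1 lifts everyone by 0.1.
# Technique 2 lifts only the "low" subgroup, by 0.15.
technique_1 = {"low": 0.10, "high": 0.10}
technique_2 = {"low": 0.15, "high": 0.00}


def apply_lift(probs, lift):
    """Response probabilities after adding each subgroup's lift."""
    return {g: probs[g] + lift[g] for g in probs}


def response_rate(probs, shares):
    """Expected overall response rate: share-weighted mean probability."""
    return sum(shares[g] * probs[g] for g in probs)


def gap(probs):
    """Stand-in risk indicator: spread between subgroup probabilities."""
    return max(probs.values()) - min(probs.values())


for name, lift in [("technique 1", technique_1), ("technique 2", technique_2)]:
    after = apply_lift(baseline, lift)
    print(name, round(response_rate(after, share), 3), round(gap(after), 3))
```

With these numbers, technique 1 gives the higher expected response rate (0.5 versus 0.475) but leaves the gap between subgroups at 0.2, while technique 2 shrinks the gap to 0.05. Which one we prefer depends on which of the two quantities we treat as the reward.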

Or, we might have two techniques: one has a big variance in the estimated impact for the subgroup and low variance overall, and the other has low variance for the subgroup and high variance for everyone else. We might prefer the latter technique if something other than the response rate is our reward function.

Sunday, April 1, 2012

Feedback from Sampled Units

A while ago, I wrote about developing algorithms that determine when to switch modes. I noted that the problem was that in many multiple mode surveys, there is very little feedback. For instance, in mailed and web surveys, the only feedback is a returned letter or email. We also know the outcome -- whether the mode succeeded or failed.

I still think the most promising avenues for this type of switching are from interviewer-administered modes. For instance, can we pick up clues from answering machine messages that would indicate that we should change our policy (mode)?

It may also be that panel studies with multiple modes are a good setting for developing this sort of algorithm. An event observed at one time period, or the responses to questions that predict which mode is more likely to induce response, might be useful "feedback" in such a setting.

