Friday, June 22, 2012

Balancing Estimated Response Propensities

One objective for field data collection, other than achieving the highest possible response rate, might be to achieve the most balanced response possible (perhaps subject to some minimum response rate). One issue with this is that we are estimating the response propensities in a dynamic setting. The estimated propensities surely have sampling error, but they also vary as the data used to estimate them change. This could lead to some bad decisions.

For instance, if we target some cases one day, perhaps the next day their estimated propensities have changed and we would make a different decision about which cases to target. This may be just a loss of efficiency. In the worst case, I suppose it could actually increase the variation in response propensities.
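One rough way to see how unstable the targeting decision is would be to compare the set of cases we would target on consecutive days. The sketch below assumes a simple rule that is not from any particular protocol: target the k cases with the lowest estimated propensity. The case IDs, the propensity values, and the function names are all made up for illustration.

```python
def lowest_propensity_cases(propensities, k):
    """Return the IDs of the k cases with the lowest estimated propensity.

    `propensities` maps a case ID to its estimated response propensity.
    Ties are broken by case ID so the result is deterministic.
    """
    ranked = sorted(propensities, key=lambda case_id: (propensities[case_id], case_id))
    return set(ranked[:k])


def target_overlap(day1, day2, k):
    """Share of the day-1 targets that would still be targeted on day 2."""
    targets_day1 = lowest_propensity_cases(day1, k)
    targets_day2 = lowest_propensity_cases(day2, k)
    return len(targets_day1 & targets_day2) / k


# Hypothetical estimates for the same five cases on two consecutive days.
day1 = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.45, "e": 0.60}
day2 = {"a": 0.12, "b": 0.35, "c": 0.20, "d": 0.40, "e": 0.65}

overlap = target_overlap(day1, day2, k=2)
print(f"Share of day-1 targets still targeted on day 2: {overlap:.2f}")
```

A low overlap from one day to the next would suggest that the targeting decisions are being driven by noise in the estimates rather than by real differences between cases.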

Monday, June 18, 2012

Missing Data and Response Rates

I'm getting ready to teach a seminar on the calculation of response rates. Although I don't work on telephone surveys much anymore (and maybe fewer and fewer other people do), I am still intrigued by the problem of calculating response rates in RDD surveys.

The estimation of "e" (the proportion of cases with unknown eligibility that are in fact eligible) is a nice example of a problem where we can say with near certainty that the cases with unknown eligibility are not missing at random. This should be a nice little problem for folks working with methods for nonignorable nonresponse. How should we estimate "e" when we know that the cases for which eligibility is unobserved are systematically different from those for which it is observed? The only thing that could make this a more attractive toy problem would be if we knew the truth for each case.
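To make the role of "e" concrete, here is a minimal sketch in the spirit of AAPOR's Response Rate 3, where the unknown-eligibility cases enter the denominator multiplied by "e". It estimates "e" by proportional allocation, that is, from the eligibility rate among the cases whose status was resolved, which is exactly the missing-at-random assumption questioned above. The disposition counts are invented, partial interviews are lumped in with the eligible nonrespondents for simplicity, and the function names are mine.

```python
def estimate_e(eligible_known, ineligible_known):
    """Proportional-allocation estimate of e.

    Assumes unresolved cases are eligible at the same rate as resolved
    cases, i.e. that eligibility is missing at random.
    """
    return eligible_known / (eligible_known + ineligible_known)


def response_rate(completes, eligible_nonrespondents, unknown_eligibility, e):
    """Response rate with unknown-eligibility cases discounted by e."""
    denominator = completes + eligible_nonrespondents + e * unknown_eligibility
    return completes / denominator


# Hypothetical final dispositions for an RDD sample.
completes = 400
eligible_nonrespondents = 300   # refusals, noncontacts, other eligible cases
ineligible = 300                # e.g. nonworking or business numbers
unknown_eligibility = 500       # e.g. always rang, never answered

e_hat = estimate_e(completes + eligible_nonrespondents, ineligible)
rr_estimated = response_rate(completes, eligible_nonrespondents, unknown_eligibility, e_hat)

# Bounds that make no assumption about the unresolved cases:
# e = 1 treats all of them as eligible, e = 0 treats none of them as eligible.
rr_lower = response_rate(completes, eligible_nonrespondents, unknown_eligibility, 1.0)
rr_upper = response_rate(completes, eligible_nonrespondents, unknown_eligibility, 0.0)

print(f"e = {e_hat:.2f}, RR = {rr_estimated:.3f}, bounds = [{rr_lower:.3f}, {rr_upper:.3f}]")
```

The gap between the two bounds shows how much the reported response rate depends on an assumption about "e" that the observed data cannot check.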

Probably this problem seems less important than it did a few years ago. But we still need estimates of "e" for other kinds of surveys (even if they play a less important role in other surveys).

Monday, June 11, 2012

Is there value in balancing response?

A few posts ago, I talked about the value of balancing response across subgroups defined by data on the frame (or paradata from all cases). The idea was that this provides some empirical confirmation of whether the subgroups are related to the variables of interest.

Paradoxically, if we balance the response rates across these subgroups, then we reduce the utility of these variables for adjustment later. That's the downside of this practice.
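A small numerical sketch shows why. It assumes a simple weighting-class adjustment (each respondent is weighted by the inverse of the response rate in their subgroup) and equal base weights; the subgroup sizes and means are invented. When the response rates are balanced, every respondent gets the same adjustment factor, so the adjusted estimate is just the unadjusted respondent mean.

```python
def unadjusted_mean(classes):
    """Plain respondent mean, ignoring the subgroups."""
    total = sum(c["n_resp"] * c["resp_mean"] for c in classes)
    return total / sum(c["n_resp"] for c in classes)


def weighting_class_mean(classes):
    """Respondent mean after weighting each class by 1 / (class response rate)."""
    numerator = 0.0
    denominator = 0.0
    for c in classes:
        adjustment = c["n_sampled"] / c["n_resp"]  # inverse of the class response rate
        numerator += adjustment * c["n_resp"] * c["resp_mean"]
        denominator += adjustment * c["n_resp"]
    return numerator / denominator


# Hypothetical subgroups with unequal response rates (80% vs. 40%).
unbalanced = [
    {"n_sampled": 100, "n_resp": 80, "resp_mean": 10.0},
    {"n_sampled": 100, "n_resp": 40, "resp_mean": 20.0},
]

# The same subgroups after balancing the response rates (60% in each).
balanced = [
    {"n_sampled": 100, "n_resp": 60, "resp_mean": 10.0},
    {"n_sampled": 100, "n_resp": 60, "resp_mean": 20.0},
]

for label, classes in [("unbalanced", unbalanced), ("balanced", balanced)]:
    shift = weighting_class_mean(classes) - unadjusted_mean(classes)
    print(f"{label}: adjustment moves the estimate by {shift:.2f}")
```

In the unbalanced case the adjustment moves the estimate; in the balanced case it has nothing left to do, because the balancing was already done in the field.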

As I said earlier, I think this does provide confirmation of our hypothesis. It also reduces our reliance on the adjustment model, although we have to assume the model is correct and there aren't unobserved covariates that are actually driving the response process.

Is there an additional advantage to this approach? It seems that we are at least trying to provide an ordered means of prioritizing the sample. We can describe it. Even if there are departures, we can say something about how the decisions were made to prioritize certain cases. Without this approach, we can only surmise. We often assume that the process is random, but this is probably not accurate.
