Friday, October 26, 2012

Baby and the Bathwater

This post is a follow-up to my last. Since then, I came across an interesting article at Survey Practice. I'm pleased to see it, since this is a discussion we really need to have. The article, by Koen Beullens and Geert Loosveldt, presents the results of a simulation study on the impact of using different indicators to govern data collection. In other words, they simulate the consequences of maximizing different indicators during data collection. The three indicators are the response rate, the R-Indicator (Schouten et al., 2009), and the maximal bias (also developed by Schouten et al., 2009). The simulation shows a situation where maximizing either of the latter two indicators produces a different result than maximizing the response rate. Maximizing the R-Indicator, for example, led to a slightly lower response rate than the data collection strategy that maximizes the response rate.
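For readers unfamiliar with the R-Indicator, here is a minimal sketch of how it is computed from estimated response propensities, following Schouten et al. (2009): R = 1 - 2 * S(rho), where S(rho) is the standard deviation of the propensities. The propensity values below are made up purely for illustration; in practice they would come from, say, a logistic regression of response on frame variables.

```python
# Sketch of the R-Indicator of Schouten et al. (2009): R = 1 - 2 * S(rho).
# Values near 1 indicate representative response; lower values indicate
# that response propensities vary across the sample.
import statistics

def r_indicator(propensities):
    """R = 1 - 2 * S(rho), using the population SD of the propensities."""
    s = statistics.pstdev(propensities)
    return 1 - 2 * s

# Hypothetical estimated response propensities for six sample members
rho = [0.6, 0.5, 0.7, 0.4, 0.55, 0.65]
print(round(r_indicator(rho), 3))
```

If every sample member had the same propensity, S(rho) would be zero and R would equal 1, regardless of the response rate itself -- which is exactly why maximizing R can pull data collection in a different direction than maximizing the response rate.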

This is an interesting simulation. It pretty clearly explores the distortions that can occur when maximizing the response rate is the goal.

However, I don't see it as convincing evidence that we should radically change our data collection procedures. As I mentioned in my last post, I wouldn't want anyone to conclude that lowering response rates is always OK. The problem is certainly more complicated than that. I would contend that we need experimental evidence regarding the impact of using other indicators to guide data collection.

In the first instance, data collection is such a complex activity that it is impossible to describe all the 'essential features' of any design. It's even more difficult to understand the impact of all these choices. Before changing those practices, we should understand the consequences of doing so. We wouldn't want to throw out the baby with the bathwater. In my mind, that requires experimental evidence.

It is also the case that each of the indicators proposed has weaknesses. If the model underlying the R-Indicator is misspecified, this could lead to inefficient or even bias-increasing actions. It would be good to understand when and how this might happen -- and what protections against this we might develop. My view is that this will require a constellation of indicators that tell an underlying story.

The good news is that this would require the work of many survey methodologists.

Friday, October 19, 2012

Do you really believe that?

I had an interesting discussion with someone at a conference recently. We had given a presentation that included some discussion of how response rates are not good predictors of when nonresponse bias might occur. We showed a slide from Groves and Peytcheva.

Afterwards, I was speaking with someone who was not a survey methodologist. She asked me if I really believed that response rates didn't matter. I was a little taken aback. But as we talked some more, it became clear that she was thinking that we were trying to argue for achieving low response rates. I thought it was interesting that the argument could be perceived that way.

To my mind, the argument wasn't about whether we should be trying to lower response rates. It was more about what tools we should be using to diagnose the problem. In the past, the response rate was used as a summary statistic for discussing nonresponse. But the evidence from Groves and Peytcheva calls into question the utility of that single statistic. My conclusion from that is that we need to work harder to really diagnose the risks of nonresponse bias. We need to view a constellation of statistics, developed under a variety of assumptions.

Monday, October 15, 2012

Estimating effort in field surveys

One of the things that I miss about telephone surveys is being able to accurately estimate how much various activities cost, or even how long each call takes. Since everyone on a telephone survey works on a centralized system, and everything gets time-stamped, you can calculate how long calls take. It's not 100% accurate -- weird things happen (someone takes a break and it doesn't show up in the data, networks collapse, etc.) -- but usually you can get pretty accurate estimates.

In the field, the interviewers tell us what they did and when. But they have to estimate how many hours each subactivity (travel, production, administration) takes, and they don't give anything at the call level.

I've been using regression models to estimate how long each call takes in field studies. The idea is pretty simple: regress the hours worked in a week on the counts of the various types of calls made that week. The estimated coefficients are estimates of the average time each type of call takes. This still isn't perfect, but it gets a conversation started.
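A rough sketch of that regression, with invented data: weekly interviewer hours regressed (without an intercept) on counts of three hypothetical call types. The hours below were constructed to be exactly consistent with 0.25, 1.5, and 0.5 hours per call, so the least-squares fit recovers those averages; real data would of course be noisier.

```python
# Regress weekly hours on weekly call counts; coefficients estimate the
# average time per call of each type. All numbers are made up.
import numpy as np

# Columns: contact attempts, completed interviews, refusals (one row per week)
calls = np.array([
    [20, 3, 2],
    [15, 5, 1],
    [30, 2, 4],
    [25, 4, 3],
    [10, 6, 0],
    [18, 4, 2],
], dtype=float)

# Hours worked each week (constructed from 0.25, 1.5, 0.5 hours per call)
hours = np.array([10.5, 11.75, 12.5, 13.75, 11.5, 11.5])

# Least squares with no intercept: every hour is attributed to some call
coefs, *_ = np.linalg.lstsq(calls, hours, rcond=None)
for name, c in zip(["attempt", "interview", "refusal"], coefs):
    print(f"{name}: {c:.2f} hours per call")
```

Omitting the intercept is a modeling choice (it forces all hours to be allocated to calls); in practice you might also want to include travel or administrative hours as separate predictors.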

In the past, I have used this idea to try to forecast survey outcomes. Now I'm trying to use it to compare two waves of a survey to see if there are differences. I'm looking for other uses of the technique, since I think it is kind of neat.

Friday, October 5, 2012

Call Scheduling in Cluster Samples

A couple of years ago, I tried to deliver recommended times to call housing units to interviewers doing face-to-face interviewing in an area probability sample. Interviewers drive to sampled area segments and then visit several housing units while they are there. This is how cost savings are achieved.

The interviewers didn't use the recommendations -- we had experimental evidence to show this. I had thought the recommendations might help them organize their work. When I talked with them afterwards, they said they didn't see the utility, since they plan trips to segments, not to single housing units.

I decided to try something simpler. To make sure that calls are being made at different times of day, identify segments that have not been visited in all call windows, or have been visited in only one call window. This information might help interviewers schedule trips if they haven't noticed that this situation has occurred in a segment. If this is helpful, then maybe the recommendation can be improved with a more elaborate specification.
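The check itself is simple enough to sketch. The segment IDs, call-window names, and call records below are all hypothetical; the point is just to flag segments whose attempts haven't covered every call window.

```python
# Flag segments whose call attempts have not covered all call windows.
# Segment IDs, window names, and records are invented for illustration.
from collections import defaultdict

CALL_WINDOWS = {"weekday_day", "weekday_eve", "weekend"}

# (segment_id, call_window) for each call attempt made so far
calls = [
    ("seg01", "weekday_day"),
    ("seg01", "weekday_day"),
    ("seg02", "weekday_day"),
    ("seg02", "weekday_eve"),
    ("seg02", "weekend"),
    ("seg03", "weekend"),
]

windows_seen = defaultdict(set)
for seg, window in calls:
    windows_seen[seg].add(window)

# Segments missing one or more windows, with the windows still uncovered
flagged = {seg: sorted(CALL_WINDOWS - seen)
           for seg, seen in windows_seen.items()
           if len(seen) < len(CALL_WINDOWS)}
print(flagged)
```

A more elaborate specification could weight the windows (evenings and weekends are usually more productive) or flag segments only after some minimum number of attempts.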