Saturday, August 31, 2013

New Objective Functions...

I've argued in previous posts that the response rate has functioned as an objective function for designing "optimal" data collections. Its role has been implicit rather than formally defined, and the resulting designs are probably less than optimal even for maximizing the response rate. Still, this objective function has shaped our data collection strategies.

Switching to new functions may be difficult for a number of reasons. First, we need other objective functions. These are difficult to define because there is always uncertainty about nonresponse bias. Which function might be the most useful? R-Indicators? Functions of the relationships between observed Y's and sampling frame data?
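To make the R-indicator option a bit more concrete: as usually defined, R = 1 - 2S(ρ̂), where S(ρ̂) is the standard deviation of the estimated response propensities, so R = 1 when every case has the same propensity. Below is a minimal Python sketch, assuming unweighted data and propensities estimated as the response rate within cells formed from frame variables (in practice a logistic regression on frame data is more common). The function names are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def cell_propensities(cells, responded):
    """Estimate each case's response propensity as the response rate in its cell.

    cells: frame-based cell label for every sampled case (hypothetical input).
    responded: True/False response indicator for every sampled case.
    """
    n = defaultdict(int)
    r = defaultdict(int)
    for cell, resp in zip(cells, responded):
        n[cell] += 1
        r[cell] += int(resp)
    return [r[cell] / n[cell] for cell in cells]


def r_indicator(propensities):
    """R = 1 - 2 * S(rho), with S the (sample) standard deviation of propensities."""
    return 1 - 2 * stdev(propensities)


# Usage: two cells with very different response rates give R well below 1.
cells = ["urban"] * 4 + ["rural"] * 4
responded = [True, True, False, False, True, True, True, True]
rho = cell_propensities(cells, responded)
print(round(r_indicator(rho), 3))
```

Used as an objective function, a design change would be judged by whether it raises R, rather than only by whether it raises the response rate.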

There are theoretical considerations, but we also need empirical tests. What happens empirically when data collection has a different goal? We haven't systematically tested these other options and their impact on the quality of the data. That should be high on our "to do" list.

Friday, August 23, 2013

Empirical Data on Survey Costs

A few posts ago, I pointed out an interesting (if older) book by Seymour Sudman: "Reducing the Cost of Surveys," from 1967. Another book that discusses survey costs is Groves' "Survey Errors and Survey Costs," from 1989.

Groves talks about cost models for a telephone facility. The models are quite detailed. He notes that computerized telephone facilities can estimate many of the parameters in the model quite accurately. He also gives a long, detailed table comparing the costs of telephone and face-to-face surveys.
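Here is a minimal sketch of that kind of model, under simplifying assumptions of my own rather than Groves' actual specification: total cost is a fixed cost plus paid interviewer hours, and interviewer time is the number of call attempts of each outcome type times the average minutes per attempt, with those averages estimated from computerized call records. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


def estimate_minutes(call_records):
    """Average minutes per attempt by outcome, from (outcome, minutes) log pairs."""
    totals, counts = {}, {}
    for outcome, minutes in call_records:
        totals[outcome] = totals.get(outcome, 0.0) + minutes
        counts[outcome] = counts.get(outcome, 0) + 1
    return {o: totals[o] / counts[o] for o in totals}


@dataclass
class PhoneCostModel:
    fixed_cost: float          # hypothetical: facility, supervision, programming
    hourly_rate: float         # hypothetical: interviewer wage plus overhead
    minutes_per_attempt: dict  # outcome -> average minutes per call attempt

    def interviewer_hours(self, attempts):
        """attempts: dict mapping call outcome -> number of attempts."""
        minutes = sum(n * self.minutes_per_attempt[o] for o, n in attempts.items())
        return minutes / 60

    def total_cost(self, attempts):
        return self.fixed_cost + self.hourly_rate * self.interviewer_hours(attempts)


# Usage with made-up call records and a made-up workload.
records = [("no_answer", 1.0), ("no_answer", 2.0), ("interview", 30.0)]
model = PhoneCostModel(1000.0, 20.0, estimate_minutes(records))
print(model.total_cost({"no_answer": 400, "interview": 100}))
```

The point of the structure is that the per-attempt parameters come straight from the call records; in a face-to-face survey those same parameters would have to be estimated some other way.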

Most of the discussion in Groves' book is of telephone facilities. But the same modeling approach could be taken for face-to-face surveys. The problem is that in that kind of survey, we can't rely on computers to keep track of how long different tasks take, so estimating the model parameters is going to be more difficult. But, at least conceptually, this would be a useful approach. It would allow us to bring costs into more facets of the design process for face-to-face surveys.

Friday, August 16, 2013

Adaptive Interventions

I was at a very interesting workshop today on adaptive interventions. Most of the folks at the workshop design interventions for chronic conditions and are used to testing their interventions with randomized trials.

Much of the discussion was on heterogeneity of treatment effects. In fact, much of their research is based on the premise that individualized treatments should do better than giving everyone the same treatment. Of course, the average treatment might be the best course for everyone, but they have certainly found applications where this is not true. It seems that many more could be found.

I started to think about applications in the survey realm. We do have the concept of tailoring, which began in our field with research into survey introductions. But do we use it much? I have two feelings on this question. The first is no: there aren't many examples like the article I linked to above. We usually test interventions (design features like incentives, letters, etc.) on the whole sample. We may note that they work differently across subgroups, but we rarely design interventions for specific subgroups.

My other feeling is that, yes, we do some of this. For example, we only apply refusal conversions to cases that have refused. We just need to think about all of the things that we do and maybe 'relabel' them.
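One way to do that relabeling is to write a protocol down as a decision rule: a function from what we observe about a case to the next treatment. The Python sketch below is hypothetical throughout (status codes, subgroup labels, the six-attempt threshold, and the treatment names are all assumptions); only the refusal-conversion branch corresponds to the existing practice described above.

```python
from dataclasses import dataclass


@dataclass
class Case:
    status: str    # hypothetical codes: "refused", "noncontact", "pending"
    attempts: int  # call attempts made so far
    subgroup: str  # hypothetical frame-based subgroup label


def next_treatment(case: Case) -> str:
    """Map a case's current state to its next treatment (illustrative rule only)."""
    # Existing practice: refusal conversion is applied only to refusals.
    if case.status == "refused":
        return "refusal_conversion"
    # Hypothetical: after repeated noncontacts, try something different.
    if case.status == "noncontact" and case.attempts >= 6:
        return "switch_mode"
    # Hypothetical: a differentiated protocol for one subgroup.
    if case.subgroup == "low_propensity":
        return "targeted_letter"
    return "standard_protocol"


print(next_treatment(Case("refused", 3, "general")))
```

Writing the rule out makes it clear which branches are already "adaptive" and which subgroups still get the one-size-fits-all protocol.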

I also thought that it would be difficult for us to design completely individualized treatments like the ones I saw today. We don't get the same kind of detailed feedback that they get. But still, I think we can move toward more differentiated treatment strategies.

Friday, August 9, 2013

Adjusting with Weights... or Adjusting the Data Collection?

I just got back from JSM, where I saw some presentations on responsive/adaptive design. The discussant did a great job summarizing the issues. He raised one of the key questions that always seems to come up for these kinds of designs: if you have the data used to guide the design available for all the cases, why bother changing the data collection when you can just use nonresponse adjustments to account for differences along those dimensions?

This is a big question for these methods. I think there are at least two responses (let me know if you have others).

First, for those nonresponse adjustments to be effective (assuming we use weighting cell adjustments; the idea extends easily to propensity modeling), the respondents within any cell need to be equivalent to a random sample of the cell. That is, within each cell, the respondents and nonrespondents need to have the same mean for the survey variable. One question is: at what point in data collection does that assumption become true? Of course, we don't know. But if we alter our data collection strategy, we can at least try to verify empirically whether it holds over some range of response rates.
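For readers who haven't worked with them, here is a minimal sketch of a weighting cell adjustment: within each cell, respondents' base weights are inflated by (sum of base weights in the cell) / (sum of respondent base weights in the cell), and nonrespondents get weight zero. The adjusted mean is unbiased only under the assumption in the paragraph above, that respondents in each cell look like a random sample of the cell. The function names and example data are hypothetical.

```python
from collections import defaultdict


def weighting_cell_adjust(cells, base_weights, responded):
    """Return nonresponse-adjusted weights (0.0 for nonrespondents)."""
    total = defaultdict(float)
    resp_total = defaultdict(float)
    for cell, w, r in zip(cells, base_weights, responded):
        total[cell] += w
        if r:
            resp_total[cell] += w
    factor = {}
    for cell in total:
        if resp_total[cell] == 0:
            raise ValueError(f"cell {cell!r} has no respondents; collapse cells")
        factor[cell] = total[cell] / resp_total[cell]
    return [w * factor[cell] if r else 0.0
            for cell, w, r in zip(cells, base_weights, responded)]


def weighted_mean(values, weights):
    """Weighted mean over respondents; nonrespondent values (None) are skipped."""
    num = sum(v * w for v, w in zip(values, weights) if w > 0)
    return num / sum(weights)


# Usage: cell A has half its cases responding, cell B all of them.
cells = ["A", "A", "B", "B"]
adj = weighting_cell_adjust(cells, [1, 1, 1, 1], [True, False, True, True])
y = [10, None, 4, 6]
print(adj, weighted_mean(y, adj))
```

Here the unadjusted respondent mean is about 6.67 and the adjusted mean is 7.5. If the one responding case in cell A is not typical of cell A, the adjustment simply spreads that atypical value over the cell.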

Second, this is an empirical question. It would be nice to have studies that looked at this question. Does balancing response along specified dimensions lead to reduced nonresponse bias after adjustment? My hunch is "yes, it does."

Friday, August 2, 2013

Survey Costs

I'm reading an interesting book, Seymour Sudman's "Reducing the Cost of Surveys." It was written in 1967, so some of the book is about "high tech" methods like using the telephone and scanning forms.

The part I'm interested in is the interviewer cost models. I'm used to the cost models in sampling texts, which are not very elaborate. Sudman's cost models are much more elaborate. For example, his models allow costs to vary across different types of PSUs and for interviewers who live at different distances from their sample clusters.
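For contrast, the textbook cluster-sampling model is C = C0 + a·c1 + a·b·c2 for a clusters of b interviews, which leads to the classic optimal cluster size b = sqrt((c1/c2)(1 - ρ)/ρ), with ρ the intraclass correlation. The Python sketch below sets that beside a hypothetical elaboration in the spirit of Sudman, where the per-cluster cost depends on PSU type and on the interviewer's distance to the cluster. The elaborated function, its one-round-trip-per-cluster travel assumption, and all parameter values are my own assumptions, not Sudman's actual model.

```python
import math


def textbook_cost(fixed, n_clusters, per_cluster, cost_cluster, cost_interview):
    """C = C0 + a*c1 + a*b*c2."""
    return fixed + n_clusters * cost_cluster + n_clusters * per_cluster * cost_interview


def optimal_cluster_size(cost_cluster, cost_interview, icc):
    """Classic optimum b = sqrt((c1 / c2) * (1 - rho) / rho)."""
    return math.sqrt((cost_cluster / cost_interview) * (1 - icc) / icc)


def elaborated_cost(fixed, clusters, per_cluster, cost_interview, cost_per_mile, psu_cost):
    """Hypothetical elaboration: per-cluster cost varies by PSU type and distance.

    clusters: list of (psu_type, miles_from_interviewer) pairs.
    psu_cost: dict mapping PSU type -> fixed cost of working a cluster there.
    Assumes one round trip per cluster, a deliberate simplification.
    """
    total = fixed
    for psu_type, miles in clusters:
        total += psu_cost[psu_type] + 2 * miles * cost_per_mile
        total += per_cluster * cost_interview
    return total


print(textbook_cost(1000, 10, 5, 100, 20))
print(round(optimal_cluster_size(100, 20, 0.05), 2))
print(elaborated_cost(1000, [("urban", 2.0), ("rural", 30.0)], 5, 20, 0.5,
                      {"urban": 50, "rural": 80}))
```

The textbook model treats every cluster as costing the same c1; the elaborated version is what lets design decisions respond to where the clusters and interviewers actually are.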

It brings to mind Groves' book, "Survey Errors and Survey Costs," if only because the two are among the few works that have looked closely at costs.

The problem in my work is that it is often difficult to estimate costs. The costs of different activities get lumped together. Interviewers estimate how much time various activities take. It seems like we've been really focused on the "errors" part of the equation and assumed that the "costs" part is easy. That assumption is often not true.
