Friday, November 18, 2016

The Cost of a Call Attempt

We recently did an experiment with incentives on a face-to-face survey. As one aspect of the evaluation of the experiment, we looked at the costs associated with each treatment (i.e. different incentive amounts).

The costs are a bit complicated to parse out. The incentive amount is easy, but the interviewer time is hard. Interviewers record their time at the day level, not at the housing unit level, so it's difficult to determine how much any single call attempt costs.

Even if we had accurate data on the time spent making each call attempt, there would still be the travel time from the interviewer's home to the area segment. If I could accurately calculate that, how would I spread it across call attempts? This might not matter if all I'm interested in is the marginal cost of adding an attempt to a visit to an area segment. But if I want to evaluate a treatment -- like the incentive experiment -- I need to account for all the interviewer costs as best I can.

A simple approach is to just divide the total interviewer hours by the total number of call attempts. This gives an average that might be useful for some purposes. Alternatively, I can try to account for differences in the lengths of different call attempt outcomes. If the distribution of outcomes differs across treatments, then the average length of any attempt might not be a fair basis for comparing the costs of the two treatments.
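As a rough sketch, the two allocation approaches can be compared with made-up numbers. Everything here is hypothetical -- the outcome categories, relative lengths, and hours are assumptions for illustration, not the survey's actual cost records.

```python
# Two ways to allocate a day of interviewer hours to call attempts.
# All numbers are hypothetical illustrations.

daily_hours = 6.0  # total paid interviewer hours for the day (assumed)

# Hypothetical attempts made that day, by outcome type.
attempts = [
    {"outcome": "no_contact", "n": 10},
    {"outcome": "contact_no_interview", "n": 4},
    {"outcome": "interview", "n": 2},
]

total_attempts = sum(a["n"] for a in attempts)

# Approach 1: flat average -- every attempt gets an equal share of the hours.
flat_cost = daily_hours / total_attempts

# Approach 2: weight attempts by an assumed relative length per outcome type
# (e.g. an interview is assumed to take ~12x as long as a no-contact attempt).
relative_length = {"no_contact": 1.0, "contact_no_interview": 3.0, "interview": 12.0}
total_weight = sum(a["n"] * relative_length[a["outcome"]] for a in attempts)
hours_per_weight_unit = daily_hours / total_weight

weighted_cost = {
    a["outcome"]: hours_per_weight_unit * relative_length[a["outcome"]]
    for a in attempts
}

print(f"flat cost per attempt: {flat_cost:.3f} hours")
for outcome, cost in weighted_cost.items():
    print(f"{outcome}: {cost:.3f} hours per attempt")
```

If one treatment produces proportionally more interviews (the long outcome), the flat average understates its cost per attempt and the weighted version makes the gap visible.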

I suspect that the problem can only be "solved" by defining the specific purpose of the estimate, and then thinking about how errors in the estimate might affect the decision. In other words, how bad does the estimate have to be to lead you to the wrong decision? I think there are a number of interesting cost problems like this, where we haven't measured the costs directly but need to use a proxy measure that might have errors of different kinds.
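One way to ask "how bad does the estimate have to be?" is a simple sensitivity scan. The sketch below uses invented numbers -- the incentive amounts, attempts per complete, and wage are all assumptions -- to show how the choice of cheaper treatment flips depending on the assumed hours per attempt.

```python
# Sensitivity of a treatment decision to the assumed cost per call attempt.
# All numbers are hypothetical illustrations, not results from the experiment.

incentive = {"A": 5.0, "B": 20.0}            # dollars per complete (assumed)
attempts_per_complete = {"A": 8.0, "B": 5.0}  # assumed: bigger incentive, fewer attempts
wage = 20.0                                   # dollars per interviewer hour (assumed)

def cost_per_complete(treatment, hours_per_attempt):
    """Total cost of one completed interview under an assumed hours-per-attempt."""
    return incentive[treatment] + attempts_per_complete[treatment] * hours_per_attempt * wage

# Scan plausible values of hours per attempt and see which treatment wins.
for h in [0.10, 0.20, 0.25, 0.30, 0.40]:
    a = cost_per_complete("A", h)
    b = cost_per_complete("B", h)
    cheaper = "A" if a < b else "B"
    print(f"hours/attempt = {h:.2f}: A = ${a:.2f}, B = ${b:.2f} -> {cheaper} cheaper")
```

With these made-up inputs the decision flips at 0.25 hours per attempt, so the per-attempt estimate only needs to be accurate enough to place you on the right side of that break-even point.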

Friday, November 11, 2016

Methodology on the Margins

I'm thinking again about experiments that we run. Yes, they are usually messy. In my last post, I talked about the inherent messiness of survey experiments that is due to the fact that surveys have many design features to consider. And these features may interact in ways that mean we can't simply pull out an experiment on a single feature and generalize the result to other surveys.

But I started thinking about other problems we have with experiments. I think another big issue is that methodological experiments are often run as "add-ons" to larger surveys. It's hard to obtain funding to run a survey just to do a methodological experiment. So, we add our experiments to existing surveys.

The problem is that this approach usually creates a limitation: the experiments can't risk creating a problem for the host survey. In other words, they can't lead to reductions in response rates or threaten other targets associated with the main (i.e. non-methodological) objective of the survey. As a result, the experiments are often confined to changes that can have only small effects.

A possible exception is when a large, ongoing survey undertakes a redesign. The problem is that this only happens for large surveys, and the research is still shaped by the objectives of that particular survey. I'd like to see this happen more generally. It would be nice to have some surveys with a methodological focus that could provide results that generalize to a population of smaller-scale surveys. Such a survey could also have a secondary substantive goal.

Friday, November 4, 2016

Messy Experiments

I have this feeling that survey experiments are often very messy. Maybe it's just in comparison to the ideal type -- a laboratory with a completely controlled environment where only one variable is altered between two randomly assigned groups.

But still, surveys have a very complicated structure. We often call this the "essential survey conditions." But that glib phrase might hide some important details. My concern is that when we focus on a single feature of a survey design, e.g. incentives, we might come to the wrong conclusion if we don't consider how that feature interacts with other design features.

This matters when we attempt to generalize from published research to another situation. If we focus on only a single feature, we might come to the wrong conclusion. Take the well-known result -- incentives work! Except that the impact of incentives seems to differ between interviewer-administered and self-administered surveys. The other features of the design are also important and may moderate the expected effects of the feature under consideration.

Every time I start to write a literature review, this issue comes to mind as I try to reconcile the inevitably conflicting results. Of course, there are other problems, such as the normal noise associated with published research results. But this other potential explanation should be kept in mind as well.

The other side of the issue comes up when I'm writing up the methods I used. Then I have to remind myself to be as detailed as possible in describing the survey design features so that the context of the results is clear.