

Showing posts from March, 2014

Defining phases

I have been working on a presentation on two-phase sampling. I went back to an old example from an RDD CATI survey we did several years ago. In that survey, we defined phase 1 using effort level. The first 8 calls were phase 1. A subsample of cases was selected to receive 9+ calls.

It was nice in that the phase boundary was easy to define, which meant it was easy to program. But the efficiency of the phased approach relied upon there being differences in costs across the phases. In this case, that means we assume cases in phase two all require similar levels of effort to be completed. This is like assuming a propensity model with number of calls as the only predictor.
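The effort-based boundary described above is easy to sketch in code. This is a minimal illustration, not the survey's actual system: the case records, the 50% subsampling rate, and the variable names are all assumptions for the example; only the 8-call boundary comes from the post.

```python
import random

random.seed(2014)

PHASE1_CALL_LIMIT = 8   # boundary from the survey: first 8 calls are phase 1
SUBSAMPLE_RATE = 0.5    # assumption: illustrative phase-2 sampling rate

# Hypothetical active cases with a running call count.
cases = [{"id": i, "calls": random.randint(1, 12), "complete": False}
         for i in range(20)]

# Cases that exhaust the phase-1 effort level become eligible for phase 2.
phase2_eligible = [c for c in cases
                   if not c["complete"] and c["calls"] >= PHASE1_CALL_LIMIT]

# Only a random subsample of eligible cases receives calls 9+;
# the rest are finalized as nonrespondents (and weighted accordingly).
k = round(SUBSAMPLE_RATE * len(phase2_eligible))
phase2_sample = random.sample(phase2_eligible, k)

print(len(phase2_eligible), len(phase2_sample))
```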

Of course, we usually have more data than that. We probably could create more homogeneity in phase 2 by using additional information to estimate response probabilities. I saw Andy Peytchev give a presentation where they implemented this idea. Even just the paradata would help. As an example, consider two cases…
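To make the "two cases" idea concrete: here is a toy propensity score that uses paradata beyond the call count. The functional form and coefficients are invented for illustration; in practice they would be estimated, e.g. by logistic regression on prior waves or earlier cases.

```python
import math

def contact_propensity(n_prior_calls, any_prior_contact):
    """Toy logistic propensity using paradata only (coefficients assumed)."""
    logit = -0.5 - 0.15 * n_prior_calls + 1.2 * int(any_prior_contact)
    return 1.0 / (1.0 + math.exp(-logit))

# Two cases with identical effort (8 calls) but different contact histories
# end up with quite different estimated probabilities.
p_contacted = contact_propensity(8, any_prior_contact=True)
p_never_contacted = contact_propensity(8, any_prior_contact=False)
print(round(p_contacted, 3), round(p_never_contacted, 3))
```

With calls as the only predictor, these two cases would get the same score; the extra paradata is what separates them.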

Monitoring Daily Response Propensities

I've been working on this paper for a while. It compares models estimated in the middle of data collection with those estimated at the end of data collection. It points out that these daily models may be vulnerable to biased estimates, akin to the "early vs. late" dichotomy that is sometimes used to evaluate the risk of nonresponse bias. The solution is finding the right prior specification in a Bayesian setup, or using the right kind and amount of data from a prior survey, so that estimates will have sufficient "late" responders.

But, I did manage to manufacture this figure which shows the estimates from the model fit each day with the data available that day ("Daily") and the model fit at the end of data collection ("Final"). The daily model is overly optimistic early. For this survey, there were 1,477 interviews. The daily model predicted there would be 1,683. The final model predicted 1,477.
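The average optimism in those numbers is straightforward arithmetic; this just restates the figures from the post (the variable names are mine):

```python
# Numbers from the post: the daily model's prediction vs. the actual total.
actual_interviews = 1477
daily_model_prediction = 1683

over_prediction = daily_model_prediction - actual_interviews
relative_optimism = over_prediction / actual_interviews

# The daily model over-predicts by 206 interviews, roughly 14% of the
# final total, before it converges toward the final model's estimate.
print(over_prediction, round(100 * relative_optimism, 1))
```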

That's the average "optimism." That …

What would a randomized call timing experiment look like?

It's one thing to compare different call scheduling algorithms. You can compare two algorithms and measure the performance using whatever metrics you want to compare (efficiency, response rate, survey outcome variables).

But what about comparing estimated contact propensities? There is an assumption often employed that these calls are randomly placed. This assumption allows us to predict what would happen under a diverse set of strategies -- e.g. placing calls at different times.

Still, this had me wondering what a truly randomized experiment would look like. The experiment would be best randomized sequentially, as this can result in more efficient allocation. We'd then want to randomize each "important" aspect of the next treatment. This is where it gets messy. Here are two of these features:

1. Timing. The question is, how to define this. We can define it using "call windows." But even the creation of these windows requires assumptions... and tradeoffs. T…
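A sequential randomization of the timing feature might be sketched like this. The window definitions here are an assumption, and they bake in exactly the tradeoffs noted above; the point is only that each next call attempt gets an independently randomized window rather than one chosen by a scheduling algorithm.

```python
import random

random.seed(42)

# Assumed call windows -- defining these already involves assumptions
# about how to partition the week.
CALL_WINDOWS = ["weekday_day", "weekday_eve", "weekend_day", "weekend_eve"]

def next_treatment(rng=random):
    """Sequentially randomize the timing of a case's next call attempt."""
    return rng.choice(CALL_WINDOWS)

# Each active case is assigned a randomized window for its next call.
schedule = {case_id: next_treatment() for case_id in ["c1", "c2", "c3"]}
print(schedule)
```

Under this design, estimated contact propensities by window could be compared without relying on the assumption that calls were randomly placed.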

More methods research for the sake of methods...

In my last post, I suggested that it might be nice to try multiple survey requests on the same person. It reminded me of a paper I read a few years back on response propensity models that suggested continuing calling after the interview is complete, just so that you can estimate the model. At the time, I thought it was sort of humorous to suggest that. Now I'm drawing closer to that position. Not for every survey, but it would be interesting to try.

In addition to validating estimated propensities at the person level, this might be another way to assess predictors of nonresponse that we can't normally assess. Peter Lugtig has an interesting paper and blog post about assessing the impact of personality traits on panel attrition. He suggests that nonresponse to a one-time, cross-sectional survey might have a different relationship to personality traits. Such a model could be estimated for a cross-sectional survey of employees who all have taken a personality test. You could do a…