Monday, February 29, 2016

What is a "response propensity"?

We talk a lot about response propensities. I'm starting to think we actually create a lot of confusion for ourselves by the way we sometimes have these discussions. First, there is a distinction between an actual and an estimated propensity. This distinction is important because our models are almost always misspecified. It is probably the case that important predictors are never observed -- for example, the mental state of the sampled person at the moment that we happen to contact them. As a result, the estimated propensity and the true propensity are different things.
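To make the distinction concrete, here is a toy simulation, offered as a rough sketch rather than anything drawn from real data. Everything in it is an assumption made up for illustration: one observed binary covariate `x`, one unobserved binary factor `u` (standing in for something like mental state at contact), and the particular coefficients. The "model" is simply the response rate within each level of `x`.

```python
import random


def simulate_propensities(n=20000, seed=7):
    """Simulate cases whose true propensity depends on x (observed) and u (never observed)."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        x = rng.randint(0, 1)  # observed covariate (hypothetical)
        u = rng.randint(0, 1)  # unobserved factor, e.g. mood at contact (hypothetical)
        true_p = 0.2 + 0.2 * x + 0.4 * u  # true propensity, between 0.2 and 0.8
        resp = 1 if rng.random() < true_p else 0
        cases.append((x, true_p, resp))
    return cases


def estimate_by_cell(cases):
    """Estimated propensity = observed response rate within each level of x."""
    totals = {}
    for x, _, resp in cases:
        n, r = totals.get(x, (0, 0))
        totals[x] = (n + 1, r + resp)
    return {x: r / n for x, (n, r) in totals.items()}


def mean_abs_gap(cases, est):
    """Average absolute difference between true and estimated propensity."""
    return sum(abs(p - est[x]) for x, p, _ in cases) / len(cases)


cases = simulate_propensities()
est = estimate_by_cell(cases)
print("estimated propensity by x:", est)
print("mean |true - estimated|:", round(mean_abs_gap(cases, est), 3))
```

Under these assumptions, the estimate for each level of `x` recovers only the average of the true propensities in that cell. Each individual case sits well above or below that average, depending on the factor we never get to see.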

The model selection choices we make can, therefore, have something of an arbitrary flavor to them. I think the choices we make should depend on the purpose of the model. In a recent paper on nonresponse weighting, we examined whether call record information, especially the number of calls and refusal indicators, provided useful predictors of response propensities for this purpose. It turns out that these variables were strong predictors of response, but they just added noise to the weights since they were unrelated to many of the survey variables. I think this reflects the fact that the survey process is noisy -- lots of variation in recruitment strategies (e.g., timing of calls varies across cases, interviewers vary), unobserved mental states of sampled persons, and possibly measurement error in the paradata.
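The weighting point can be illustrated with a small simulation. This is a toy sketch under made-up assumptions, not the analysis from the paper: `x` is a hypothetical auxiliary variable related to the survey variable `y`, `z` is a hypothetical call-effort level that strongly predicts response but is unrelated to `y`, and the weights come from simple weighting cells rather than a fitted propensity model. Kish's approximation, n * sum(w^2) / (sum w)^2, summarizes how much the variation in the weights inflates variance.

```python
import random


def simulate_cases(n=20000, seed=1):
    """Simulate a sample where z drives response but is unrelated to y."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        x = rng.randint(0, 1)  # auxiliary variable related to y (hypothetical)
        z = rng.randint(0, 3)  # call-effort level, unrelated to y (hypothetical)
        y = 10 + 5 * x + rng.gauss(0, 2)  # survey variable
        p = 0.15 + 0.10 * x + 0.20 * z  # response propensity, 0.15 to 0.85
        resp = 1 if rng.random() < p else 0
        cases.append({"x": x, "z": z, "y": y, "resp": resp})
    return cases


def cell_weights(cases, key):
    """Return (weight, y) for respondents; weight = inverse response rate in the cell."""
    counts = {}
    for c in cases:
        n, r = counts.get(key(c), (0, 0))
        counts[key(c)] = (n + 1, r + c["resp"])
    return [(counts[key(c)][0] / counts[key(c)][1], c["y"])
            for c in cases if c["resp"]]


def kish_deff(weights):
    """Kish's approximate design effect due to unequal weights."""
    return len(weights) * sum(w * w for w in weights) / sum(weights) ** 2


def summarize(cases, key):
    """Weighted respondent mean of y and the Kish design effect of the weights."""
    pairs = cell_weights(cases, key)
    weights = [w for w, _ in pairs]
    mean = sum(w * y for w, y in pairs) / sum(weights)
    return mean, kish_deff(weights)


cases = simulate_cases()
full_mean = sum(c["y"] for c in cases) / len(cases)
mean_x, deff_x = summarize(cases, key=lambda c: c["x"])
mean_xz, deff_xz = summarize(cases, key=lambda c: (c["x"], c["z"]))
print("full-sample mean of y:", round(full_mean, 2))
print("cells on x only:  mean", round(mean_x, 2), "deff", round(deff_x, 2))
print("cells on x and z: mean", round(mean_xz, 2), "deff", round(deff_xz, 2))
```

In this setup both sets of weights give roughly the same estimate of the mean of `y`, but adding `z` makes the weights much more variable. The strong predictor of response buys no bias reduction for `y` and costs precision.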

Once we consider the purpose, we might think of model selection very differently. I think this is true for adaptive designs that base design features upon these estimated response propensities. Here, I think it makes sense to identify predictors in these models that are also related to the survey outcome variables. As in the post-survey adjustment example, I think this gives us the best chance to control potential nonresponse biases.

Back to the original problem I raised, I think discussion of generic response propensities might lead us astray from this goal. It can be easy to forget that there are important modeling choices, and that the way we make those choices will affect our potential results.

Monday, February 15, 2016

Survey Data and Missing Data in Big Data

I found this interesting article about using survey data to uncover missing data in big data. The big data are electronic medical records, which are cheap to analyze, but have important gaps. These folks used a survey to assess the gaps.

Saturday, February 6, 2016

Survey Data and Big Data

I had an opportunity to revisit an article by Burns and colleagues that looks at using data from smartphones (they have a nice appendix of all the data they can get from each phone) to predict things that might trigger episodes of depression. Of course, the data don't contain any specific measures of depression. In order to get those, the researchers had to... use surveys. Once they had those, they could find the associations with the sensor data from the phone. Then they could deliver interventions through the phone.

There are 38 sensors on the phone. The phone delivers data quite frequently. So even with a small number of phones (n=8 in this trial), quite a large amount of data was generated. A bigger trial would have even more data. So this seems like a big data application.

And, in this case the "organic" data from the phone need some "designed" (i.e. survey) data in order to be useful.

This is also interesting in that the smartphone is delivering an intervention -- not just a survey. I've seen other applications that use smartphones to provide health- or mental health-related interventions. It might be that survey methodologists have a role to play in helping to design these kinds of studies.