

Predictions of Nonresponse Bias

One issue that we have been discussing is indicators of the risk of nonresponse bias. Some indicators use observed information (largely sampling frame data) to determine whether respondents and nonrespondents are similar. The R-Indicator is one example; it's not the only one, as there are several sample balance indicators. The implicit model is that the observed characteristics are related to the survey data, so that controlling for them will also control the potential for nonresponse bias.
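As a rough sketch of how a balance indicator of this kind works, the R-Indicator is commonly defined from the spread of estimated response propensities: the more the propensities vary across the frame, the less balanced the respondents are. The propensities here are simulated placeholders; in practice they would come from a model (e.g. logistic regression) of response status on frame covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical estimated response propensities; in a real application
# these would be fitted from sampling-frame covariates.
propensities = rng.beta(4, 4, size=1000)

# One common definition of the R-Indicator:
#   R = 1 - 2 * S(rho_hat),
# where S is the standard deviation of the estimated propensities.
# R = 1 means all propensities are equal (perfect balance); lower
# values signal a greater risk of nonresponse bias.
r_indicator = 1 - 2 * propensities.std(ddof=1)
print(round(r_indicator, 3))
```

The indicator says nothing about any particular survey variable; it summarizes balance on the frame data alone, which is exactly the implicit-model assumption described above.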

Another indicator uses the observed data, including the observed survey data, and a model to fill in the missing survey data. The goal here is to predict whether nonresponse bias is likely to occur. Here, the model is explicit.
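A minimal sketch of this second approach, using simulated data and an ordinary least-squares fill-in model (the variable names and the linear model are illustrative assumptions, not a specific published indicator): fit a model for the survey variable among respondents, predict it for nonrespondents, and compare the respondent mean with the model-completed full-sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: frame covariate x, survey variable y, response status.
n = 2000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)          # true survey values
respond = rng.random(n) < 1 / (1 + np.exp(-x))  # response depends on x

# Fit a regression of y on x among respondents only...
X = np.column_stack([np.ones(respond.sum()), x[respond]])
beta, *_ = np.linalg.lstsq(X, y[respond], rcond=None)

# ...then fill in model predictions for the nonrespondents.
y_filled = y.copy()
y_filled[~respond] = beta[0] + beta[1] * x[~respond]

# The gap between the respondent mean and the model-completed mean is a
# rough prediction of the nonresponse bias in the unadjusted estimate.
predicted_bias = y[respond].mean() - y_filled.mean()
print(round(predicted_bias, 3))
```

Here the model is explicit: the indicator is only as good as the assumption that the frame covariates capture the response mechanism.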

A question confronts both of these approaches: if you can predict the survey variables from the sampling frame data, why bother addressing imbalances on those variables during data collection? One answer …

Data Quality Specialists

I have been talking to undergraduates about survey methodology. The students I talk to have learned either some social research methods or statistics. I think that many are interested in data science and/or big data.

From these conversations, I found it was useful to describe survey methodologists as "data quality specialists." Survey methodology is not a field that most undergraduates are even aware of. But when I started talking about how we evaluate the quality of data, I could see ears perking up. It reinforced for me the idea that the Total Survey Error perspective is valuable for Big Data. We can talk about nonresponse and measurement error in a coherent way.

Raising questions about the quality of the data, the need to understand the processes that generated those data, and methods for evaluating the data were all ideas that seemed to resonate with undergraduates... well, at least some of them. It was energizing and exciting to speak with them. Hopefully they bring that ener…

Surveys and Other Sources of Data

Linking surveys to other sources of data is not a new idea; it has been around for a long time. It's useful in many situations, for example when respondents would have a difficult time supplying the information themselves (such as exact income figures).

Much of the previous research on linkage has focused on either the ability to link data, possibly in a probabilistic fashion; or there have been examinations of biases associated with the willingness to consent to linkage.

It seems that new questions are emerging with the pervasiveness of data generated by devices, especially smartphones. I read an interesting article by Melanie Revilla and colleagues about collecting data from a tracking application that people install on their devices. They examine how the "meter," as they call the application, might incompletely cover the sample. For example, persons might have multiple devices and only install it on some of them. Or, persons might share devices and not i…

Survey Modes and Recruitment

I've been struggling with the concept of "mode preference." It's a term we use to describe the idea that respondents might have preferences for a mode and that if we can identify or predict those preferences, then we can design a better survey (i.e. by giving people their preferred mode).

In practice, I worry that people don't actually prefer modes. If you ask people what mode they might prefer, they usually say the mode in which the question is asked. In other settings, the response to that sort of question is only weakly predictive of actual behavior.

I'm not sure the distinction between stated and revealed preferences is going to advance the discussion much either. The problem is that the language builds in an assumption that people actually have a preference. Most people don't think about survey modes. Most don't consider modes abstractly in the way methodologists might. In fact, these choices are likely probabilistic functions that hinge on the…

Response Rates and Responsive Design

A recent article by Brick and Tourangeau re-examines the data from a paper by Groves and Peytcheva (2008). The original analyses from Groves and Peytcheva were based on 959 estimates of known quantities drawn from 59 surveys with varying response rates. They found very little correlation between a survey's response rate and the bias of those 959 estimates.

Brick and Tourangeau view the problem as a multi-level one: the 959 estimates are clustered within 59 surveys. For each survey they created a composite score based on all of that survey's bias estimates. Their results were somewhat sensitive to how the composite score was created, and they present several ways of doing it: a simple mean, a mean weighted by sample size, and a mean weighted by the number of estimates. Each of these study-level composite bias scores is more strongly correlated with the response rate than the estimate-level biases were. They conclude: "This strongly suggests that nonresponse bias is partly a function of study-level characteristics; th…
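The aggregation effect behind this reanalysis can be illustrated with simulated data (the numbers below are hypothetical, not the actual Groves-Peytcheva estimates): averaging each survey's estimates into a composite washes out estimate-level noise, so the survey-level correlation with the response rate comes out stronger than the pooled estimate-level correlation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 59 surveys, each contributing several absolute-bias
# estimates, plus a survey-level response rate. Let the study-level mean
# bias fall as the response rate rises, with estimate-level noise on top.
n_surveys = 59
response_rate = rng.uniform(0.2, 0.9, size=n_surveys)
study_bias = 0.5 * (1 - response_rate)
estimates = [study_bias[i] + 0.3 * np.abs(rng.normal(size=rng.integers(5, 30)))
             for i in range(n_surveys)]

# Estimate-level correlation: pool all estimates against their survey's
# response rate (analogous to the original Groves-Peytcheva analysis).
pooled = np.concatenate(estimates)
pooled_rr = np.concatenate([np.full(len(e), response_rate[i])
                            for i, e in enumerate(estimates)])
r_estimate = np.corrcoef(pooled_rr, pooled)[0, 1]

# Survey-level correlation: one composite score (here, a simple mean)
# per survey, as in the Brick-Tourangeau reanalysis.
composite = np.array([e.mean() for e in estimates])
r_survey = np.corrcoef(response_rate, composite)[0, 1]

print(round(r_estimate, 2), round(r_survey, 2))
```

The same mechanism explains why the choice of composite (simple mean versus weighted means) can shift the results: different weightings average the estimate-level noise differently.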

Mechanisms of Mode Choice

Following up yet again on posts about how people choose modes. In particular, it does seem that different subgroups are likely to respond to different modes at different rates. Of course, the caveat applies that it's not just the mode, but also how you get there, that matters.

We do have some evidence about subgroups that are likely to choose a mode. Haan, Ongena, and Aarts examine an experiment where respondents to a survey are given a choice of modes. They found that full-time workers and young adults were more likely to choose web over face-to-face.

The setting is experimental and might not resemble many production surveys: face-to-face and telephone recruitment into a choice between a face-to-face or web survey. But at least the design allows them to look at who might make different choices.

It would be good to have more data on persons making the choice in order to better understand the choice. For example, information about how much they use the internet might be us…

The dose matters too...

Just a follow-up from my previous post on mixed-mode surveys. I think that one of the things that gets overlooked in discussions of mixed-mode designs is the dosage of each mode that is applied. For example, how many contact attempts are made under each mode? It's pretty clear that this matters: in general, more effort leads to higher response rates.

But it seems that sometimes when we talk about mixed-mode studies, we forget about the dose. We wrote about this idea in Chapter 4 of our new book on adaptive survey design, and I think it would be useful to keep in mind when describing mixed-mode studies. It might be these other design features, not the mode itself, that account, at least in part, for differences between mixed-mode studies.