Posts

Total Data Quality

In an earlier post, I suggested that survey methodologists are "data quality specialists." Our focus on "total survey error" (TSE) is, in many ways, the central defining concept of our field. This focus on data quality could be an important contribution that survey methodologists make to the emerging field of data science. But in order to make that contribution, we may need to test the fit of the TSE concept on evaluations of non-survey data.

One of the sources of error that we examine in surveys is "nonresponse." Does this concept apply to other sources of data? Certainly other sources of data have missing data. But nonresponse is a specific mechanism: we sample a unit and request data, but the unit fails to supply the data.

How does this concept apply to other sources of data? I wouldn't say that Twitter data suffer from "nonresponse" due to the fact that not everyone has a Twitter account or even that not everyone t…
Recent posts

Predictions of Nonresponse Bias

One issue that we have been discussing is indicators for the risk of nonresponse bias. There are some indicators that use observed information (i.e. largely sampling frame data) to determine whether respondents and nonrespondents are similar. The R-Indicator is an example of this type of indicator. It's not the only one. There are several sample balance indicators. There is an implicit model that the observed characteristics are related to the survey data and controlling for them will, therefore, also control the potential for nonresponse bias.
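As a rough illustration of how an indicator like this works, here is a minimal sketch of the R-Indicator in its commonly cited form, R = 1 - 2 * S(rho), where S(rho) is the standard deviation of the estimated response propensities. It assumes the propensities have already been estimated (for example, from a model on sampling frame variables) and ignores design weights. The function name `r_indicator` and the example values are made up for illustration.

```python
from statistics import pstdev


def r_indicator(propensities):
    """Return 1 - 2 * S(rho) for a list of estimated response propensities.

    Simplifying assumptions (not from the post): unweighted cases and the
    population standard deviation. Published versions typically use design
    weights and an N - 1 denominator.
    """
    if not propensities:
        raise ValueError("need at least one estimated propensity")
    if any(p < 0.0 or p > 1.0 for p in propensities):
        raise ValueError("propensities must lie in [0, 1]")
    return 1.0 - 2.0 * pstdev(propensities)


# Hypothetical propensities for two samples with the same mean response rate.
balanced = [0.5, 0.5, 0.5, 0.5]    # everyone equally likely to respond
unbalanced = [0.2, 0.8, 0.2, 0.8]  # response varies a lot across cases

print(r_indicator(balanced))
print(r_indicator(unbalanced))
```

When every case has the same propensity, the standard deviation is zero and the indicator equals 1. More variation in propensities pulls it toward 0. Note that the indicator only reflects variation on the observed characteristics that went into the propensity model.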

Another indicator uses the observed data, including the observed survey data, and a model to fill in the missing survey data. The goal here is to predict whether nonresponse bias is likely to occur. Here, the model is explicit.

An issue that impacts either of these approaches is that if you are able to predict the survey variables with the sampling frame data, then why bother addressing imbalances on them during data collection? One answer …

Data Quality Specialists

I have been talking to undergraduates about survey methodology. The students I talk to have learned either some social research methods or statistics. I think that many are interested in data science and/or big data.

From these conversations, I found it was useful to describe survey methodologists as "data quality specialists." Survey methodology is not a field that most undergraduates are even aware of. But when I started talking about how we evaluate the quality of data, I could see ears perking up. It reinforced for me the idea that the Total Survey Error perspective is valuable for Big Data. We can talk about nonresponse and measurement error in a coherent way.

Raising questions about the quality of the data, the need to understand the processes that generated those data, and methods for evaluation of the data were all ideas that seemed to resonate with undergraduates... well, at least some. It was energizing and exciting to speak with them. Hopefully they bring that ener…

Surveys and Other Sources of Data

Linking surveys and other sources of data is not a new idea; it has been around for a long time. It's useful in many situations, for example when respondents would have a difficult time supplying the information (such as exact income).

Much of the previous research on linkage has focused either on the ability to link data, possibly in a probabilistic fashion, or on biases associated with the willingness to consent to linkage.

It seems that new questions are emerging with the pervasiveness of data generated by devices, especially smartphones. I read an interesting article by Melanie Revilla and colleagues about trying to collect data from a tracking application that people install on their devices. They examine how the "meter," as they call the application, might be incompletely covering the sample. For example, persons might have multiple devices and only install it on some of them. Or, persons might share devices and not i…

Survey Modes and Recruitment

I've been struggling with the concept of "mode preference." It's a term we use to describe the idea that respondents might have preferences for a mode and that if we can identify or predict those preferences, then we can design a better survey (i.e. by giving people their preferred mode).

In practice, I worry that people don't actually prefer modes. If you ask people what mode they might prefer, they usually say the mode in which the question is asked. In other settings, the response to that sort of question is only weakly predictive of actual behavior.

I'm not sure the distinction between stated and revealed preferences is going to advance the discussion much either. The problem is that the language builds in an assumption that people actually have a preference. Most people don't think about survey modes. Most don't consider modes abstractly in the way methodologists might. In fact, these choices are likely probabilistic functions that hinge on the…

Response Rates and Responsive Design

A recent article by Brick and Tourangeau re-examines the data from a paper by Groves and Peytcheva (2008). The original analyses from Groves and Peytcheva were based upon 959 estimates with known variables measured on 59 surveys with varying response rates. They found very little correlation between the response rate and the bias on those 959 estimates.

Brick and Tourangeau view the problem as a multi-level problem of 59 clusters (i.e. surveys) of the 959 estimates. They created for each survey a composite score based on all the bias estimates from each survey. Their results were somewhat sensitive to how the composite score was created. They do present several different ways of doing this -- simple mean, mean weighted by sample size, mean weighted by the number of estimates. Each of these study-level composite bias scores is more correlated with the response rate. They conclude: "This strongly suggests that nonresponse bias is partly a function of study-level characteristics; th…

Mechanisms of Mode Choice

Following up, yet again, on posts about how people choose modes. In particular, it does seem that different subgroups are likely to respond to different modes at different rates. The caveat, of course, is that it's not just the mode, but also how you get there that matters.

We do have some evidence about subgroups that are likely to choose a mode. Haan, Ongena, and Aarts examine an experiment where respondents to a survey are given a choice of modes. They found that full-time workers and young adults were more likely to choose web over face-to-face.

The situation is an experimental one that might not be very similar to many surveys: face-to-face and telephone recruitment, followed by a choice between a face-to-face and a web survey. But at least the design allows them to look at who might make different choices.

It would be good to have more data on persons making the choice in order to better understand the choice. For example, information about how much they use the internet might be us…