

Showing posts with the label Coverage

Total Data Quality

In an earlier post, I suggested that survey methodologists are "data quality specialists." Our focus on "total survey error" (TSE) is, in many ways, the central defining concept of our field. This focus on data quality could be an important contribution that survey methodologists make to the emerging field of data science. But to make that contribution, we may need to test how well the TSE concept fits evaluations of non-survey data. One of the sources of error we examine in surveys is "nonresponse." Does this concept apply to other sources of data? Certainly, other sources of data have missing data. But nonresponse is a specific mechanism: we sample a unit and request data, but the unit fails to supply the data. How, then, does this concept apply to other sources of data? I wouldn't say that Twitter data suffer from "nonresponse" due to the fact that not everyone has a Twitter account or even that not every...

Surveys and Other Sources of Data

Linking surveys and other sources of data is not a new idea; it has been around for a long time. It's useful in many situations, for example, when respondents would have a difficult time supplying the information themselves (such as exact income). Much of the previous research on linkage has focused either on the ability to link data, possibly in a probabilistic fashion, or on biases associated with respondents' willingness to consent to linkage. New questions seem to be emerging with the pervasiveness of data generated by devices, especially smartphones. I read an interesting article by Melanie Revilla and colleagues about collecting data from a tracking application that people install on their devices. They examine how the "meter," as they call the application, might incompletely cover the sample. For example, persons might have multiple devices and install it on only some of them. Or, persons might share devices and no...

Web surveys: Coverage or nonresponse error?

I've been reading a bit on mixed-mode surveys, and I've noticed several discussions of web surveys and coverage error. The web is a relatively recent mode, and one of the key issues has been the extent to which the population has access to the internet. If someone doesn't have access to the internet, they can't complete a web survey. Everyone agrees on that. But how do we describe the source of this error? Is it coverage or nonresponse error? In my mind, coverage error is a property of the sampling frame: if a unit is not on the sampling frame, then it is not covered. But many web surveys are general population surveys that aren't tightly tied to a frame, since there is no "internet" sampling frame in the way there are frames for RDD or area probability samples. Many surveys today start from an address-based sample (ABS) and then use telephone, mail, web, or mixed-mode designs. In this case, a lack of internet access is an impediment to responding and not an imp...