Friday, November 21, 2014

Tiny Data...

I came across this interesting post about building a Bayesian model with careful specification of priors. The problem is that they have "tiny" data. So the priors play an important role in the analysis.

I liked this idea of "tiny" data. The rush to solve problems for "big data" has obscured the fact that there are interesting problems in situations where you don't have much data.

Frost Hubbard and I looked at a related problem in a recently published article: estimating response propensities during data collection. In the early part of data collection, we don't have much data with which to estimate these models. As a result, we would like to use "prior" data from another study. However, this prior information needs to be well-matched to the current study -- i.e., it should have the same design features, at least approximately.

This doesn't always work. For example, I might have a new study with a different incentive than I've used before. How do I estimate the impact of that incentive early on? This is like the "tiny" data problem. For a new design feature like this incentive, it makes sense to try to formalize our prior information, which is usually a combination of expert opinion and literature review.

I would argue that turning this information into a formal Bayesian prior is useful for a couple of reasons. First, it gives us a way to learn whether our priors and methods for forming them are adequate. Second, it gives us a way to generalize our knowledge across surveys. Otherwise, the expert judgments aren't quantified in a way that others can easily use.
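To make the idea concrete, here is a minimal sketch of how an expert guess could be written down as a formal prior and then updated with a handful of early cases. It uses a simple Beta-Binomial model for a single response rate, which is a stand-in I chose for illustration and not the propensity models from the article. The function names and all of the numbers (a guessed rate of 0.30, a prior "worth" 20 cases, 2 responses in 10 attempts) are hypothetical.

```python
def beta_prior_from_guess(mean, prior_n):
    """Turn an expert's guessed response rate into a Beta(a, b) prior.

    `prior_n` is how many pseudo-cases the guess is treated as being worth
    (an assumption the analyst has to make explicit).
    """
    a = mean * prior_n
    b = (1.0 - mean) * prior_n
    return a, b


def update(a, b, responses, attempts):
    """Conjugate update: add observed responses and nonresponses."""
    return a + responses, b + (attempts - responses)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical inputs: the expert expects about a 30% response rate with the
# new incentive, and we treat that opinion as worth 20 cases of data.
a0, b0 = beta_prior_from_guess(mean=0.30, prior_n=20)

# Hypothetical "tiny" early data: 2 responses in the first 10 attempts.
a1, b1 = update(a0, b0, responses=2, attempts=10)

raw_rate = 2 / 10
posterior_rate = beta_mean(a1, b1)
print(raw_rate, posterior_rate)
```

With so few cases, the posterior estimate sits between the raw early rate and the expert's guess. Once the study is finished, comparing the prior to the final rate is one way to check whether the method for forming priors was adequate.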

Friday, November 14, 2014

Interviewer Travel and New Forms of Data

The Director of the Census Bureau, John Thompson, recently blogged about a field test for the 2020 Decennial Census Nonresponse Follow-up. They are testing a number of new features, including the use of smartphones in data collection.

I've been working with GPS data from smartphones used by field interviewers. The data are complex, but may offer new insights into interviewer travel. Think of travel as a broad concept -- it's not just an expense or efficiency issue. The order in which calls are made may also relate to field outcomes like contact and response rates.
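As a rough illustration of one basic quantity these data can yield, the sketch below computes the distance traveled along an ordered sequence of GPS points using the haversine (great-circle) formula. This is not the processing I actually use; the function names are hypothetical, the Earth radius is the usual spherical approximation, and real smartphone data would need cleaning (dropped signals, jitter while parked) before a sum like this means much.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_length_km(points):
    """Total distance along time-ordered (lat, lon) points."""
    return sum(
        haversine_km(p[0], p[1], q[0], q[1])
        for p, q in zip(points, points[1:])
    )


# Hypothetical trace: three time-ordered pings from one interviewer trip.
trace = [(42.280, -83.740), (42.285, -83.735), (42.290, -83.745)]
print(path_length_km(trace))
```

Because the points are kept in time order, the same trace also records the sequence in which locations were visited, which is the part that may relate to contact and response.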

Perhaps these GPS data can help us understand how interviewers currently make decisions about how to work their sample. For example, do they drive past sampled housing units when they first arrive at the area segment? Is this action associated with higher contact rates?

Of course, travel is also an expense or efficiency issue. But I wouldn't want a push for more efficient travel to interfere with other aspects of the process. For example, driving through an area segment before making any calls might seem like inefficient travel. But if it improves outcomes, it actually increases efficiency.