Friday, April 26, 2013

Sequentially estimated propensity models

We've been estimating response propensity models during data collection for a while. We have at least two reasons for doing this:
  1. We monitor average response probability for active cases.
  2. I use estimates from these models to determine the next step in experiments.
There is some risk to estimating models in this way, particularly for the second purpose. The data used to make the estimates are accumulating over time. And those data don't come in randomly -- the easiest cases come in early and the difficult cases tend to come in later.

If I'm interested in the average impact of adding an 8th call to active cases, I might get a different estimate early in the field period than later.
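To make the concern concrete, here is a minimal sketch, not the models we actually fit. It assumes a made-up population with two classes of cases ("easy" and "hard") with fixed per-call response probabilities, and it computes the expected value of a naive pooled estimate (responses divided by calls) as more calls accumulate. All names and numbers are hypothetical.

```python
# Hypothetical two-class population: the values below are illustrative
# assumptions, not estimates from any survey.
EASY_P, HARD_P = 0.5, 0.1   # per-call response probability for each class
EASY_SHARE = 0.5            # share of the sample that is "easy"


def pooled_estimate(max_call):
    """Expected pooled per-call response rate using calls 1..max_call."""
    calls = 0.0
    responses = 0.0
    active_easy, active_hard = EASY_SHARE, 1.0 - EASY_SHARE
    for _ in range(max_call):
        # Every active case receives one more call.
        calls += active_easy + active_hard
        new_easy = active_easy * EASY_P
        new_hard = active_hard * HARD_P
        responses += new_easy + new_hard
        # Respondents leave the active pool; the mix shifts toward hard cases.
        active_easy -= new_easy
        active_hard -= new_hard
    return responses / calls


estimates = {k: pooled_estimate(k) for k in (1, 4, 8, 12)}
for k, est in estimates.items():
    print(f"data through call {k:2d}: pooled estimate = {est:.3f}")
```

Under these assumptions the pooled estimate drifts downward as the data accumulate, because the active cases are increasingly the hard ones. A decision rule based on the early estimate would be working from a different number than one based on the later estimate.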

In practice, the impact of this isn't as severe as you might think and there are remedies. Which leads me to the self-promotion part of this post ... I'll be presenting on this topic at AAPOR this year.

Friday, April 19, 2013

New Paradata

The Journal of Official Statistics has a special issue on systems and architecture that looks very interesting. Many of the authors mention the phenomenon of "silos" or "stovepipes." This is the situation where production is organized around projects rather than tasks. This kind of organization can lead to multiple projects independently developing software tools to do the same thing.

I think this phenomenon also has an effect on the paradata. Since these silos are organized around projects, the opportunity to collect methodologically relevant paradata may be lost. The focus is on collecting the data for the project.

New systems do present an opportunity to develop new paradata. It seems like defining cross-project tasks and developing unified systems is the better option. Within that framework, it might be helpful to think of methodologists as performing a task and, therefore, include them in the design of new systems.

That's the selfish argument anyway. Of course, we can't forget about the costs of these data either.

Friday, April 12, 2013

More thoughts on the cost of paradata....

Matt Jans had some interesting thoughts on costs on his blog. I like the idea of small pilot tests. In fact, we do a lot of turning on and off of interviewer observations and other elements. In theory, this creates nearly experimental data that I have failed to analyze. My guess is that the amount of effort created by these few elements is too small to be detected given the sample sizes we have (n=20,000ish). That's good, right? The marginal cost of any observation is next to zero.

At a certain point, adding another observation will create a problem. It will be too much. Just like adding a little more metal to a ball bearing will transform it into a... lump of metal. Have we found that point yet?

Last week, we did find an observation that was timed using keystroke data. We will be taking a look at those data.

Friday, April 5, 2013

Responsive Design and Information

It seems odd to say, but "Responsive Design" has now been around for a while. Groves and Heeringa published their paper in 2006. The concept has probably been stretched in all directions at this point.

I find it helpful to go back to the original problem statement: we design surveys as if we know the results of each design decision. For example, we know what the response rate will be given a certain design (mode, incentive, etc. -- the "essential conditions"). How would we act if we had no idea about the results? We would certainly expend some resources to gain some information.

Responsive design is built upon this idea. Fortunately, in most situations, we have some idea about what the results might be, at least within a certain range. We experiment within this range of design options in order to approach an optimal design. We expend resources relatively inefficiently in order to learn something that will improve the design of later phases.

I've seen people working in the area of Machine Learning addressing a similar problem. They have to address this question of the value of exploring design options ("policies," in their terminology). What is the value of exploring policies (i.e. gaining information) relative to maximizing the reward under known policies (using all your resources for the design that is assumed to be best)? It might be useful to approach responsive design from this perspective.
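One simple way the Machine Learning literature handles this tradeoff is an "epsilon-greedy" rule: spend a small share of cases on a randomly chosen option and send the rest to the option that currently looks best. Here is a rough sketch of that rule applied to design options. The response rates, the function name, and the parameter values are all hypothetical, and this is an illustration of the idea rather than a proposal for how responsive design should be implemented.

```python
import random


def epsilon_greedy(true_rates, n_cases, epsilon, seed=0):
    """Assign cases to design options one at a time with epsilon-greedy."""
    rng = random.Random(seed)
    n_options = len(true_rates)
    trials = [0] * n_options
    successes = [0] * n_options
    for _ in range(n_cases):
        if rng.random() < epsilon or 0 in trials:
            # Explore: try a random option (also used until each has data).
            choice = rng.randrange(n_options)
        else:
            # Exploit: use the option with the best observed response rate.
            choice = max(range(n_options),
                         key=lambda i: successes[i] / trials[i])
        trials[choice] += 1
        # Simulate the outcome; in a real survey this would be observed.
        successes[choice] += rng.random() < true_rates[choice]
    return trials, successes


# Three hypothetical designs with assumed (unknown to us) response rates.
trials, successes = epsilon_greedy([0.10, 0.15, 0.20],
                                   n_cases=5000, epsilon=0.1)
print("cases per design:", trials)
print("responses per design:", successes)
```

Here epsilon is the share of resources spent on gaining information. Setting it to zero puts everything on the design that looks best so far, and setting it to one amounts to a pure experiment.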
