

Showing posts from March, 2013

Costs of Paradata... Analysis

One of the hidden costs of paradata is the time spent analyzing these data. Here, we've spent a lot of time trying to find standard ways to convert these data into useful information. But many times we end up doing specialized analyses, searching for an explanation of some issue. And, sometimes, this analysis doesn't lead to clear-cut answers. In any event, paradata aren't just collected; they are also managed and analyzed. So there are costs to generating information from these data. We could probably think of this from a total survey error perspective: "Does this analysis reduce total error more than increasing the number of interviews would?" In practice, such a question is difficult to answer. What is the value of the analysis we never did? And how much would it have cost? There might be two extreme policies in this regard. One is "paralysis by analysis": continually seeking information and delaying decisions. The other extreme is "flying by the seat of your pants."
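
Just to make that trade-off concrete, here is a back-of-envelope sketch in Python. Every number in it (the budget, the per-interview cost, the current sample size, how much bias the analysis might remove) is a made-up assumption; the point is only to show the shape of the comparison, not to settle it.

```python
# Back-of-envelope comparison: spend a fixed budget on a paradata analysis,
# or on additional interviews?  All inputs below are hypothetical assumptions.

budget = 5000.0              # dollars available (assumed)
cost_per_interview = 50.0    # marginal cost of one more interview (assumed)

n = 1000                     # current number of interviews (assumed)
element_var = 0.25           # element variance of the key estimate (assumed)
bias = 0.02                  # nonresponse bias the analysis might address (assumed)

# Option A: buy more interviews -> sampling variance shrinks, bias is untouched.
extra = int(budget / cost_per_interview)
mse_more_interviews = element_var / (n + extra) + bias**2

# Option B: fund the analysis -> suppose it removes half the bias (a pure guess).
mse_analysis = element_var / n + (0.5 * bias) ** 2

print(f"MSE with {extra} extra interviews: {mse_more_interviews:.6f}")
print(f"MSE with the paradata analysis:    {mse_analysis:.6f}")
```

The whole comparison hinges on a quantity we rarely know in advance: how much error the analysis would actually remove. That is exactly why the question is so hard to answer in practice.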

Costs of Paradata, Again

Still on this topic... We looked at the average time to complete a set of questions. These actions may be repeated many times (each time we have contact with a sampled household), but this still amounts to a trivial portion of total interviewer time (about 0.4%). These questions don't have to add much value to justify those costs. On the other hand, there are still a couple of questions. 1) Could we reduce measurement error on these questions if we spent more time on them? Brady West has looked at ways to improve these observations. If a few seconds isn't enough time, would more time improve the measurements? My hunch is that more time would improve the observations, but it would have other consequences. Which leads me to my second question: 2) Do these observations interfere with other parts of the survey process? For example, can they distract interviewers from the task of convincing sampled persons to do the interview? My hunch on the latter question is that it is possible, but our
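
For anyone curious where a figure like 0.4% comes from, here is the arithmetic as a short Python sketch. The inputs (seconds per attempt, attempts per case, interviewer workload, total project hours) are placeholder assumptions, not our actual figures.

```python
# Illustrative arithmetic: share of interviewer time spent recording a short
# set of observation questions.  All inputs are made-up placeholders.

seconds_per_attempt = 15.0           # time to record the observations once (assumed)
attempts_per_case = 6.0              # contact attempts per sampled household (assumed)
cases_per_interviewer = 60.0         # workload per interviewer (assumed)
total_hours_per_interviewer = 400.0  # total project hours per interviewer (assumed)

observation_hours = (seconds_per_attempt * attempts_per_case *
                     cases_per_interviewer) / 3600.0
share = observation_hours / total_hours_per_interviewer

print(f"Hours spent on observations: {observation_hours:.2f}")
print(f"Share of total interviewer time: {share:.2%}")
```

With these made-up inputs the share comes out to roughly 0.4%, which is the kind of calculation behind the figure above.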

Cost of Paradata

I'm interested in this question again. I wrote about the costs of paradata a while ago. These costs can vary quite a lot depending upon the situation. There aren't a lot of data out there about these costs, so it might be good to start looking at this question. One big area is interviewer observations. The technical systems that we use here have some limitations. Our sample management system doesn't create "keystroke files" that would allow us to determine how long call records take to create. But when we use our CAPI software, we can capture those data. Such timing data will allow us to answer the question of how much time it takes to create the observations (a key element of their costs). But they won't allow us to answer questions about how collecting those data impacts other parts of the process. For instance, does having interviewers create these data distract them from the conversation with sampled persons enough to reduce response rates? The latter question pr
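
As a sketch of what the timing analysis might look like once the CAPI software captures those data, the snippet below computes per-screen durations from hypothetical timestamp records. The field names and layout are assumptions for illustration, not the format our systems actually produce.

```python
# Sketch: estimate how long interviewer observations take, using hypothetical
# CAPI timestamp paradata.  Field names and example records are invented.

from datetime import datetime
from statistics import mean

# Each record: (case id, item name, time the screen was opened, time it was left)
timing_log = [
    ("0001", "OBS_ACCESS",   "2013-03-01 10:15:02", "2013-03-01 10:15:09"),
    ("0001", "OBS_CHILDREN", "2013-03-01 10:15:09", "2013-03-01 10:15:16"),
    ("0002", "OBS_ACCESS",   "2013-03-02 14:02:41", "2013-03-02 14:02:52"),
]

fmt = "%Y-%m-%d %H:%M:%S"
durations = {}
for case_id, item, opened, left in timing_log:
    seconds = (datetime.strptime(left, fmt) - datetime.strptime(opened, fmt)).total_seconds()
    durations.setdefault(item, []).append(seconds)

for item, secs in durations.items():
    print(f"{item}: mean {mean(secs):.1f} seconds across {len(secs)} screens")
```

Timing data like these can answer the direct-cost question. They say nothing about whether recording observations distracts interviewers at the doorstep; that would have to be studied some other way.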

Undercoverage issues

I recently read an article by Tourangeau, Kreuter, and Eckman on undercoverage in screening surveys. One of several experiments on which they report explores how the form of the screening questions can impact eligibility rates. They compare taking a full household roster to asking whether there is anyone in the household within the eligible age range. The latter produces lower eligibility rates. There was a panel at JSM years ago that discussed this issue; several major screening surveys reported similar undercoverage problems. Certainly the form of the question makes a difference. But even on screening surveys that use full household rostering, there can be undercoverage. I'm wondering what the mechanism is. If the survey doesn't advertise the eligibility criteria, how is it that some sampled units avoid being identified as eligible? This might be a relatively small source of error in the survey, but it is an interesting puzzle.

Adaptive Design Research

I recently found a paper by some colleagues from VU University in Amsterdam and Statistics Netherlands. The paper uses dynamic programming to identify an optimal "treatment regime" for a survey, where the treatment is the sequence of modes by which each sampled case is contacted for interview. The paper is titled "Optimal resource allocation in survey designs" and appears in the European Journal of Operational Research. I'm pointing it out here since survey methods folks might not follow this journal. I'm really interested in this approach, as the methods they use seem well suited to the complex problems we face in survey design. Greenberg and Stokes, and possibly Bollapragada and Nair, are the only other examples that do anything similar in surveys. I'm hoping these methods will be used more widely for surveys. Of course, there is a lot of experimentation to be done.
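
To give a flavor of the dynamic programming idea (this is a toy sketch, not the authors' model), here is a small backward-induction example in Python: at each remaining contact attempt, choose the mode that maximizes the probability of eventually obtaining an interview within a fixed budget. The modes, costs, and per-attempt response probabilities are invented for illustration.

```python
# Toy dynamic program: pick a contact mode for each remaining attempt so as to
# maximize the probability of obtaining an interview within a budget.
# The modes, costs, and response probabilities below are invented.

from functools import lru_cache

MODES = {          # mode: (cost per attempt, prob. of response on that attempt)
    "web":   (1, 0.05),
    "phone": (3, 0.15),
    "f2f":   (10, 0.40),
}

@lru_cache(maxsize=None)
def best_policy(attempts_left, budget_left):
    """Return (best attainable response probability, mode to use now)."""
    if attempts_left == 0:
        return 0.0, None
    best_prob, best_mode = 0.0, None
    for mode, (cost, p) in MODES.items():
        if cost > budget_left:
            continue
        future_prob, _ = best_policy(attempts_left - 1, budget_left - cost)
        total = p + (1 - p) * future_prob   # respond now, or keep trying later
        if total > best_prob:
            best_prob, best_mode = total, mode
    return best_prob, best_mode

prob, first_mode = best_policy(4, 16)
print(f"Best attainable response probability: {prob:.3f} (start with {first_mode})")
```

A real application would need a richer state (contact history, case-level response propensities, costs that vary across cases) and estimated rather than assumed probabilities, which is where the experimentation comes in.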