Friday, April 25, 2014

When should we use the term "nonresponse bias"?

Maybe I'm just being cranky, but I'm starting to think we need to be more careful about when we use the term "nonresponse bias." It's a simple term, right? What could be wrong here?

The situation that I'm thinking about is when we are comparing responders and nonresponders on characteristics that are known for everyone. This is a common technique. It's a good idea. Everyone should do this to evaluate the quality of the data.

My issue arises when we start to describe the differences between responders and nonresponders on these characteristics as "nonresponse bias." These differences are really proxies for nonresponse bias in the survey variables. We know the value of these characteristics for every case, so there isn't any nonresponse bias in them.
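To make the comparison concrete, here is a minimal sketch in Python. The function name `responder_gap`, the variable names, and the data are all hypothetical and made up for illustration. It compares responders and nonresponders on a variable known for every sampled case, and shows that the gap between the responder mean and the full-sample mean equals the nonresponse rate times the responder-nonresponder difference.

```python
from statistics import mean

def responder_gap(values, responded):
    """Compare responders and nonresponders on a variable known for every case."""
    resp = [v for v, r in zip(values, responded) if r]
    nonresp = [v for v, r in zip(values, responded) if not r]
    if not resp or not nonresp:
        raise ValueError("need at least one responder and one nonresponder")
    full_mean = mean(values)
    resp_mean = mean(resp)
    nonresp_mean = mean(nonresp)
    nonresp_rate = len(nonresp) / len(values)
    return {
        "full_mean": full_mean,
        "resp_mean": resp_mean,
        "nonresp_mean": nonresp_mean,
        # Gap computed directly: responder mean minus full-sample mean.
        "gap_resp_minus_full": resp_mean - full_mean,
        # Same gap via the identity: nonresponse rate * (responders - nonresponders).
        "gap_via_identity": nonresp_rate * (resp_mean - nonresp_mean),
    }

# Hypothetical frame variable (say, age from a register) for 8 sampled cases.
age = [30, 40, 50, 60, 20, 30, 40, 50]
responded = [True, True, True, True, False, False, False, False]
result = responder_gap(age, responded)
print(result)
```

The gap here is computable only because the variable is observed for nonresponders too. For the actual survey variables it is not, which is why a gap like this is at best a proxy.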

The danger, as I see it, is that naive readers could miss that distinction. And I think it is an important distinction. If I say "I have found a method that reduces nonresponse bias," what will some folks hear? I think such a statement is probably too strong when I'm talking about differences between responders and nonresponders on known characteristics.

On the other hand, I was talking with some folks about this a couple of weeks ago. No one agreed with me on this point.

Friday, April 18, 2014

The Nonresponse-Measurement Error Nexus... in Reverse

I saw this very interesting post linking measurement error and nonresponse in a new way. Instead of looking at whether difficult-to-respond cases exhibit more measurement error, Peter Lugtig looks at whether cases with poor measurement attrit from a panel. If this works, these kinds of behaviors during the survey could be very useful tailoring variables. They can be signals of impending attrition.

One hypothesis about these cases is that they may not have sufficient commitment to the task. They do it poorly and opt out more quickly. The million dollar question is, how do we get them to commit to the task?

Friday, April 4, 2014

Defining Phases, Again

The other thing I should have mentioned in my last post is the level at which the phase is defined. We tend to think of phases as points in time for area probability samples. This is because in a cluster sample, we want to save on travel. Taking a subsample of cases within a cluster doesn't save on travel. So, we tend to use time to find the point at which sampling could occur.

But we could trigger these decisions using some other criteria. A few years ago, I tried to develop a model that detected when there was a change in the cost structure -- that is, when costs go up. The problem was that the model couldn't detect the change until a few days later. Sometimes, it never detected it at all. Still, I like the idea of dynamically detecting the boundary of the phases.
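To illustrate why a detector like this tends to lag, here is a minimal sketch. It is not the model described above. It uses a one-sided CUSUM, a standard change-detection technique swapped in purely for illustration. The function name, the parameters (`baseline`, `slack`, `threshold`), and the daily cost figures are all hypothetical.

```python
def cusum_upward(costs, baseline, slack, threshold):
    """Return the index of the first day a one-sided CUSUM exceeds threshold, or None."""
    s = 0.0
    for day, cost in enumerate(costs):
        # Accumulate only the excess over baseline + slack; never drop below zero.
        s = max(0.0, s + (cost - baseline - slack))
        if s > threshold:
            return day
    return None

# Hypothetical cost per interview by day; costs rise starting on day 5.
daily_cost = [10, 11, 9, 10, 10, 13, 13, 13, 13, 13]
detected = cusum_upward(daily_cost, baseline=10, slack=1, threshold=5)

# A smaller rise that never exceeds baseline + slack is never flagged.
small_rise = [10] * 5 + [11] * 5
missed = cusum_upward(small_rise, baseline=10, slack=1, threshold=5)

print(detected, missed)
```

In this made-up series the shift happens on day 5 but is not flagged until day 7, and the smaller shift is never flagged at all. Lowering `slack` or `threshold` shortens the delay at the price of more false alarms, which is the trade-off any dynamic phase boundary would have to manage.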
