
Adjusting with Weights... or Adjusting the Data Collection?

I just got back from JSM, where I saw some presentations on responsive/adaptive design. The discussant did a great job summarizing the issues. He raised one of the key questions that always seems to come up for these kinds of designs: if the data used to steer the design are available for all the cases, why bother changing the data collection when you can just use nonresponse adjustments to account for differences along those dimensions?

This is a big question for these methods. I think there are at least two responses (let me know if you have others).

First, for those nonresponse adjustments to be effective, and assuming that we use weighting cell adjustments (the idea extends easily to propensity modeling), the respondents within each cell need to be equivalent to a random sample of that cell. That is, within each cell, the respondents and nonrespondents need to have the same mean for the survey variable. A question might be: at what response rate does that assumption become true? Of course, we don't know. But if we alter our data collection strategy, we can at least try to verify the assumption empirically over some range of response rates.
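To make the weighting cell assumption concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the data, the cell labels, and the function name `weighting_cell_mean`. It also assumes an equal-probability sample, so base weights can be ignored.

```python
from collections import defaultdict


def weighting_cell_mean(sample):
    """Weighting cell nonresponse adjustment for an equal-probability sample.

    `sample` is a list of (cell, responded, y) tuples. y is only used
    when responded is True (it is unobserved for nonrespondents).
    """
    n_sampled = defaultdict(int)
    n_resp = defaultdict(int)
    y_sum = defaultdict(float)
    for cell, responded, y in sample:
        n_sampled[cell] += 1
        if responded:
            n_resp[cell] += 1
            y_sum[cell] += y

    total = 0.0
    for cell, n in n_sampled.items():
        if n_resp[cell] == 0:
            raise ValueError(f"no respondents in cell {cell!r}")
        # Each respondent gets weight n / n_resp, so the cell contributes
        # n times its respondent mean to the weighted total.
        total += n * (y_sum[cell] / n_resp[cell])
    return total / len(sample)


# Hypothetical data: cell "A" has a 50% response rate, cell "B" 100%.
sample = (
    [("A", True, 10.0)] * 2 + [("A", False, None)] * 2
    + [("B", True, 20.0)] * 4
)

resp_y = [y for _, responded, y in sample if responded]
unadjusted = sum(resp_y) / len(resp_y)   # plain respondent mean
adjusted = weighting_cell_mean(sample)   # weighting cell estimate
print(unadjusted, adjusted)
```

If the two nonrespondents in cell A also have y = 10, the full-sample mean is 15 and the adjusted estimate matches it, while the unadjusted respondent mean (about 16.7) does not. If those nonrespondents instead had y = 30, the full-sample mean would be 20 and the adjusted estimate of 15 would be biased, and nothing in the respondent data would reveal that.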

Second, this is an empirical question. It would be nice to have studies that looked at this question. Does balancing response along specified dimensions lead to reduced nonresponse bias after adjustment? My hunch is "yes, it does."

Comments

  1. Hi James,
    Not an expert on this, but wouldn't you say that with nonresponse adjustments, you expect the transformation to be linear (or at least parametric), whereas if you adjust the data collection, adjustments can take any form? I agree with you that this may or may not matter in practice, and that that's an empirical question.

  2. Peter, Thanks for the interesting comment. It seems to me that when we make our nonresponse adjustments, after all is said and done, we postulate that our model is correct and that the adjusted measures are unbiased (or as unbiased as can be given the available data). Normally, this assumption can't be tested. If there is an unobserved covariate that is making that assumption invalid, we can't know it.

    When we intervene during data collection, it is as if we assume that we are breaking (or maybe reducing) the correlation between that covariate and response, conditional on all the observed data.

    My hunch is that trying to break the correlation during data collection is worthwhile. Sadly, in most cases, we may never know. Today we lean pretty heavily on the Pew studies first reported in 2000 as evidence that response rate may not be a good indicator. It would be nice to have similar empirical studies on this question.

  3. I entirely agree, both about reducing the potential for bias during data collection and about the unfortunate lack of evidence.

    To me, the argument for reducing rather than only adjusting using the same available information is one about robustness of the design with respect to nonresponse bias. Relying only on adjustments leaves greater potential for nonresponse bias due to heterogeneity within adjustment cells/groups. Doing something to equalize response rates across cells should reduce this risk. I think this is the same as what you said, stated in a slightly different way.

    One of the challenges with the empirical evidence is that there may not be substantial bias in weighted estimates to begin with, in which case one would erroneously conclude that only adjusting is fine. Yet it is about robustness - reducing the risk of bias, when there is bias...

    Replies
    1. I think you are right that it is more robust to control the data collection in this way.

      I also think it may be the case that we can reduce the variability induced by interviewers by giving more centralized direction. Just conjecture...


