

Showing posts from 2016

Responsive Survey Design Short Course

I don't do a whole lot of advertising on the blog, but I did want to post about a set of short courses that we will be offering here in Ann Arbor next summer. These courses are the first three days of what will eventually be a full two-week course. We have some great instructors lined up. We are going to teach techniques of responsive survey design that can be used across a variety of studies. If you are interested, follow this link for more information.

The Cost of a Call Attempt

We recently did an experiment with incentives on a face-to-face survey. As one aspect of the evaluation of the experiment, we looked at the costs associated with each treatment (i.e. different incentive amounts). The costs are a bit complicated to parse out. The incentive amount is easy, but the interviewer time is hard. Interviewers record their time at the day level, not at the housing unit level, so it's difficult to determine how much a call attempt costs. Even if we had accurate data on the time spent making the call attempt, there would still be all the travel time from the interviewer's home to the area segment. If I could accurately calculate that, how would I spread it across call attempts? This might not matter if all I'm interested in is the marginal cost of adding an attempt to a visit to an area segment. But if I want to evaluate a treatment -- like the incentive experiment -- I need to account for all the interviewer costs…
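Just to make the allocation problem concrete, here is a minimal sketch of one naive approach: spread a full interviewer-day's cost evenly across that day's call attempts. The field names, the hourly rate, and the even-split rule are all hypothetical; this is not how our project actually codes costs.

```python
# A minimal sketch of approximating per-attempt costs when interviewer time
# is only recorded at the day level. The even-split allocation rule and field
# names are hypothetical illustrations, not project practice.

def cost_per_attempt(daily_hours, hourly_rate, attempts):
    """Spread one interviewer-day's cost evenly across that day's attempts.

    daily_hours -- hours the interviewer charged for the day
    hourly_rate -- fully loaded cost of an interviewer hour
    attempts    -- list of call-attempt records made that day
    """
    if not attempts:
        return {}
    day_cost = daily_hours * hourly_rate   # includes travel, admin, etc.
    share = day_cost / len(attempts)       # naive even split
    return {a["attempt_id"]: share for a in attempts}

# Example: an 8-hour day at $25/hour spread over 10 attempts -> $20 each.
attempts = [{"attempt_id": i} for i in range(10)]
print(cost_per_attempt(8.0, 25.0, attempts))
```

Even a crude rule like this makes the point that travel and administrative time end up baked into every per-attempt figure.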

Methodology on the Margins

I'm thinking again about experiments that we run. Yes, they are usually messy. In my last post, I talked about the inherent messiness of survey experiments that comes from the fact that surveys have many design features to consider. These features may interact in ways that mean we can't simply pull out an experiment on a single feature and generalize the result to other surveys. But I started thinking about other problems we have with experiments. Another big issue is that methodological experiments are often run as "add-ons" to larger surveys. It's hard to obtain funding to run a survey just to do a methodological experiment, so we add our experiments to existing surveys. The problem is that this approach usually creates a limitation: the experiments can't risk creating a problem for the survey. In other words, they can't lead to reductions in response rates or threaten other targets that are associated with the main (i.e. non-methodological) survey…

Messy Experiments

I have this feeling that survey experiments are often very messy. Maybe it's just in comparison to the ideal type -- a laboratory with a completely controlled environment where only one variable is altered between two randomly assigned groups. But still, surveys have a very complicated structure. We often call this the "essential survey conditions," but that glib phrase might hide some important details. My concern is that when we focus on a single feature of a survey design, e.g. incentives, we might come to the wrong conclusion if we don't consider how that feature interacts with other design features. This matters when we attempt to generalize from published research to another situation. Take the well-known result -- incentives work! Except that the impact of incentives seems to be different for interviewer-administered surveys than for self-administered surveys. The other features of the design matter…

Assessment of Machine Learning Classifiers

I heard another interesting episode of the Data Skeptic podcast. They were discussing how a classifier could be assessed (episode 121). Many machine learning models are so complex that a human being can't really interpret the meaning of the model. This can lead to problems. They gave an example involving a bunch of posts from two discussion boards, one atheist and one Christian. A model tried to classify each post as coming from one board or the other. There was one poster, named Keith, who posted heavily on the Christian board. Sadly, the model learned that if the person posting was named Keith, then they were Christian. The problem is that this isn't very useful for prediction; it's an artifact of the input data. Even cross-validation wouldn't eliminate this problem. A human being can see the issue, but the model can't. In any event, the proposed solution was to build interpretable models in local areas of the feature space…
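As a toy illustration of the "Keith" artifact (invented data, not the podcast's corpus), a bag-of-words classifier plus a look at its coefficients is enough to see a poster's name dominating the model:

```python
# A toy illustration of how a text classifier can learn an artifact like a
# frequent poster's name. The tiny corpus is invented; with real data you
# would inspect many more features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

posts = [
    "keith says faith matters",      # Christian board
    "keith on grace and scripture",  # Christian board
    "evidence and reason only",      # atheist board
    "no gods just physics",          # atheist board
]
labels = [1, 1, 0, 0]  # 1 = Christian board, 0 = atheist board

vec = CountVectorizer()
X = vec.fit_transform(posts)
clf = LogisticRegression().fit(X, labels)

# Rank features by coefficient: 'keith' floats to the top even though a
# poster's name tells us nothing useful about posts from new people.
weights = sorted(zip(clf.coef_[0], vec.get_feature_names_out()), reverse=True)
for w, term in weights[:3]:
    print(f"{term:10s} {w:+.2f}")
```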

Tailoring vs. Targeting

One of the chapters in a recent book on surveying hard-to-reach populations looks at "targeting and tailoring" survey designs. The chapter references this paper on the use of the terms among those who design health communication. I thought the article was an interesting one. The authors start by saying that "one way to classify message strategies like tailoring is by the level of specificity with which characteristics of the target audience are reflected in the communication." That made sense. There is likely a continuum of specificity, ranging from complete non-differentiation across units to nearly individualized. But then the authors break that continuum and try to define a "fundamental" difference between tailoring and targeting. They say targeting is aimed at some subgroup while tailoring is to the characteristics of the individual. That sounds good, but at least for surveys, I'm not sure the distinction holds. In survey design, what would constitute…

Reasons for maintaining high response rates

A few years ago, I was presenting at a conference of substantive experts. I gave an update on the progress of a survey of interest to this group. I talked about how nonresponse bias can be complex, and that the response rate might not be a good predictor of when this bias occurs -- based on Groves and Peytcheva. I was speaking with one of the researchers after my presentation, and I was surprised to hear her say that she interpreted my comments to mean that "response rates don't matter." Although that interpretation makes sense, it hadn't really occurred to me in that way until she said it. Since then, it seems like we've seen a lot of published papers and conference presentations where lowering the response rate becomes a tactic for improving the survey. Most studies taking this tactic lower the response rates for groups that tend to respond at higher rates. The purported benefit is that response-set balance on known characteristics from the sampling frame is improved…

Combining surveys with other sources of data

The term "big data" was meant to cover a wide variety of types of data. Surveys were left out of the definition. Bob Groves attempted to remedy this by coining the terms "organic" and "designed" data. These terms were meant to capture the strengths and weaknesses of big data, on the one hand, and survey data, on the other hand. Organic data are not generated for research purposes, but usually inexpensive to obtain (but not necessarily cheap to analyze). Survey data are designed for research but are often expensive to obtain. I'm finding that these terms might get in the way of thinking about some actual problems. For instance, travel surveys are looking at combining survey data with GPS data. GPS data can be large and complex, i.e. "big data." On the other hand, features of these data are designed by the researchers in travel studies. That is, the researchers ask persons to carry a GPS device or download an app to their smartphones. These d

Goodhart's Law

I enjoy listening to the Data Skeptic podcast. It's a data science view of statistics, machine learning, etc. They recently discussed Goodhart's Law on the podcast. Goodhart was an economist. The law that bears his name says that "when a measure becomes a target, it ceases to be a good measure." People find a way to "game" the situation: they maximize the indicator but produce poor quality on other dimensions as a consequence. The classic example is a rat-reduction program implemented by a government. The government wants to motivate the population to destroy rats, so it offers a fee for each rat that is killed. Rather than requiring the whole body, it asks only for the tail. As a result, some people decide to breed rats and cut off their tails. The end result... more rats. I have some mixed feelings about this issue. There are many optimization procedures that require a single measure that can be either maximized or minimized…

Balancing Response through Reduced Response Rates

A case can be made that balanced response -- that is, achieving similar response rates across all the subgroups that can be defined using sampling frame data and paradata -- will improve the quality of survey data. A paper that I co-authored used simulation with real survey data to show that actions that improved the balance of response usually led to reduced bias in adjusted estimates. I believe the case is an empirical one; we need more studies to speak more generally about how and when this might be true. On the other hand, I worry that studies that seek balance by reducing response rates (for high-responding groups) might create some issues. I see two types of problems. First, low response rates are generally easier to achieve. It takes skill and effort to achieve high response rates, and the ability to obtain high response rates, like any muscle, might be lost if it is not used. Second, if these studies justify the lower response rate by saying that estimates are not significantly different…
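For readers unfamiliar with the idea, here is a minimal sketch of what "balance" looks like in practice: subgroup response rates and a simple measure of their spread. The subgroup counts are invented, and a real analysis would work with estimated response propensities (e.g., an R-indicator) rather than raw subgroup rates.

```python
# A minimal sketch of checking response balance across frame subgroups.
# Counts are invented for illustration only.
from statistics import pstdev

subgroups = {             # subgroup: (respondents, sampled cases)
    "urban":    (420, 1000),
    "rural":    (360,  600),
    "suburban": (510,  900),
}

rates = {g: r / n for g, (r, n) in subgroups.items()}
overall = sum(r for r, _ in subgroups.values()) / sum(n for _, n in subgroups.values())

print("overall response rate:", round(overall, 3))
for g, rate in rates.items():
    print(f"{g:9s} {rate:.3f}")

# Smaller spread across subgroups = more balanced response. An R-indicator
# is roughly 1 - 2 * (this kind of spread, measured on estimated propensities).
print("spread (sd of subgroup rates):", round(pstdev(rates.values()), 3))
```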

What is a "response propensity"?

We talk a lot about response propensities. I'm starting to think we actually create a lot of confusion for ourselves by the way we sometimes have these discussions. First, there is a distinction between an actual and an estimated propensity. This distinction is important, as our models are almost always misspecified. It is probably the case that important predictors are never observed -- for example, the mental state of the sampled person at the moment we happen to contact them. So the estimated propensity and the true propensity are different things. The model selection choices we make can, therefore, have something of an arbitrary flavor to them. I think the choices we make should depend on the purpose of the model. In a recent paper on nonresponse weighting, we examined whether call record information, especially the number of calls and refusal indicators, was useful for predicting response propensities for this purpose. It turns out that these variables were strong predictors…
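For concreteness, here is a minimal sketch of the general form of an estimated propensity model: a logistic regression of a 0/1 response indicator on frame variables and call-record paradata. The variables and simulated data are hypothetical; this is not the model from the paper.

```python
# A minimal sketch of an *estimated* response propensity model, using
# hypothetical frame and call-record variables and simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
calls = rng.poisson(3, n)               # paradata: number of call attempts
ever_refused = rng.binomial(1, 0.2, n)  # paradata: any refusal recorded
urban = rng.binomial(1, 0.6, n)         # frame variable

# Simulated response indicator loosely tied to the predictors.
logit = 0.5 - 0.2 * calls - 1.0 * ever_refused + 0.3 * urban
responded = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([calls, ever_refused, urban])
model = LogisticRegression().fit(X, responded)

# Estimated propensities are what we actually work with; the "true"
# propensities (the logistic of `logit` above) are never observed in practice.
p_hat = model.predict_proba(X)[:, 1]
print("mean estimated propensity:", round(p_hat.mean(), 3))
```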

Survey Data and Big Data

I had an opportunity to revisit an article by Burns and colleagues that looks at using data from smartphones (they have a nice appendix of all the data they can get from each phone) to predict things that might trigger episodes of depression. Of course, the phone data don't contain any specific measures of depression. In order to get those, the researchers had to use... surveys. Once they had those, they could find the associations with the sensor data from the phone. Then they could deliver interventions through the phone. There are 38 sensors on the phone, and the phone delivers data quite frequently. So even with a small number of phones (n=8 in this trial), there was quite a large amount of data generated. A bigger trial would have even more data. So this seems like a big data application. And, in this case, the "organic" data from the phone need some "designed" (i.e. survey) data in order to be useful. This is also interesting in that the smartphone is delivering an intervention…
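A minimal sketch of that linkage step, with hypothetical column names and made-up values (the actual study used many more sensors and a real depression instrument):

```python
# A minimal sketch of linking "organic" sensor records to "designed" survey
# measures. Columns and values are hypothetical illustrations.
import pandas as pd

sensor = pd.DataFrame({            # high-frequency phone data
    "participant": [1, 1, 2, 2],
    "day":         [1, 2, 1, 2],
    "gps_km":      [3.2, 0.4, 7.1, 6.8],
    "screen_min":  [210, 480, 95, 120],
})

survey = pd.DataFrame({            # low-frequency survey measure
    "participant": [1, 2],
    "phq9":        [14, 4],        # e.g., a depression screening score
})

# The survey score is the label that makes the sensor stream analyzable.
linked = sensor.merge(survey, on="participant")
print(linked.groupby("phq9")[["gps_km", "screen_min"]].mean())
```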

Training for Paradata

Paradata are messy data. I've been working with paradata for a number of years, and I find that there are all kinds of issues. The data aren't always designed with the analyst in mind; they are usually a by-product of a process. The interviewers aren't focused (and rightly so) on generating high-quality paradata. In many situations, they sacrifice the quality of the paradata in order to obtain an interview. The good thing about paradata is that the analysis is usually done to inform specific decisions. How should we design the next survey? What is the problem with this survey? The analysis is effective if the decisions seem correct in retrospect -- that is, if the predictions generated by the analysis lead to good decisions. If students were interested in learning about paradata analysis, I would suggest that they gain exposure to methods in statistics, machine learning, operations research, and the emerging field of "data science."…

WSS Mini-Conference on Paradata

Next week, after the big storm, the Washington Statistical Society is sponsoring a mini-conference: "Benefits and Challenges in Using Paradata." The program is available online. This will be a nice opportunity to meet and discuss with folks working on similar problems. We are few in number, so it's good to take advantage of these opportunities. I'm going to be speaking about the problems of working with incoming streams of paradata. I can propose some solutions, but we need to get better at this.