Survey Methods Musings

Showing posts with the label Machine Learning

Learning from paradata

Susan Murphy's work on dynamic treatment regimes had a big impact on me as I was working on my dissertation. I was very excited about the prospect of learning from the paradata. I did a lot of work on trying to identify the best next step based on analysis of the history of a case. Two examples were 1) choosing the lag before the next call and the incentive, and 2) the timing of the next call. At this point, I'm a little less sure of the utility of the approach for those settings. In those settings, where I was looking at call record paradata, I think the paradata are not at all correlated with most survey outcomes. So it's difficult to identify strategies that will do anything but improve efficiency. That is, changes in strategies based on analysis of call records aren't very likely to change estimates. Still, I think there are some areas where the dynamic treatment regime approach can be useful. The first is mode switching. Modes are powerful, and offering them i...

Every Hard-to-Interview Respondent is Difficult in their Own Way...

The title of this post is a paraphrase of a saying coined by Tolstoy: "Happy families are all alike; every unhappy family is unhappy in its own way." I'm stealing the concept to think about survey respondents. To simplify discussion, I'll focus on two extremes. Some people are easy respondents. No matter what we do, no matter how poorly conceived, they will respond. Other people are difficult respondents. I would argue that these latter respondents are heterogeneous with respect to the impact of different survey designs on them. That is, they might be more likely to respond under one design relative to another. Further, the most effective design will vary from person to person within this difficult group. It sounds simple enough, but we don't often carry this idea into practice. For example, we often estimate a single response propensity, label a subset with low estimated propensities as difficult, and then give them all some extra thing (often more money). ...

Goodhart's Law

I enjoy listening to the Data Skeptic podcast. It's a data science view of statistics, machine learning, etc. They recently discussed Goodhart's Law on the podcast. Goodhart was an economist. The law that bears his name says that "when a measure becomes a target, then it ceases to be a good measure." People try to find a way to "game" the situation. They maximize the indicator but produce poor quality on other dimensions as a consequence. The classic example is a rat reduction program implemented by a government. They want to motivate the population to destroy rats, so they offer a fee for each rat that is killed. Rather than requiring the rat's body, they just ask for the tail. As a result, some persons decide to breed rats and cut off their tails. The end result... more rats. I have some mixed feelings about this issue. There are many optimization procedures that require some single measure which can be either maximized or minimized. I think th...

Training for Paradata

Paradata are messy data. I've been working with paradata for a number of years, and find that there are all kinds of issues. The data aren't always designed with the analyst in mind. They are usually a by-product of a process. The interviewers aren't focused (and rightly so) on generating high-quality paradata. In many situations, they sacrifice the quality of the paradata in order to obtain an interview. The good thing about paradata is that analysis of paradata is usually done in order to inform specific decisions. How should we design the next survey? What is the problem with this survey? The analysis is effective if the decisions seem correct in retrospect. That is, if the predictions generated by the analysis lead to good decisions. If students were interested in learning about paradata analysis, then I would suggest that they gain exposure to methods in statistics, machine learning, operations research, and an emerging category "data science." It seems ...

Myopic Calling Strategies

I'm interested in sequential decision-making problems. In these problems, there is a tension between exploration and exploitation. Exploitation is when you take actions with more certainty about the rewards. The goal of exploitation is to get the maximum reward from the next action given what is currently known. Exploration is when you take actions with less certainty. The goal is to discover what the rewards are for actions about which little is known. A strategy that always exploits is called myopic since it always tries to maximize the reward of the current action without any view to long-term gains. Calling algorithms certainly face this tension. For example, evenings might be the best time on average to contact households. If I know nothing else, then that would be my guess about when to place the next call. But it would be foolish to stay with that option if it continues to fail. If I have failures in that call window, I might explore another call window to try to see if the re...
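To make the contrast concrete, here is a minimal sketch of a myopic rule next to one simple alternative, an epsilon-greedy rule, which explores a random call window with a small probability. The window names, the call history, and the function `choose_window` are all hypothetical, made up for illustration; this is not a description of any production calling algorithm.

```python
import random

def choose_window(successes, attempts, epsilon=0.1, rng=random):
    """Pick the next call window for a case.

    With probability epsilon, explore a randomly chosen window.
    Otherwise exploit the window with the best observed contact rate.
    Setting epsilon=0.0 gives a purely myopic (always-exploit) rule.
    """
    windows = list(attempts)
    if rng.random() < epsilon:
        return rng.choice(windows)  # explore

    def rate(window):
        # Untried windows are scored 0.0 here (an assumption of this sketch).
        n = attempts[window]
        return successes[window] / n if n else 0.0

    return max(windows, key=rate)  # exploit

# Hypothetical call history for one case.
attempts = {"weekday_day": 2, "weekday_evening": 5, "weekend": 1}
successes = {"weekday_day": 0, "weekday_evening": 1, "weekend": 0}

myopic_choice = choose_window(successes, attempts, epsilon=0.0)
exploring_choice = choose_window(successes, attempts, epsilon=1.0)
```

With `epsilon=0.0` the rule keeps returning the evening window as long as it has the best observed rate, which is the myopic behavior described above. Raising `epsilon` trades some expected short-run contact for information about the other windows.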

Timing of the Mode Switch

I just got back from JSM where I presented the results of an experiment that varied the timing of the mode switch in a web-telephone survey. I'm not going to talk about the results of the experiment in this post, just the premise. The concern that motivated the experiment had to do with the possibility that longer delays before switching modes could have adverse effects on response rates. This could happen for several reasons. If there is pre-notification, then the effect of the prenote on response to the second mode might be reduced with longer delays before switching. If the first mode is annoying in some way, it can diminish the effectiveness of the second mode. The latter case is particularly interesting to me. It points to the ways that different treatment sequences can have different levels of effectiveness. We saw an impact like this in an experiment we ran with two sequences of modes for a screening survey. The two sequences functioned about the same in terms of respo...

Personalized Survey Design

In my last post, I talked about personalized medicine. I found out this week that in personalized medicine, there is a distinction between targeted and tailored treatments. Targeted treatments are aimed at specified subgroups of the population, while tailored protocols are individual-specific treatments that may be based on a targeted treatment, but use within-patient variation to "tune" treatments over time. I wonder whether this kind of tailored protocol is possible for surveys. Panel surveys are one area where this may be possible. But it seems that the panel would have to have many waves or repetitions. There might not be enough measurement of variation with only a few waves. What's a few? Let's say fewer than 10 or 20. It seems like these methods might have an application in surveys that use frequent measurement and/or run over a relatively long period of time. For example, imagine a survey that collected data weekly for 2 or 3 years. O...

Responsive Design Phases

In Groves and Heeringa's original formulation, responsive design proceeds in phases. They define these phases as: "A design phase is a time period of a data collection during which the same set of sampling frame, mode of data collection, sample design, recruitment protocols and measurement conditions are extant." (page 440). These responsive design phases are different from the two-phase sampling defined by Hansen and Hurwitz. Hansen and Hurwitz assumed 100% response so there was no nonresponse bias. Their two-phase sampling was all about minimizing variance for a fixed budget. Groves and Heeringa, on the other hand, live in a world where nonresponse does occur. They seek to control it through phases that recruit complementary groups of respondents. The goal is that the nonresponse biases from each phase will cancel each other out. The focus on bias is a new feature relative to Hansen and Hurwitz. A question in my mind about the phases is how the...

Survey Methods Training

Survey Practice devoted the entire current issue to a discussion of training in survey methodology. This is a very useful review of what is currently done and suggestions for the future. As they observe, survey methodology is a broad discipline that draws upon a diverse set of fields of research. I expect that increasing this diversity would be positive. That is, there are a number of fields of study that would find applications for their methods in the field of survey research. A couple of key examples include operations research and computer science. Operations research could help us think more rigorously about designing data collection to optimize specified quantities. That doesn't mean we have to pursue one goal. But it would help, or maybe force, us to quantify the vague trade-offs we usually deal in. The paper by Greenberg and Stokes is an early example. The paper by Calinescu and colleagues is a recent one. Computer science is another such field. Researc...

Interviewer Travel and New Forms of Data

The Director of the Census Bureau, John Thompson, recently blogged about a field test for the 2020 Decennial Census Nonresponse Follow-up. They are testing a number of new features, including the use of smartphones in data collection. I've been working with GPS data from smartphones used by field interviewers. The data are complex, but may offer new insights into interviewer travel. Think of travel as a broad concept -- it's not just an expense or efficiency issue. The order in which calls are made may also relate to field outcomes like contact and response rates. Perhaps these GPS data can help us understand how interviewers currently make decisions about how to work their sample. For example, do they move past sampled housing units when they first arrive to the area segment? Is this action associated with higher contact rates? Of course, travel is also an expense or efficiency issue. I wouldn't want pushing for more efficient travel to interfere with other aspects o...

Identifying all the active components of the design...

I've been reading papers on email prenotification and reminders. They are very interesting. There are usually several important features for these emails: how many are sent, the lag between messages, the subject line, the content of the email (length etc.), the placement of the URL, etc. A full factorial design with all these factors is nearly impossible. So folks do the best they can and focus on a few of these features. I've been looking at papers on how many messages were sent, but I find that the lag time between messages also varies a lot. It's hard to know which of these dimensions is the "active" component. It could be either or both, and there may even be synergies (aka "interactions") between the two (and between other dimensions of the design as well). Linda Collins and colleagues talk about methods for identifying the "active components" of the treatments in these complex situations. Given the complexity of these designs, with a large nu...

Big Data and Survey Data

I missed Dr. Groves's blog post on this topic. It is an interesting perspective on the strengths and weaknesses of each data source. His solution is to "blend" data from both sources to compensate for the weaknesses of each. Dr. Couper spoke along similar lines at the ESRA conference last year. An important takeaway from both of these is that surveys have an important place in the future. Surveys gather, relative to big data, rich data on individuals that allow the development and testing of models that may be used with big data. They can also provide benchmarks for estimates from big data for which the characteristics of the population are only vaguely known. In any event, I'm not worried that surveys or even probability sampling have outlived their usefulness. But it is good to chart a course for the future that will keep survey folks relevant to these pressing problems.

Formalizing the Optimization Problem

I heard Andy Peytchev speak about responsive design recently. He raised some really good points. One of these was a "total survey error" kind of observation. He pointed out that different surveys have different objectives and that these may be ranked differently. One survey may prioritize sampling error while another has nonresponse bias as its biggest priority. As there are always tradeoffs between error sources, the priorities indicate which way those decisions were or will be made. Since responsive design has largely been thought of as a remedy for nonresponse bias, this idea seems novel. Of course, it is worth recalling that Groves and Heeringa did originally propose the idea in a total survey error perspective. On the other hand, many of their examples were related to nonresponse. I think it is important to 1) think about these tradeoffs in errors and costs, 2) explicitly state what they are for any given survey, and 3) formalize the tradeoffs. I'm not sure tha...

What would a randomized call timing experiment look like?

It's one thing to compare different call scheduling algorithms. You can compare two algorithms and measure the performance using whatever metrics you want to compare (efficiency, response rate, survey outcome variables). But what about comparing estimated contact propensities? There is an assumption often employed that these calls are randomly placed. This assumption allows us to predict what would happen under a diverse set of strategies -- e.g. placing calls at different times. Still, this had me wondering what a really randomized experiment would look like. The experiment would be best randomized sequentially as this can result in more efficient allocation. We'd then want to randomize each "important" aspect of the next treatment. This is where it gets messy. Here are two of these features: 1. Timing. The question is, how to define this. We can define it using "call windows." But even the creation of these windows requires assumptions... and tradeo...

Optimal Resource Allocation and Surveys

I just got back from Amsterdam where I heard the defense of a very interesting dissertation. You can find the full dissertation here. One of the chapters is already published and several others are forthcoming. The dissertation uses optimization techniques to design surveys that maximize the R-Indicator while controlling measurement error for a fixed budget. I find this to be very exciting research as it brings together two fields in new and interesting ways. I'm hoping that further research will be spurred by this work.

Were we already adaptive?

I spent a few posts cataloging design features that could be considered adaptive. No one labelled them that way in the past. But if we were already doing it, why do we need the new label? I think there are at least two answers to that: 1. Thinking about these features allows us to bring in the complexity of surveys. Surveys are multiple phase activities, where the actions at different phases may impact outcomes at later phases. This makes it difficult to design experiments. In clinical trials, some have labelled this phenomenon "practice misalignments." They note that trials that focus on single-phase, fixed-dose treatments are not well aligned with how doctors actually treat patients. The same thing may happen for surveys. When something doesn't work, we don't usually just give up. We try something else. 2. It gives us a concept to think about these practices. It is an organizing principle that can help identify common features, useful experimental me...

Again on Refusal Conversions

This isn't a technique that gets much attention. I can think of three articles on the topic. I know of one article (Fuse and Xie, 2007) that investigates refusal conversions in telephone surveys and collects information (observations) from interviewers. And I just googled another one (Beullens, et al., 2010) that investigates the effects of time between initial refusal and first conversion attempt. There is a third article (Burton, et al. 2006) on refusal conversions in panel studies. This one adds another element in that a key consideration is whether refusers that are converted will remain in the panel in subsequent waves. This problem seems to fit really well into the sequential decision-making framework. The decision is at which waves, for any given case that refuses, you should try a refusal conversion. You might, for instance, optimize the expected number of responses (completed interviews) over a certain number of waves. Or, you might maximize other measures of data qual...

Are Refusal Conversions "Adaptive?"

I have two feelings about talking about adaptive or responsive designs. The first feeling is that these are new concepts, so we need to invent new methods to implement them. The second feeling is that although these are new concepts, we can point to actual things that we have always (or for a long time) done and say, "that's an example of this new concept" that existed before the concept had been formalized. I think refusal conversions are a good example. We never really applied the same protocol to all cases. Some cases got a tailored or adaptive design feature. The rule is something like this: if the case refuses to complete the interview, then change the interviewer, make another attempt, and offer a higher incentive. I'm trying to think systematically about these kinds of examples. Some are trivial ("if there is no answer on the first call attempt, then make a second attempt"). But others may not be. The more of these we can root out, the more we can...
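Written out as code, the refusal conversion rule above is just a small decision function. This is only a sketch: the field names (`last_outcome`, `backup_interviewer`, `incentive_increase`) and the dollar amounts are hypothetical, chosen for illustration rather than taken from any real case management system.

```python
def next_step(case):
    """Return the next protocol for a case, given its last call outcome.

    Default: make another attempt with the same interviewer and incentive.
    After a refusal: change the interviewer and offer a higher incentive.
    """
    step = {
        "attempt": True,
        "interviewer": case["interviewer"],
        "incentive": case["incentive"],
    }
    if case["last_outcome"] == "refusal":
        step["interviewer"] = case["backup_interviewer"]
        step["incentive"] = case["incentive"] + case["incentive_increase"]
    return step

# Hypothetical case record.
refused_case = {
    "last_outcome": "refusal",
    "interviewer": "A",
    "backup_interviewer": "B",
    "incentive": 20,
    "incentive_increase": 20,
}
conversion_step = next_step(refused_case)
no_answer_step = next_step({**refused_case, "last_outcome": "no_answer"})
```

Seen this way, the trivial rule (no answer, so try again) and the refusal conversion rule have the same form: both map a case's history to the next action. They differ only in how much the protocol changes.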

Adaptive Interventions

I was at a very interesting workshop today on adaptive interventions. Most of the folks at the workshop design interventions for chronic conditions and would be used to testing their interventions using a randomized trial. Much of the discussion was on heterogeneity of treatment effects. In fact, much of their research is based on the premise that individualized treatments should do better than giving everyone the same treatment. Of course, the average treatment might be the best course for everyone, but they have certainly found applications where this is not true. It seems that many more could be found. I started to think about applications in the survey realm. We do have the concept of tailoring, which began in our field with research into survey introductions. But do we use it much? I have two feelings on this question. No, there aren't many examples like the article I linked to above. We usually test interventions (design features like incentives, letters, etc.) on the wh...

Exploration vs exploitation

Once more on this theme that I discussed on this blog several times last year. This is a central problem for the field of research known as reinforcement learning. I'd recommend taking a look at Sutton and Barto's book if you are interested. It's not too technical and can be understood by someone without a background in machine learning. As I mentioned in my last post, I think learning in the survey environment is a tough problem. The paper that proposed the upper confidence bound rule said it works well for short run problems -- but the short run they envisioned was something like 100 trials. In the survey setting, there aren't repeated rewards. We're usually looking for one interview. You might think of gaining contact as another reward, but still. We're usually limited to a relatively small number of attempts (trials). We also often have poor estimates of response and contact probabilities to start with. Given that reward structure, poor prior informatio...
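For readers who haven't seen an upper confidence bound rule, here is a minimal sketch of one common form, usually called UCB1: pick the option with the largest observed mean plus an exploration bonus of sqrt(2 ln(t) / n), where t is the total number of trials and n is the number of trials for that option. Treating call windows as the "arms" and contact as the reward is my assumption for illustration, and the call history below is made up.

```python
import math

def ucb1_choice(successes, attempts):
    """Return the arm with the highest UCB1 upper confidence bound."""
    # Try every arm once before the bound can be computed.
    for arm, n in attempts.items():
        if n == 0:
            return arm

    total = sum(attempts.values())

    def bound(arm):
        mean = successes[arm] / attempts[arm]
        bonus = math.sqrt(2 * math.log(total) / attempts[arm])
        return mean + bonus

    return max(attempts, key=bound)

# Hypothetical call history for one case (7 attempts in total).
attempts = {"weekday_day": 1, "weekday_evening": 4, "weekend": 2}
successes = {"weekday_day": 0, "weekday_evening": 1, "weekend": 0}

choice = ucb1_choice(successes, attempts)
```

In this example the rule picks the weekday daytime window even though evenings have the only observed contact. With so few attempts, the exploration bonus swamps the observed means, which is one way to see why a handful of trials per case makes learning hard.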