
Showing posts from 2015

Bayesian Adaptive Survey Design

Just a short blog post. I recently attended the 4th Workshop on Adaptive and Responsive Survey Design. There were many good papers delivered at this workshop, with a particular focus on Bayesian approaches to the estimation of survey design parameters and on paradata modeling. The link has some of the slides and papers.

Mode Sequence

A few years ago, I did an experiment with two sequences of modes for a screening survey. The modes were mail and face-to-face. We found that the sequence didn't matter much for the response rate to the screener, but that the arm that started with face-to-face and then used mail had a better response rate to the main interview given to those who were found to be eligible in the screening interview. There are other experiments that use different sequences of modes. Some of these find that the sequence doesn't matter. For example, Dillman and colleagues looked at mail-telephone and telephone-mail sequences, and these had about the same response rate. On the other hand, Millar and Dillman found that for mail-web mixed-mode surveys the sequence does seem to matter, although certainly the number and kind of contact attempts are also important. It does seem that there are times when the early attempts might interfere with the effectiveness of later attempts. That is, we "harden the refusals."

Myopic Calling Strategies

I'm interested in sequential decision-making problems. In these problems, there is a tension between exploration and exploitation. Exploitation is when you take actions with more certainty about the rewards. The goal of exploitation is to get the maximum reward from the next action given what is currently known. Exploration is when you take actions with less certainty. The goal is to discover what the rewards are for actions about which little is known. A strategy that always exploits is called myopic, since it always tries to maximize the reward of the current action without any view to long-term gains. Calling algorithms certainly face this tension. For example, evenings might be the best time on average to contact households. If I know nothing else, then that would be my guess about when to place the next call. But it would be foolish to stay with that option if it continues to fail. If I have failures in that call window, I might explore another call window to see if the results are better there.
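
A minimal sketch of this tension, assuming a simple epsilon-greedy rule over three hypothetical call windows (the window names, contact rates, and epsilon value are made up for illustration; setting epsilon to zero gives the purely myopic strategy):

    import random

    # Hypothetical contact probabilities by call window (unknown to the algorithm).
    TRUE_CONTACT_RATES = {"morning": 0.10, "afternoon": 0.15, "evening": 0.30}

    def epsilon_greedy_calls(n_calls=1000, epsilon=0.1, seed=1):
        """Balance exploitation (best known window) with exploration (other windows)."""
        random.seed(seed)
        attempts = {w: 0 for w in TRUE_CONTACT_RATES}
        contacts = {w: 0 for w in TRUE_CONTACT_RATES}
        for _ in range(n_calls):
            if random.random() < epsilon:
                window = random.choice(list(TRUE_CONTACT_RATES))  # explore
            else:
                # exploit: pick the window with the best observed contact rate so far
                window = max(attempts, key=lambda w: contacts[w] / attempts[w] if attempts[w] else 0.0)
            attempts[window] += 1
            contacts[window] += random.random() < TRUE_CONTACT_RATES[window]
        return {w: (attempts[w], contacts[w]) for w in attempts}

    print(epsilon_greedy_calls())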

Web surveys: Coverage or nonresponse error?

I've been reading a bit on mixed-mode surveys. I've noticed several discussions of web surveys and coverage error. This is a relatively recent mode, and one of the key issues has been to what extent the population has access to the internet. If someone doesn't have access to the internet, they can't complete a web survey. Everyone agrees on that. But how do we describe the source of this error? Is it coverage or nonresponse error? In my mind, coverage error is a property of the sampling frame. If the unit is not on the sampling frame, then it is not covered. But many web surveys are general population surveys that don't have a tight association with a frame. That is, there is no "internet" sampling frame in the way we have RDD or area probability samples. Many surveys today start from address-based sampling (ABS) and then might do telephone, mail, web, or mixed-mode designs. In this case, a lack of internet access is an impediment to responding and not an imperfection of the sampling frame.

Mixed Modes -- Don't forget the mixing parameter

I've been thinking about mixed-mode surveys a great deal over the last few months. And I notice that research publications tend to use a lot of shorthand to describe the approach -- e.g. "Mail-Telephone." Of course, they describe it in more detail, but the shorthand definition focuses on the modes. Since the shorthand describes the sequence, we end up comparing different sequences. But there are other important design features at play that make these comparisons tenuous. Of course, these other design parameters include the dosage of each mode in the sequence. Different dosages may result in different proportions of the interviews being conducted in each mode. For example, in the mail-telephone design, more mailings can increase the proportion of interviews in the mail mode. A recent article by Klausch, Schouten, and Hox includes a parameter for the mixture of modes \(\pi\). I'm concerned that we may do lots of experimentation to design a mixed mode survey that is con
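
As a hedged illustration of why that mixing parameter matters (the notation here is mine, not necessarily that of Klausch, Schouten, and Hox): if a design yields a proportion \(\pi\) of interviews in mode A and \(1-\pi\) in mode B, the combined estimate is roughly a mixture,

\[ \bar{y} = \pi\,\bar{y}_A + (1-\pi)\,\bar{y}_B, \]

so two designs with the same mode sequence but different dosages, and hence different values of \(\pi\), can produce different combined estimates even when the mode-specific means \(\bar{y}_A\) and \(\bar{y}_B\) are unchanged.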

Balancing response... without simply retreating

I've seen several studies that examine whether "balancing response" with respect to a set of covariates available on the frame can lead to reductions in nonresponse bias. Most of the studies indicate that more balanced response is associated with less nonresponse bias. However, there is a strategy for balancing response that worries me a bit -- reducing the response rates of the groups that have the highest response rates and, thereby, reducing the overall response rate. Why does this worry me? Several reasons. First, when does this work? We have some studies that show reductions in bias. The studies that show increases in bias might be suppressed due to publication bias. So, how are we supposed to know when it works and when it doesn't? Second, it's easy to reduce response rates. It's harder to raise them. What's worse, once we reduce response rates, how do we ever get back the skills required for obtaining higher response rates? Maybe we are simp
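
For concreteness, a minimal sketch of one way to summarize balance, assuming subgroup response rates for a frame covariate are available (the subgroup labels and rates are hypothetical, and the unweighted spread is a simplification of the propensity-based summaries used in the R-indicator literature):

    from statistics import mean, pstdev

    # Hypothetical response rates by a frame covariate (e.g., region).
    subgroup_rates = {"A": 0.62, "B": 0.48, "C": 0.55, "D": 0.35}

    overall = mean(subgroup_rates.values())   # ignores unequal subgroup sizes for simplicity
    spread = pstdev(subgroup_rates.values())  # variability of subgroup response rates
    balance = 1 - 2 * spread                  # closer to 1 means more balanced response

    print(f"mean rate {overall:.2f}, spread {spread:.2f}, balance {balance:.2f}")

A summary like this can be improved either by lifting the low rates or by depressing the high ones; the number alone does not distinguish the two.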

Timing of the Mode Switch

I just got back from JSM where I presented the results of an experiment that varied the timing of the mode switch in a web-telephone survey. I'm not going to talk about the results of the experiment in this post, just the premise. The concern that motivated the experiment had to do with the possibility that longer delays before switching modes could have adverse effects on response rates. This could happen for several reasons. If there is pre-notification, then the effect of the prenote on response to the second mode might be reduced with longer delays before switching. If the first mode is annoying in some way, it can diminish the effectiveness of the second mode. The latter case is particularly interesting to me. It points to the ways that different treatment sequences can have different levels of effectiveness. We saw an impact like this in an experiment we did with two sequences of modes for a screening survey. The two sequences functioned about the same in terms of response rates to the screener.

"Call Scheduling Algorithms" = Call Scheduling Algoritms + Staffing

We think about call scheduling algorithms as a set of rules about when cases should be called. However, staffing is the other half of the problem. For the rules to be implemented, the staff making the calls need to be there. And, there can also be issues if the staff is too large. The rules need to account for both of these situations. Probably the more difficult problem is a staff that is too large. For example, imagine that all active cases have been called. There is an appointment in 45 minutes. The interviewer can wait, or call cases that have already been called on this shift. Calling cases again would be inefficient and a violation of a rule of the algorithm. Still, it seems bad to not make the calls. I wrote a paper on a call scheduling algorithm. I assigned a preferred calling window to each case. These windows changed over time as calls were placed and the results of previous calls were used to inform the assignment of the preferred window. I spent a lot of time analyzing data
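
A minimal sketch of the kind of assignment rule described here, assuming each case carries a list of (window, contacted) call records; the data layout and the "best observed contact rate, otherwise a default window" logic are my own illustration, not the paper's actual algorithm:

    from collections import defaultdict

    def preferred_window(call_history, default="evening"):
        """call_history: list of (window, contacted) tuples for one case, oldest first."""
        attempts = defaultdict(int)
        contacts = defaultdict(int)
        for window, contacted in call_history:
            attempts[window] += 1
            contacts[window] += bool(contacted)
        if not attempts:
            return default  # no paradata yet: fall back to the on-average best window
        # Prefer the window with the highest observed contact rate for this case so far.
        return max(attempts, key=lambda w: contacts[w] / attempts[w])

    print(preferred_window([("evening", False), ("afternoon", True), ("evening", False)]))
    # -> "afternoon"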

Attrition in Designs that use Frequent Measurement

I saw this paper recently that talked about how to measure and evaluate nonresponse to surveys that use short, frequently-administered instruments ("measurement-burst" surveys). I've been working on a problem with data like these for a while. A complication was that the questionnaire changed based upon the intervals between measurements. For example, questions might begin, "Since you last completed this survey..." or "In the last two weeks..." depending upon the situation. Plus, panel members could choose to respond at different intervals, even though they were asked to respond at a specified interval. This made for a complex pattern of missing data. I ended up defining attrition in several ways. The most useful was to lay out a grid over time. The survey was designed to be taken weekly, so I looked at each week over the time period to see if any reporting occurred. This allowed me to see how many cells in the grid were missing. But even that wasn't sufficient.
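
A minimal sketch of the grid idea, assuming a long file of report dates per panel member (the column names, dates, and weekly design are illustrative):

    import pandas as pd

    # Hypothetical long-format reports: one row per submission.
    reports = pd.DataFrame({
        "case_id": [1, 1, 1, 2, 2],
        "date": pd.to_datetime(["2015-01-05", "2015-01-19", "2015-02-02",
                                "2015-01-05", "2015-01-12"]),
    })

    # Lay a weekly grid over the field period and flag whether any report fell in each cell.
    reports["week"] = reports["date"].dt.to_period("W")
    weeks = pd.period_range(reports["date"].min(), reports["date"].max(), freq="W")
    grid = (reports.assign(reported=1)
                   .pivot_table(index="case_id", columns="week", values="reported", aggfunc="max")
                   .reindex(columns=weeks)
                   .fillna(0)
                   .astype(int))

    print(grid)                    # 1 = some report that week, 0 = an empty cell
    print(grid.eq(0).sum(axis=1))  # empty cells per case, one crude summary of attrition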

Personalized Survey Design

In my last post, I talked about personalized medicine. I found out this week that in personalized medicine, there is a distinction between targeted and tailored treatments. Targeted treatments are aimed at specified subgroups of the population, while tailored protocols are individual-specific treatments that may be based on a targeted treatment, but use within-patient variation to "tune" treatments over time. I wonder whether tailored protocols of this kind are possible for surveys. Panel surveys are one area where this may be possible. But it seems that the panel would have to have many waves or repetitions. There might not be enough measurement of variation with only a few waves. What's a few? Let's say fewer than 10 or 20. It seems like these methods might have an application in surveys that use frequent measurement and/or a relatively long period of time. For example, imagine a survey that collected data weekly for 2 or 3 years. O

From average response rate to personalized protocol

Survey methods research's first efforts to understand nonresponse started by looking at response rates. The focus was on finding methods that raised response rates. This approach might be useful when everyone has response propensities close to the average. The deterministic formulation of nonresponse bias may even reflect this sort of assumption. Researchers have since looked at subgroup response rates. Also interesting, but assuming that these rates are a fixed characteristic leaves us helpless. Now, it seems that we have begun working with an assumption that there is heterogeneous response to treatments and that we should, therefore, tailor the protocol and manipulate response propensities. I think this development has a parallel in clinical trials, where there is a new emphasis on personalized medicine. We still have important questions to resolve. For example, what are we trying to maximize?
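
For reference, the deterministic formulation mentioned above, in standard textbook notation: if a proportion \(M/N\) of the population would not respond, the bias of the unadjusted respondent mean is

\[ \operatorname{bias}(\bar{y}_r) = \frac{M}{N}\left(\bar{Y}_r - \bar{Y}_m\right), \]

where \(\bar{Y}_r\) and \(\bar{Y}_m\) are the means of the respondent and nonrespondent strata. Treating the nonresponse rate as the main lever fits this view. The stochastic formulation, \(\operatorname{bias}(\bar{y}_r) \approx \operatorname{cov}(\rho, y)/\bar{\rho}\) for response propensities \(\rho\), is what motivates tailoring protocols to manipulate individual propensities.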

Is the "long survey" dead?

A colleague sent me a link to a blog arguing that the "long survey" is dead. The blog takes the point of view that anything over 20 minutes is long. There's also a link to another blog that presents data from SurveyMonkey surveys showing that the longer the questionnaire, the less time that is spent on each question. They don't really control for question length, etc. But it's still suggestive. In my world, 20 minutes is still a short survey. But the point is still taken. There has been some research on the effect of announced survey length on response rates. There is probably a need for more. Still, it might be time to start thinking of alternatives to improve response to long surveys. The most common is to offer a higher incentive, and thereby counteract the burden of the longer survey. Another alternative is to shorten the survey. This doesn't work if your questions are the ones getting tossed. Of course, substituting big data for elements of surveys is another option.

Selection Effects

This seems to come up frequently and in a number of different ways. We talk a lot about nonresponse and how it may be a selective process such that it produces biases. We might try to model this process in order to correct these biases. Online panels and 'big data' like Twitter have their own selection processes. It seems that it would be important to understand these processes. Can they be captured with simple demographics? If not, what else do we need to know? I think we have done some work on survey nonresponse. I'm not sure what is known about online panels or Twitter relative to this question.

Adaptive Design in Panel Surveys

I enjoyed Peter Lugtig's blog post on using adaptive design in panel surveys. I was thinking about this again today. One of the things that I thought would be interesting to look at is viewing the problem of panel surveys as maximizing information gathered. I feel like we view panel studies as a series of cross-sectional studies where we want to maximize the response rate at each wave. This might create non-optimal designs. For instance, it might be more useful to have the first and the last waves measured, rather than the first and second waves. From an imputation perspective, in the former situation (first and last waves) it is easier to impute the missing data. The problem of maximizing information across waves is more complicated than maximizing response at each wave. The former is a sequential decision-making problem, like those studied by Susan Murphy as "adaptive treatment regimes." It might be the case that a lower response rate in early waves might lead

Responsive Design and Surveys with Short Time Frames

Another interesting question that I had during the webinar that I recently gave concerned responsive design and surveys with short time frames. I have to say, I mostly work on surveys with relatively long time frames. The shortest data collection that I have worked on in the last few years is about one month. That's not to say that I think responsive design is not relevant for surveys with short field periods. I think it is. If anything, following the prescribed regimen may be more important. A key aspect of responsive design, in my mind, is that the process is pre-planned. The indicators that are monitored, the decision rules for implementing interventions, and the interventions themselves all have to be pre-planned. In a short survey, this is particularly important as there isn't time for developing ad hoc solutions. In a former life, I worked on surveys that had field periods of a day or two. In those studies, there wouldn't have been time to meet, discuss, and decide. Given the s

Responsive Design and Quota Sampling

I conducted a webinar on responsive design this week. I had several interesting questions. One of these was a question about responsive design and quota sampling. The question was whether these two approaches are, in fact, different. Of course, there are similarities in that the response process is being controlled -- somewhat -- by the researchers. And this may lead to "allocating" nonresponse to some groups over others. For example, if some group is responding at higher rates, we might allocate resources to the lower-responding group. Quota sampling will stop data collection for groups that have reached their quota. There are differences, however. Responsive design attempts to provide balanced response, but doesn't necessarily force that to happen. Further, responsive design is attempting to control the data collection process using a variety of approaches. Quota sampling only has one approach -- stop when the quota is full. I do worry that there may be a convergence of the two approaches.

Responsive Design Definition

I've been getting ready to give a webinar on responsive design. I enjoy getting ready for this kind of talk as it gives me an opportunity to think about definitions and concepts. A few years ago, Mick Couper and I had a paper on "responsive vs adaptive" design. My thinking hasn't evolved much since that paper. In preparing the talk, I thought it might be helpful to define responsive design by contrast with ... that which is not responsive design. The contrasts were 1) pre-specified designs, and 2) ad hoc designs. The first category is one where a pre-specified design is implemented and the results are pretty much as predicted. I personally haven't worked on many surveys like that, but I'm not yet ready to call it a "straw man." The second category is an approach I have seen in action. I sometimes call this approach "shooting from the hip." This is the situation where we start with a pre-specified design, but when it goes off the rails

Responsive Design Phases

In Groves and Heeringa's original formulation, responsive design proceeds in phases. They define these phases as: "A design phase is a time period of a data collection during which the same set of sampling frame, mode of data collection, sample design, recruitment protocols and measurement conditions are extant." (page 440). These responsive design phases are different from the two-phase sampling defined by Hansen and Hurwitz. Hansen and Hurwitz assumed 100% response in the second phase, so there was no nonresponse bias. Their two-phase sampling was all about minimizing variance for a fixed budget. Groves and Heeringa, on the other hand, live in a world where nonresponse does occur. They seek to control it through phases that recruit complementary groups of respondents. The goal is that the nonresponse biases from each phase will cancel each other out. The focus on bias is a new feature relative to Hansen and Hurwitz. A question in my mind about the phases is how the phases
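
For context, a rough sketch of the Hansen and Hurwitz setup in textbook notation (the exact cost components vary by presentation): a first-phase sample of \(n\) units is attempted by the cheap mode, and the nonrespondents, a population proportion \(W_2\) with within-stratum variance \(S_2^2\), are subsampled at rate \(1/k\) for an expensive follow-up that is assumed to obtain everyone. The weighted estimator is unbiased, so the design problem is purely one of variance and cost, roughly

\[ V(\bar{y}) = \left(\frac{1}{n}-\frac{1}{N}\right)S^2 + \frac{(k-1)\,W_2 S_2^2}{n}, \qquad E(C) = n\left(c_0 + c_1 W_1 + \frac{c_2 W_2}{k}\right), \]

with \(n\) and \(k\) chosen to minimize one subject to a constraint on the other. Groves and Heeringa keep the sequential structure but drop the 100% second-phase response assumption, which is why bias, and not just variance, enters their objective.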

What is Current Standard Practice for Surveys?

In clinical trials, there is the concept of an "existing standard of care." New treatments are compared experimentally to this treatment. I suppose that clinical trials have some issues where informed persons can disagree about the existing standard of care, but there is at least some consensus. I'm wondering what we have as existing standard practice in the administration of surveys. As I think about running experiments, the contrast is usually to the other thing we would normally do. But that can be ill-defined. For instance, when running experiments in our telephone facility, it was difficult to describe current practice precisely, as it involved the expert knowledge of managers adjusting parameters of the calling algorithm. As further evidence that it's difficult to precisely define the essential survey conditions, there are several articles on "house effects," where the same survey with the same (rough?) specification ends up getting different results across organizations.

Survey Methods Training

Survey Practice devoted the entire current issue to a discussion of training in survey methodology. This is a very useful review of what is currently done, with suggestions for the future. As they observe, survey methodology is a broad discipline that draws upon a diverse set of fields of research. I expect that increasing this diversity would be positive. That is, there are a number of fields of study that would find applications for their methods in the field of survey research. A couple of key examples include operations research and computer science. Operations research could help us think more rigorously about designing data collection to optimize specified quantities. That doesn't mean we have to pursue one goal. But it would help, or maybe force, us to quantify the vague trade-offs we usually deal in. The paper by Greenberg and Stokes is an early example. The paper by Calinescu and colleagues is a recent one. Computer science is another such field. Researchers

Reflecting the Uncertainty in Design Parameters

I've been thinking about responsive design and uncertainty. I know that when we teach sample design, we often treat design parameters as if they were known. For example, if I do an optimal allocation for a stratified estimate, I assume that I know the population element variances for each stratum. The same thing could be said about response rates, which relate to the expected final sample size. Many years ago, the uncertainty about many of these parameters might have been small. But responsive design became a "thing" largely because this uncertainty seemed to be growing. The question then becomes, how do we acknowledge and even incorporate this uncertainty into our designs? Especially responsive designs. It seems that the Bayesian approach is a natural fit for this kind of problem. Although I can't find a copy online, I recall a paper that Kristen Olson and Trivellore Raghunathan presented at JSM in 2005. They suggested using a Bayesian approach to update estimates of design parameters during data collection.
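
A minimal sketch of that flavor of updating, assuming a simple Beta prior on a response rate revised as cases are finalized (the prior and counts are made up, and this is the general idea rather than the specific method in that paper):

    # Beta-Binomial updating of an uncertain response rate during data collection.
    prior_a, prior_b = 8, 12             # prior guess of roughly 40%, worth about 20 cases of information

    completed, nonrespondents = 35, 90   # finalized cases so far (hypothetical)
    post_a = prior_a + completed
    post_b = prior_b + nonrespondents

    post_mean = post_a / (post_a + post_b)
    post_var = (post_a * post_b) / ((post_a + post_b) ** 2 * (post_a + post_b + 1))

    print(f"updated response rate: {post_mean:.3f} (sd {post_var ** 0.5:.3f})")

Projections of the final sample size, and any interventions triggered by them, can then carry this uncertainty rather than a single point estimate.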

Adaptive Designs and Incentives

I've been working on a paper about an incentive experiment that we did. It raised some interesting issues. And it made me recall one of my favorite papers. Trussell and Lavrakas looked at incentives for a follow-up survey. They found that if someone had refused or been difficult to contact in the initial screening survey, then a higher incentive was needed than for someone who had not refused or been difficult to contact. The incentives they recommend also differed by some demographic characteristics. I liked this example since the adaptation was linked, in part, to the paradata. These are the kinds of adaptations I have the most interest in. They require learning on the part of the survey organization that happens during data collection. I have the feeling that these kinds of adaptations can be particularly powerful since, in models predicting response, it is often the case that paradata overwhelm the predictive power of demographic characteristics. There are all sorts of
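
A minimal sketch of a paradata-driven rule of this general kind; the thresholds and dollar amounts are hypothetical and are not the values recommended by Trussell and Lavrakas:

    def follow_up_incentive(refused_screener, call_attempts, base=5, boost=15):
        """Offer a larger incentive when the paradata flag reluctance or contact difficulty."""
        if refused_screener or call_attempts > 6:
            return base + boost   # hard-to-reach or reluctant cases get the higher offer
        return base

    print(follow_up_incentive(refused_screener=True, call_attempts=2))   # -> 20
    print(follow_up_incentive(refused_screener=False, call_attempts=3))  # -> 5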

Understanding "Randomly Selected"

I had the opportunity this morning to meet with a medical researcher who runs many clinical trials. He spoke about the problems of explaining randomization when enrolling persons in a trial. It's hard to be sure they understand the concept of randomization, and even more difficult to be sure they understand the consequences of either enrolling or not enrolling in a trial. But the problem of explaining randomization caught my attention. This reminds me of the situation that interviewers find themselves in quite frequently. In implementing random selection of a person from within a household, they often find that the person selected is someone other than the informant who aided with the selection. In these cases, the informant may be disappointed that they weren't selected and ask if they can do the interview instead. It's often difficult to explain why we want to speak to the other person, who is not there or maybe not even willing to do the interview. I
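
A minimal sketch of within-household random selection, assuming the informant supplies a roster of eligible adults (a simple equal-probability draw for illustration, rather than any particular grid method):

    import random

    def select_respondent(roster, seed=None):
        """Pick one eligible adult at random from the household roster."""
        rng = random.Random(seed)
        return rng.choice(roster)

    household = ["informant", "spouse", "adult child"]
    print(select_respondent(household, seed=7))

Equal selection probabilities are the whole point, and that is exactly what is hard to convey when the draw does not land on the informant.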

Margin of Error

There was a debate held yesterday on "Margin of Error" in the presence of nonresponse and using non-probability samples. This is an interesting and useful discussion. In the best of circumstances, "margin of error" represents the sampling error associated with an estimate. Unfortunately, other matters often... errrrr... always interfere. The sampling mechanism is not easily identified or modeled in the case of nonprobability samples. In the case of probability samples, the nonresponse mechanism has to be modeled. Either of these situations involves some model assumptions (untestable) that are required to motivate the estimation of a margin of error. One step forward would be for people who report estimated "margins of error" to reveal all of the assumptions in their models (weighting models for nonresponse or, in the case of nonprobability samples, selection) and describe the sampling and recruitment mechanisms sufficiently such that others can evaluate them.
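
In the best case, the reported number is just the sampling term: for a proportion \(p\) from a simple random sample of size \(n\),

\[ \text{MOE} = z_{0.975}\sqrt{\frac{p(1-p)}{n}} \approx 1.96\sqrt{\frac{p(1-p)}{n}}, \]

possibly inflated by a design effect for complex samples. Nonresponse adjustment and selection into nonprobability samples enter that number only through the untestable modeling assumptions mentioned above.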

Adaptive Design and Panel Surveys

I read this very interesting blog post by Peter Lugtig yesterday. The slides from the talk he describes are also linked to the post. He builds on an analysis of classes of nonresponders. Several distinct patterns of nonresponse are identified. The characteristics of persons in each class are then described. For example, some drop out early, some "lurk" around the survey, some stay more or less permanently. He suggests that it might be smart to identify design features that are effective for each of the groups and then tailor these features to the subgroups in an adaptive design. This makes a lot of sense. And panel studies are an attractive place to start doing this kind of work. In the panel setting, there is a lot more data available on cases. This can help in identifying subgroups. And, with repeated trials of the protocol, it may be possible to improve outcomes (response) over time. I think the hard part is creating the groups. This reminds me of a problem that I read

Mixed-Mode Surveys: Nonresponse and Measurement Errors

I've been away from the blog for a while, but I'm back. One of the things that I did during my hiatus from the blog was to read papers on mixed-mode surveys. In most of these surveys, there are nonresponse biases and measurement biases that vary across the modes. These errors are almost always confounded. An important exception is Olson's paper. In that paper, she had gold standard data that allowed her to look at both error sources. Absent those gold standard data, there are limits on what can be done. I read a number of interesting papers, but my main conclusion was that we need to make some assumptions in order to motivate any analysis. For example, one approach is to build nonresponse adjustments for each of the modes, and then argue that any differences remaining are measurement biases. Without such an assumption, not much can be said about either error source. Experimental designs certainly strengthen these assumptions, but do not completely unconfound the sources of error.
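
A minimal sketch of that assumption-driven approach: adjust each mode's respondents to the same frame distribution and read any remaining gap as measurement difference. The weighting-class adjustment, variable names, and data are illustrative only:

    import pandas as pd

    def adjusted_mean(resp, frame_shares, y="y", cls="age_group"):
        """Weighting-class nonresponse adjustment of the respondent mean to frame class shares."""
        class_means = resp.groupby(cls)[y].mean()
        return (class_means * frame_shares).sum()  # classes with no respondents simply drop out

    # Hypothetical frame distribution and mode-specific respondent files.
    frame_shares = pd.Series({"18-44": 0.5, "45+": 0.5})
    web = pd.DataFrame({"age_group": ["18-44", "18-44", "45+"], "y": [10, 12, 20]})
    mail = pd.DataFrame({"age_group": ["18-44", "45+", "45+"], "y": [11, 24, 22]})

    gap = adjusted_mean(web, frame_shares) - adjusted_mean(mail, frame_shares)
    print(gap)  # the remaining gap is read as a measurement difference, given the adjustment model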