
Posts

Showing posts with the label Contact Strategy

Survey Modes and Recruitment

I've been struggling with the concept of "mode preference." It's a term we use to describe the idea that respondents might have preferences for a mode and that if we can identify or predict those preferences, then we can design a better survey (i.e. by giving people their preferred mode). In practice, I worry that people don't actually prefer modes. If you ask people what mode they might prefer, they usually say the mode in which the question is asked. In other settings, the response to that sort of question is only weakly predictive of actual behavior. I'm not sure the distinction between stated and revealed preferences is going to advance the discussion much either. The problem is that the language builds in an assumption that people actually have a preference. Most people don't think about survey modes. Most don't consider modes abstractly in the way methodologists might. In fact, these choices are likely probabilistic functions that hinge on ...

Centralization vs Local Control in Face-to-Face Surveys

A key question that face-to-face surveys must answer is how to balance local control against the need for centralized direction. This is an interesting issue to me. I've worked on face-to-face surveys for a long time now, and I have had discussions about this issue with many people. "Local control" means that interviewers make the key decisions about which cases to call and when to call them. They have local knowledge that helps them to optimize these decisions. For example, if they see people at home, they know that is a good time to make an attempt. They learn people's work schedules, etc. This has been the traditional practice. This may be because before computers, there was no other option. The "centralized" approach says that the central office can summarize the data across many call attempts, cases, and interviewers and come up with an optimal policy. This centralized control might serve some quality purpose, as in our efforts here to promote more...

Every Hard-to-Interview Respondent is Difficult in their Own Way...

The title of this post is a paraphrase of a saying coined by Tolstoy: "Happy families are all alike; every unhappy family is unhappy in its own way." I'm stealing the concept to think about survey respondents. To simplify discussion, I'll focus on two extremes. Some people are easy respondents. No matter what we do, no matter how poorly conceived, they will respond. Other people are difficult respondents. I would argue that these latter respondents are heterogeneous with respect to the impact of different survey designs on them. That is, they might be more likely to respond under one design relative to another. Further, the most effective design will vary from person to person within this difficult group. It sounds simple enough, but we don't often carry this idea into practice. For example, we often estimate a single response propensity, label a subset with low estimated propensities as difficult, and then give them all some extra thing (often more money). ...

What is the right periodicity?

It seems that intensive measurement is on the rise. There are a number of things that are difficult to recall accurately over longer periods of time, where it might be preferable to ask the question more frequently with a shorter reference period. For example, the number of alcoholic drinks consumed per day. More accurate measurements might be achieved if the question were asked daily about the previous 24-hour period. But what is the right period of time? And how do you determine that? This might be an interesting question. The studies I've seen tend to guess at what the correct periodicity is. I think it's probably the case that it would require some experimentation to determine that, including experimentation in the lab. There are a couple of interesting wrinkles to this problem. 1. How do you set the periodicity when you measure several things that might have different periodicities? Ask the questions at the most frequent periodicity? 2. How does non...

"Call Scheduling Algorithms" = Call Scheduling Algoritms + Staffing

We think about call scheduling algorithms as a set of rules about when cases should be called. However, staffing is the other half of the problem. For the rules to be implemented, the staff making the calls need to be there. And there can also be issues if the staff is too large. The rules need to account for both of these situations. Probably the more difficult problem is a staff that is too large. For example, imagine that all active cases have been called. There is an appointment in 45 minutes. The interviewer can wait, or call cases that have already been called on this shift. Calling cases again would be inefficient and a violation of a rule of the algorithm. Still, it seems bad not to make the calls. I wrote a paper on a call scheduling algorithm. I assigned a preferred calling window to each case. These windows changed over time as calls were placed, and the results of previous calls were used to inform the assignment of the preferred window. I spent a lot of time analyzing data ...
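A minimal sketch of the per-case preferred-window idea, assuming we simply track each case's contact outcomes by window and prefer the window with the best smoothed rate so far; the window names and the smoothing prior are made up, and this ignores the staffing half of the problem entirely:

```python
from collections import defaultdict

# Hypothetical call windows; a real design would define these from the data.
WINDOWS = ["weekday_day", "weekday_eve", "weekend_day", "weekend_eve"]

class PreferredWindowTracker:
    """Track contact outcomes per case and window; prefer the best-performing window so far."""

    def __init__(self, prior_contacts=1.0, prior_attempts=4.0):
        # Smoothing prior so early calls don't swing the preference too hard (assumed values).
        self.prior_contacts = prior_contacts
        self.prior_attempts = prior_attempts
        self.attempts = defaultdict(lambda: defaultdict(int))
        self.contacts = defaultdict(lambda: defaultdict(int))

    def record_call(self, case_id, window, contacted):
        self.attempts[case_id][window] += 1
        self.contacts[case_id][window] += int(contacted)

    def preferred_window(self, case_id):
        def smoothed_rate(w):
            a = self.attempts[case_id][w] + self.prior_attempts
            c = self.contacts[case_id][w] + self.prior_contacts
            return c / a
        return max(WINDOWS, key=smoothed_rate)

tracker = PreferredWindowTracker()
tracker.record_call("case_001", "weekday_day", contacted=False)
tracker.record_call("case_001", "weekday_eve", contacted=True)
print(tracker.preferred_window("case_001"))  # -> "weekday_eve"
```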

Classification Problems with Daily Estimates of Propensity Models

A few years ago, I ran several experiments with a new call-scheduling algorithm. You can read about it here . I had to classify cases based upon which call window would be the best one for contacting them. I had four call windows. I ranked them in order, for each sampled number, from best to worst probability of contact. The model was estimated using data from prior waves of the survey (cross-sectional samples) and the current data. For a paper that will be coming out soon, I looked at how often these classifications changed when you used the final data compared to the interim data. The following table shows the difference in the two rankings:

Change in Ranking    Percent
0                    84.5
1                    14.1
2                     1.4
3                     0.1

It looks like the rankings didn't change much. 85% were the same. 14% changed one rank. What is difficult to know is what difference these classification errors might make in the o...
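For what it's worth, the comparison behind that table can be expressed in a few lines. This sketch assumes hypothetical interim and final orderings of four windows per case; nothing here reproduces the actual models:

```python
import pandas as pd

def rank_change_table(interim, final):
    """Tabulate how far each case's best-ranked window moved between interim and final estimates.

    `interim` and `final` map case_id -> list of windows ordered best to worst.
    """
    changes = []
    for case_id, interim_order in interim.items():
        best_interim = interim_order[0]
        # How many ranks did the interim "best" window drop in the final ordering?
        changes.append(final[case_id].index(best_interim))
    return pd.Series(changes).value_counts(normalize=True).sort_index() * 100

# Made-up example with two cases and four windows.
interim = {"a": ["eve", "day", "wknd", "morn"], "b": ["day", "eve", "wknd", "morn"]}
final = {"a": ["eve", "day", "wknd", "morn"], "b": ["eve", "day", "wknd", "morn"]}
print(rank_change_table(interim, final))  # percent of cases by change in ranking (0, 1, 2, ...)
```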

"Failed" Experiments

I ran an experiment a few years ago that failed. I mentioned it in my last blog post. I reported on it in a chapter in the book on paradata that Frauke edited. For the experiment, I offered a recommended call time to interviewers. The recommendations were delivered for a random half of each interviewer's sample. They followed the recommendations at about the same rate whether they saw them or not (20% compliance). So, basically, they didn't follow the recommendations. In debriefings, interviewers said "we call every case every time, so the recommendations at the housing unit were a waste of time." This made sense, but it also raised more questions for me. My first question was, why don't the call records show that? Either they exaggerated when they said they call "every" case every time, or there is underreporting of calls. Or both. At that point, using GPS data seemed like a good way to investigate this question. Once we started examining the GP...

Setting an Appointment for Sampled Units... Without their Assent

Kreuter, Mercer, and Hicks have an interesting article in JSSAM on a panel study, the Medical Expenditure Panel Survey (MEPS). They note my failed attempt to deliver recommended calling times to interviewers. They had a nifty idea... preload the best time to call as an appointment. Letters were sent to the panel members announcing the appointment. Good news. This method improved efficiency without harming response rates. There was some worry that setting appointments without consulting the panel members would turn them off, but that didn't happen. It does remind me of another failed experiment I did a few years ago. Well, there wasn't an experiment, just a design change. We decided that it would be good to leave answering machine messages on the first telephone call in an RDD sample. In the message, we promised that we would call back the next evening at a specified time. Like an appointment. Without experimental evidence, it's hard to say, but it did seem to increase...

Costs of Face-to-Face Call Attempts

I've been working on an experiment where evaluating cost savings is an important outcome. It's difficult to measure costs in this environment. Timesheets and call records are recorded separately. It's difficult to parse out the travel time from other time. One study actually shadowed a subset of interviewers in order to generate more accurate cost estimates. This is an expensive way to evaluate costs that may not be practical in many situations. It might be that increasing computerization does away with this problem. In a telephone facility, everything is timestamped, so we can calculate how long most call attempts take. It might be that we will be able to do this in face-to-face studies soon, if we can't already.
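In the timestamped telephone setting, the calculation is straightforward. A sketch with made-up call records and an assumed hourly rate (the column names are not from any real system):

```python
import pandas as pd

# Hypothetical call-record extract: one row per attempt with start and end timestamps.
calls = pd.DataFrame({
    "interviewer": ["i01", "i01", "i02"],
    "case_id": ["c1", "c2", "c3"],
    "start": pd.to_datetime(["2013-05-01 18:02", "2013-05-01 18:09", "2013-05-01 18:05"]),
    "end":   pd.to_datetime(["2013-05-01 18:07", "2013-05-01 18:11", "2013-05-01 18:25"]),
})

calls["minutes"] = (calls["end"] - calls["start"]).dt.total_seconds() / 60

# Approximate cost per attempt, by interviewer, given an assumed hourly rate.
HOURLY_RATE = 20.0
calls["cost"] = calls["minutes"] / 60 * HOURLY_RATE
print(calls.groupby("interviewer")[["minutes", "cost"]].mean())
```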

What would a randomized call timing experiment look like?

It's one thing to compare different call scheduling algorithms. You can compare two algorithms and measure the performance using whatever metrics you want to compare (efficiency, response rate, survey outcome variables). But what about comparing estimated contact propensities? There is an assumption often employed that these calls are randomly placed. This assumption allows us to predict what would happen under a diverse set of strategies -- e.g. placing calls at different times. Still, this had me wondering what a truly randomized experiment would look like. The experiment would be best randomized sequentially, as this can result in more efficient allocation. We'd then want to randomize each "important" aspect of the next treatment. This is where it gets messy. Here are two of these features: 1. Timing. The question is how to define this. We can define it using "call windows." But even the creation of these windows requires assumptions... and tradeo...
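A sketch of what randomizing the timing of each next attempt might look like, with made-up windows and equal allocation probabilities; a genuinely sequential design would also adapt the allocation probabilities as results accrue, which this does not attempt:

```python
import random

WINDOWS = ["weekday_day", "weekday_eve", "weekend_day", "weekend_eve"]  # assumed windows

def randomize_next_call(case_id, history, rng=random):
    """Randomly assign the window of the next attempt, regardless of what looks 'best'.

    `history` (prior attempts for this case) is ignored on purpose: under true
    randomization the next treatment cannot depend on predicted propensity.
    """
    return rng.choice(WINDOWS)

# Example: assign windows for tomorrow's active sample.
active_sample = ["c001", "c002", "c003"]
assignments = {c: randomize_next_call(c, history=[]) for c in active_sample}
print(assignments)
```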

Equal Effort... or Equal Probabilities

I've been reading an article on locating respondents in a panel survey. The authors were trying to determine what the protocol should be. They reviewed the literature to see what the maximum number of calls should be. As I noted in my last post, I was recently involved in a series of discussions on the same topic. But when I was reading this article, I thought immediately about how much variation there is between call sequences with the same number of calls. As an extreme example, calling a case three times in one day is not the same as calling it three times over the course of three weeks. I think the goal should be to apply protocols that have similar rates of being effective, i.e. produce similar response probabilities. But there aren't good metrics to measure the effectiveness of the many different possibilities. Practitioners need something that can evaluate how the chain of calls produces an overall probability of response. Using call-level estimates might be one...
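As a rough illustration (not from the article), a protocol-level probability can be built from call-level estimates by treating attempts as independent given their covariates. The independence assumption is exactly what differs between cramming calls into one day and spreading them over weeks, so the per-attempt probabilities would need to reflect the timing:

```python
import numpy as np

def protocol_response_probability(call_level_probs):
    """Overall probability that a protocol yields a response, given per-attempt probabilities.

    Assumes attempts succeed independently given the covariates used to estimate them,
    which is a strong assumption: three calls in one day violate it far more than
    three calls spread over three weeks.
    """
    call_level_probs = np.asarray(call_level_probs)
    return 1.0 - np.prod(1.0 - call_level_probs)

# Same number of calls, very different protocols (made-up per-attempt probabilities):
print(protocol_response_probability([0.10, 0.02, 0.02]))  # three calls crammed into one day
print(protocol_response_probability([0.10, 0.10, 0.10]))  # three calls spread over weeks
```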

Simulation of Limits

In my last post, I advocated against truncating effort. In this post, I'm going to talk about doing just that. Go figure. We were discussing call limits on a project that I'm working on. This is a study that we plan to repeat in the future, so we're spending a fair amount of time experimenting with design features on this first wave. There is a telephone component to the survey, so we've been working on the question of how to specify the calling algorithm and, in particular, what ceiling, if any, we should place on the number of calls. One way to look at it is to examine the distribution of final outcomes by call number -- sort of like a life table. Early calls are generally more productive (i.e. produce a final outcome) than late calls. You can look at the life table and see after which call very few interviews are obtained. You might truncate the effort at that point. The problem is that simulating what would happen if you place a ceiling on the number of calls ...
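To make the life-table idea concrete, here is a minimal sketch with made-up data. The naive truncation estimate simply drops interviews obtained after the ceiling, which is exactly the simplification to be wary of, since behavior under a ceiling need not match behavior without one:

```python
import pandas as pd

# Hypothetical final dispositions: call number at which each interview was obtained
# (NaN means the case never yielded an interview).
cases = pd.DataFrame({"interview_call_number": [1, 2, 2, 3, 5, 8, None, None, None, None]})

# "Life table": interviews obtained at each call number.
life_table = cases["interview_call_number"].value_counts().sort_index()
print(life_table)

def naive_yield_under_ceiling(cases, ceiling):
    """Interviews you would keep if effort stopped at `ceiling` calls, assuming nothing
    else about the field effort changes -- the questionable assumption discussed above."""
    return (cases["interview_call_number"] <= ceiling).sum()

for ceiling in (3, 5, 10):
    print(ceiling, naive_yield_under_ceiling(cases, ceiling))
```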

Do response propensities change with repeated calling?

I read a very interesting article by Mike Brick. The discussion of changing propensities in section 7 on pages 341-342 was particularly interesting. He discusses the interpretation of changes in average estimated response propensities over time. Is it due to changes in the composition of the active sample? Or is it due to within-unit decreases in probability caused by repeated application of the same protocol (i.e. more calls)? To me, it seems evident that people's propensity to respond does change. We can increase a person's probability of response by offering an incentive. We can decrease another person's probability by saying "the wrong thing" during the survey introduction. But the article specifically discusses whether additional calls actually change the callee's probability of response. In most models, the number of calls is a very powerful predictor. Each additional call lowers the probability of response. Brick points out that there are two int...
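To illustrate one of the two interpretations, here is a sketch using simulated data and a plain logistic model via statsmodels (nothing from the article itself). Every case has a fixed propensity, yet the call-number coefficient still tends to come out negative purely through changes in the composition of the active sample:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulate call records: heterogeneous cases, each with a fixed within-case contact probability.
n_cases, max_calls = 500, 8
base_p = rng.beta(2, 5, size=n_cases)
rows = []
for p in base_p:
    for call in range(1, max_calls + 1):
        contact = rng.random() < p
        rows.append((call, int(contact)))
        if contact:
            break  # contacted cases leave the active sample, so easy cases drop out early

calls = np.array(rows)
X = sm.add_constant(calls[:, 0].astype(float))  # predictor: call number
y = calls[:, 1]

# The call-number coefficient typically comes out negative here even though no
# individual case's propensity ever changed: a pure composition effect.
print(sm.Logit(y, X).fit(disp=False).params)
```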

Persuasion Letters

This is a highly tailored strategy. The idea is that certain kinds of interviewer observations about contact with sampled households will be used to tailor a letter that is sent to the household. For example, if someone in the household says they are "too busy" to complete the survey, a letter is sent that specifically addresses that concern. It's pretty clear that this is adaptive. But here again, thinking about it as an adaptive feature could improve a) our understanding of the technique, and b) -- at least potentially -- its performance. In practice, interviewers request that these letters be sent. There is variability in the rules they use about when to make that request. This could be good or bad. It might be good if they use all of the "data" that they have from their contacts with the household. That's more data than the central office has. On the other hand, it could be bad if interviewers vary in their ability to "correctly" identify c...

Sorry I missed you...

This is another post in a series on currently used survey design features that could be "relabeled" as adaptive. I think it is helpful to relabel for a couple of reasons. 1) It demonstrates a kind of feasibility of the approach, and 2) it would help us think more rigorously about these design options (for example, if we think about refusal conversions as a treatment within a sequence of treatments, we may design better experiments to test various ways of conducting conversions). The design feature I'm thinking of today has to do with a card that interviewers leave behind sometimes when no one is home at a face-to-face contact attempt. The card says "Sorry I missed you..." and explains the study and that we will be trying to contact them. Interviewers decide when to leave these cards. In team meetings with interviewers, I heard a lot of different strategies that interviewers use with these cards. For instance, one interviewer said she leaves them every time, ...

Are Call Limits Adaptive?

In the same vein as previous posts, I'm continuing to think about current practices that might be recast as adaptive. Call limits are a fairly common practice. But they are also, at least for projects that I have worked on, notoriously difficult to implement. For example, when project targets for numbers of interviews are not being met, these limits may be violated. We might even argue that since the timing of the calls is not always well regulated, it is difficult to claim that cases have received equal treatments prior to reaching the limit. For example, three calls during the same hour is not likely to be as effective as three calls placed on different days and times of day. Yet both would reach a three-call limit. [As an aside, it might make more sense to place a lower limit on "next call" propensities estimated from models that include information about the timings of the calls, as Kreuter and Kohler do here .] In any event, su...
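A sketch of the aside about propensity floors, assuming a generic fitted classifier and an arbitrary threshold; the toy features and floor value are illustrative, not from Kreuter and Kohler's paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy call-level training data: features are (calls so far, called in a new window);
# the outcome is whether the next call produced contact.
X_train = np.array([[1, 0], [2, 1], [3, 1], [5, 0], [8, 0], [10, 0]])
y_train = np.array([1, 1, 0, 0, 0, 0])
model = LogisticRegression().fit(X_train, y_train)

def keep_calling(case_features, model, floor=0.02):
    """Stop effort when the estimated probability that the *next* call yields contact
    falls below `floor` (the floor is an assumed value, in place of a fixed call limit)."""
    p_next = model.predict_proba([case_features])[0, 1]
    return p_next >= floor

print(keep_calling([4, 1], model))   # case with moderate effort so far
print(keep_calling([12, 0], model))  # heavily worked case, likely below the floor
```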

Contact Strategies: Strategies for the Hard-to-Reach

One of the issues with looking at average contact rates (like with the heat map from a few posts ago) is that it's only helpful for average cases. In fact, some cases are easy to contact no matter what strategy you use, and other cases are easy to contact when you try a reasonable strategy (i.e. calling during a window with a high average contact rate), but what is the best strategy for the hard-to-reach cases? I've proposed a solution that tries to estimate the best time to call using the accruing data. I know other algorithms might explore other options more quickly. For instance, choosing the window with the highest upper bound on a confidence interval. It might be interesting to try these approaches, particularly for studies that place limits on the number of calls that can be made. The lower the limit, the more exploration may pay off.
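A minimal sketch of the upper-confidence-bound idea, with made-up window names and a standard UCB1-style bonus; this is one possible exploration rule, not the algorithm I proposed:

```python
import math

def ucb_window(attempts, contacts, total_attempts, c=1.0):
    """Pick the call window with the highest upper confidence bound on its contact rate.

    `attempts`/`contacts` map window -> counts for a case (or a stratum of cases).
    Windows that have barely been tried get a large exploration bonus, so tight
    call limits push you toward trying them sooner.
    """
    def ucb(w):
        n = attempts.get(w, 0)
        if n == 0:
            return float("inf")  # always try an untried window first
        rate = contacts.get(w, 0) / n
        return rate + c * math.sqrt(math.log(total_attempts) / n)
    return max(attempts.keys() | contacts.keys(), key=ucb)

# Made-up counts for one hard-to-reach case.
attempts = {"weekday_day": 4, "weekday_eve": 2, "weekend_day": 0}
contacts = {"weekday_day": 0, "weekday_eve": 1, "weekend_day": 0}
print(ucb_window(attempts, contacts, total_attempts=6))  # -> "weekend_day" (never tried)
```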

Call Windows as a Pattern

The paradata book , edited by Frauke Kreuter, is out! I have a chapter in the book on call scheduling. One of the problems that I mention is how to define call windows. The goal should be to create homogeneous units. For example, I made the following heatmap that shows contact rates by hour for a face-to-face survey. The figure includes contact rates for all cases and for the subset of cases that were determined to be eligible. I used this heatmap to define contiguous call windows that were homogeneous with respect to contact rates. I used ocular inspection to define the call windows. I think this could be improved. First, clustering techniques might produce more efficient results. I assumed that the call windows had to be contiguous; this might not be true. Second, along what dimension do we want these windows to be homogeneous? Contact rates are really a proxy. We want the windows to be homogeneous with respect to the result of the next call on any case, or really our final goal of inter...
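As a sketch of the clustering alternative to ocular inspection, assume made-up contact rates on a 7-day by 14-hour grid and let k-means group the cells into four windows (the number of windows is arbitrary). Note that the resulting windows need not be contiguous:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical contact rates for 7 days x 14 calling hours, in place of the real heatmap.
rng = np.random.default_rng(0)
rates = rng.uniform(0.05, 0.45, size=(7, 14))

cells = rates.reshape(-1, 1)  # feature: each day-by-hour cell's contact rate
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(cells)

windows = labels.reshape(7, 14)  # window assignment per day x hour cell
print(windows)
```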

Adaptive Design Research

I recently found a paper by some colleagues from VU University in Amsterdam and Statistics Netherlands. The paper uses dynamic programming to identify an optimal "treatment regime" for a survey. The treatment is the sequence of modes by which each sampled case is contacted for interview. The paper is titled "Optimal resource allocation in survey designs" and is in the European Journal of Operational Research . I'm pointing it out here since survey methods folks might not follow this journal. I'm really interested in this approach, as the methods they use seem to be well-suited for the complex problems we face in survey design. Greenberg and Stokes and possibly Bollapragada and Nair are the only other examples that do anything similar to this in surveys. I'm hoping that these methods will be used more widely for surveys. Of course, there is a lot of experimentation to be done.

Estimating Daily Contact Models in Real-Time

A couple of years ago I was running an experiment on a telephone survey. The results are described here . As part of the process, I estimated a multi-level logistic regression model on a daily basis. I had some concern that early estimates of the coefficients and resulting probabilities (which were the main interest) could be biased. The more easily interviewed cases are usually completed early in the field period, so the "sample" used for the estimate is disproportionately composed of easy responders. To mitigate this risk, I used data from prior waves of the survey (including early and late responders) when estimating the model. The estimates also controlled for level of effort (number of calls) by including all call records and estimating household-level contact rates. During the experiment, I monitored the estimated coefficients on a daily basis. They were remarkably stable over time. Of course, nothing says it had to turn out this way. I have found...
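A small sketch of the monitoring idea, using a plain logistic model re-fit each "day" on simulated accumulating call records; the actual model in the experiment was multilevel and pooled prior-wave data, which this does not attempt to reproduce:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical accumulating call records: field day, a household covariate, contact outcome.
records = pd.DataFrame({
    "day": rng.integers(1, 31, size=2000),
    "urban": rng.integers(0, 2, size=2000),
    "contact": rng.binomial(1, 0.3, size=2000),
})

# Re-fit the same model each day on everything observed so far and track a coefficient.
daily_coefs = []
for day in range(5, 31, 5):
    sofar = records[records["day"] <= day]
    fit = smf.logit("contact ~ urban", data=sofar).fit(disp=False)
    daily_coefs.append((day, fit.params["urban"]))

# Are the estimates stable over the field period?
print(pd.DataFrame(daily_coefs, columns=["day", "urban_coef"]))
```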