
Posts

Showing posts from 2009

Which protocol?

A new article by Peytchev, Baxter, and Carley-Baxter outlines the reasoning for altering the survey protocol midstream in order to bring in new types of respondents, as opposed to applying the same protocol and bringing in more of the same. Responsive design (Groves and Heeringa, 2006) is built around similar reasoning. I think it's probably not uncommon for survey organizations to use the same protocol over and over. It shouldn't be surprising that this approach generally brings in "more of the same." But if the response rate is the guiding metric, then such considerations aren't relevant. Under the response rate, it's not who you interview, but how many interviews you get. In other words, the composition of the respondent pool is irrelevant as long as you hit your response rate target. As the authors note, however, there is much more to be done in terms of determining the appropriate protocol for each particular situation -- assuming that simply maxi…
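
To make the composition point concrete, here's a small sketch with entirely made-up numbers: two protocols hit the same response rate, but one keeps bringing in "more of the same" easy cases, so its estimate sits farther from the truth.

```python
# Made-up numbers: two protocols reach the same response rate but recruit
# different mixes of "easy" and "hard" cases, whose true means differ.
MEANS = {"easy": 40, "hard": 60}      # true group means of some survey variable
true_mean = 0.5 * MEANS["easy"] + 0.5 * MEANS["hard"]   # population split 50/50

# Each protocol starts from 1,000 sampled cases and yields 600 interviews.
respondents = {
    "same protocol, repeated": {"easy": 540, "hard": 60},
    "switched protocol":       {"easy": 360, "hard": 240},
}

for name, resp in respondents.items():
    n = sum(resp.values())
    est = sum(resp[g] * MEANS[g] for g in resp) / n
    print(f"{name}: response rate {n / 1000:.0%}, estimate {est:.1f} "
          f"(truth {true_mean:.0f})")
```

Both protocols report a 60% response rate, but the estimates differ because the respondent pools differ; the response rate alone can't see that.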

Call Scheduling Issue

One of the issues that I'm facing in my experiment with call scheduling on the telephone survey is the decision to truncate effort. Typically, we have a policy that says something like: call a case 12 times across 3 different call windows (6 in one, 4 in another, and 2 in the last), with those calls occurring on 12 different days. If those calls are made and none of them achieves contact (including an answering machine), we assume that further effort will not produce any result and finalize the case as a noncontact. We call this our "grid" procedure (since the paper coversheets that we used to use tracked the procedure in a grid). Such a case counts against AAPOR RR2, and a portion (the famous "e") of each such case counts against AAPOR RR4. My call scheduling algorithm did not take this rule into account. If the model favored the same window every day, the requirements of the grid would never be met. That sounds to me like a failure to sufficiently explore other policies, but it could happen. In an…
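
Here's a minimal sketch of the grid rule as stated above; the call-record layout and field names are invented, but it shows why a scheduler that always favors one window can never satisfy the rule.

```python
from datetime import date

# Hypothetical call record: (date, window, contact), where window is 1-3
# and "contact" includes reaching an answering machine.
REQUIRED = {1: 6, 2: 4, 3: 2}   # calls required per window before finalizing

def grid_satisfied(calls):
    """Return True if the no-contact 'grid' rule allows finalizing the case."""
    if any(contact for _, _, contact in calls):
        return False                              # any contact stops the rule
    if len({d for d, _, _ in calls}) < 12:
        return False                              # must be 12 different days
    counts = {w: 0 for w in REQUIRED}
    for _, w, _ in calls:
        counts[w] = counts.get(w, 0) + 1
    return all(counts[w] >= need for w, need in REQUIRED.items())

# Example: a scheduler that favors window 1 every day never satisfies the rule,
# no matter how many calls it makes.
calls = [(date(2009, 10, d), 1, False) for d in range(1, 21)]
print(grid_satisfied(calls))   # False -- windows 2 and 3 never get their calls
```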

Stopping Rules for Surveys

One aspect of responsive design that hasn't received much attention is when to stop collecting data. Groves and Heeringa (2006) argue that you should change your data collection strategy when it ceases to bring in interviews that change your estimate. But when should you stop altogether? It seems like the answer should be driven by some estimate of the risk of nonresponse bias. But given that the response rate appears to be a poor proxy for this risk, what should we do? Rao, Glickman, and Glynn proposed a rule for binary survey outcome variables. Now, Raghu and I have an article accepted at Statistics in Medicine that proposes a rule using imputation methods: stop data collection when the probability that additional data (i.e., more interviews) will change your estimate is sufficiently small. The rule is for normally distributed data and is discussed in my dissertation as well.
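
To give the flavor of such a rule (this is not the published rule itself, just a crude sketch): impute the not-yet-observed cases many times under a normal model, and stop when the probability that completing the data would move the estimate by more than some margin is small. The margin and the ignorable-nonresponse assumption here are mine, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2009)

def prob_estimate_changes(y_resp, n_nonresp, n_imputations=500, delta=0.5):
    """Crude multiple-imputation check: probability that completing the data
    would move the mean by more than `delta` from the respondent mean.
    Assumes nonrespondents follow the same normal model as respondents."""
    ybar, s = np.mean(y_resp), np.std(y_resp, ddof=1)
    n_resp = len(y_resp)
    changes = 0
    for _ in range(n_imputations):
        imputed = rng.normal(ybar, s, size=n_nonresp)    # draw the missing values
        full_mean = (np.sum(y_resp) + np.sum(imputed)) / (n_resp + n_nonresp)
        changes += abs(full_mean - ybar) > delta
    return changes / n_imputations

y_resp = rng.normal(50, 10, size=800)       # interviews obtained so far
print(prob_estimate_changes(y_resp, n_nonresp=200))
# Stop data collection when this probability is sufficiently small.
```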

Evaluating the Experiments in the Field

One problem we face in evaluating experiments in face-to-face surveys, where the interviewer decides when to call, leave SIMY cards, etc., is that we don't know whether the interviewer followed our recommendation. Maybe they just happened to do the very thing we recommended without ever viewing the recommendation. I'm facing this problem with both the experiment involving SIMY card use and the call scheduling experiment. We could ask interviewers whether they followed the recommendation, but their answers are unlikely to be reliable. My current plan is to save the statistical recommendation for all cases (experimental and control) and compare how often the recommendation is followed in each group. In the control group, the recommendation is never revealed to the interviewer. If the recommendation is "followed" more often in the group where it is revealed, then it appears that the recommendation did have an impact on the choices the interviewers made.
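
A sketch of that comparison, assuming a case-level file with the stored (revealed or hidden) recommendation and what the interviewer actually did; the column names and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical case-level file: the recommendation is computed for every case,
# but only revealed to interviewers in the experimental group.
cases = pd.DataFrame({
    "group":       ["experimental"] * 5 + ["control"] * 5,
    "recommended": ["am", "pm", "pm", "eve", "am", "pm", "am", "eve", "pm", "am"],
    "actual":      ["am", "pm", "am", "eve", "am", "am", "pm", "eve", "am", "pm"],
})
cases["followed"] = cases["recommended"] == cases["actual"]

print(cases.groupby("group")["followed"].agg(["mean", "count"]))

# Two-proportion z-test for the difference in "followed" rates.
p = cases.groupby("group")["followed"].mean()
n = cases.groupby("group")["followed"].count()
p_pool = cases["followed"].mean()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n["experimental"] + 1 / n["control"]))
print(f"z = {(p['experimental'] - p['control']) / se:.2f}")
```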

Does a SIMY card always help?

Although we don't have any evidence, our prior assumption seems to be that SIMY cards are generally helpful. Julia D'arrigo, Gabi Durrant, and Fiona Steele have a working paper that presents evidence from a multi-level multinomial model that these cards do improve contact rates. A further question that we'll be attempting to answer is whether we can differentiate the cases for which the SIMY card improves contact rates from those for which it does not. Why would a card hurt contact rates? It might be that for some households the card acts as a warning and they work to avoid the interviewer. Or they may feel that leaving the card was somehow inappropriate. We have anecdotal evidence on this score. In the models I've been building, I have found interactions between card use and observable characteristics of the case (e.g., is it in a neighborhood with access impediments? with safety concerns?) that indicate that we may be able to differentiate our…
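
A hedged sketch of the kind of interaction I have in mind, using simulated stand-in data and a single-level logit (the working paper's model is multilevel and multinomial; this is only the shape of the idea): contact as a function of whether a card was left, the neighborhood observations, and their interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000

# Simulated stand-in data: in reality these would come from call records
# and interviewer observations of the sample neighborhoods.
df = pd.DataFrame({
    "card_left":       rng.integers(0, 2, n),
    "access_impeded":  rng.integers(0, 2, n),
    "safety_concerns": rng.integers(0, 2, n),
})
# Build in an interaction: cards help on average, but less (or not at all)
# where there are access impediments.
logit = (-0.5 + 0.6 * df.card_left - 0.4 * df.access_impeded
         - 0.7 * df.card_left * df.access_impeded - 0.2 * df.safety_concerns)
df["contact"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = smf.logit(
    "contact ~ card_left * access_impeded + card_left * safety_concerns",
    data=df).fit(disp=0)
print(model.summary())
# A negative card_left:access_impeded coefficient would suggest the card is
# less helpful (or harmful) in neighborhoods with access impediments.
```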

More on R-Indicators

I mentioned the R-Indicators in a recent post. In addition to the article in Survey Methodology, they also have a very useful website. The website includes a number of papers and presentations on the topic.
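
For reference, the R-indicator of Schouten, Cobben, and Bethlehem is based on the variability of estimated response propensities: R = 1 - 2·S(rho-hat), where S is the standard deviation of the propensities. A minimal sketch, with the propensity model and frame variables assumed:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 5000

# Stand-in frame data: the frame variables and respondent indicator are simulated.
frame = pd.DataFrame({
    "urban":   rng.integers(0, 2, n),
    "age_65p": rng.integers(0, 2, n),
})
p_true = 1 / (1 + np.exp(-(-0.2 + 0.5 * frame.age_65p - 0.6 * frame.urban)))
frame["respondent"] = rng.binomial(1, p_true)

# Estimate response propensities from variables known for the full sample.
fit = smf.logit("respondent ~ urban + age_65p", data=frame).fit(disp=0)
rho_hat = fit.predict(frame)

# R-indicator: 1 - 2 * SD of the estimated propensities (1 = fully representative).
r_indicator = 1 - 2 * rho_hat.std(ddof=1)
print(f"response rate = {frame.respondent.mean():.2f}, "
      f"R-indicator = {r_indicator:.2f}")
```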

More on imputing "e"...

I've actually already done a lot of work on imputing eligibility. For my dissertation, I used the fraction of missing information as a measure of data quality and applied the measure to survey data collections. In order to use this measure, I had to impute for item and unit nonresponse (including the eligibility of cases that had not yet been screened for eligibility). The surveys that I used both had low eligibility rates (one was an area probability sample with an eligibility rate of about 0.59 and the other was an RDD survey with many nonsample cases), so imputing eligibility was an important part of this work. An article on this subject has been accepted by POQ and is forthcoming. The chart shown below uses data from the area probability survey. It shows the distribution of eligibility rates that incorporate imputations for the missing values. The eligibility rate for the observed cases is the red line. The imputed estimates appear to be generally higher than the observed value.
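
A deliberately simplified sketch of the imputation step for eligibility (not the dissertation's fraction-of-missing-information machinery): draw eligibility for the unscreened cases many times and look at the distribution of the resulting eligibility rates. Note that this covariate-free version centers on the observed rate; an imputation model that conditions on covariates predictive of eligibility is what can push the imputed rates above (or below) the observed value.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in sample: screened cases have observed eligibility; unscreened do not.
n_screened, n_unscreened = 1500, 500
elig_screened = rng.binomial(1, 0.59, n_screened)       # observed rate ~0.59
observed_rate = elig_screened.mean()

# Simple imputation model: a Beta posterior for the eligibility rate among
# screened cases, ignoring covariates (a deliberate simplification).
a, b = 1 + elig_screened.sum(), 1 + (1 - elig_screened).sum()

imputed_rates = []
for _ in range(1000):
    p = rng.beta(a, b)                                   # draw an eligibility rate
    elig_unscreened = rng.binomial(1, p, n_unscreened)   # impute unscreened cases
    full = np.concatenate([elig_screened, elig_unscreened])
    imputed_rates.append(full.mean())

print(f"observed rate: {observed_rate:.3f}")
print(f"imputed rates: mean {np.mean(imputed_rates):.3f}, "
      f"95% interval ({np.percentile(imputed_rates, 2.5):.3f}, "
      f"{np.percentile(imputed_rates, 97.5):.3f})")
```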

Myopia Revisited

In a previous post I talked about an experiment that I'm currently working on. The experiment is testing a new method for scheduling calls. For technical reasons, only a portion of the calls were governed by the experimental method; refusal conversion calls were not. The experiment had the odd result that although the new method increased efficiency for the calls governed by the algorithm, these gains were lost at a later step (i.e., during refusal conversion -- see the table for results). This month, we resolved the technical issues (maybe we should have done this in the first place), so now we will be able to see whether we can counteract this odd result. If not, then we'll have to assume that either improbable sampling error explains the result, or that there is some odd interaction between the method and resistance/refusal. I'm hoping this moves things in the "expected" fashion.

Sorry I missed you...

We've been using "Sorry I missed you" (SIMY) cards for many years in face-to-face surveys. We don't know that they work, but we keep using them anyway. I suspect that these cards are useful sometimes, not useful other times, and possibly harmful in some situations. We haven't really collected data on the use of these cards, but interviewers do usually say something in their call notes about leaving a SIMY card. I've been working with data based on these notes, trying to identify cases where the SIMY card is useful and where it may be harmful. We should be running some experiments with these cards in the near future. As with many of the experiments we've been running in face-to-face surveys, we have a double burden of proof. First, will interviewers respond to recommendations delivered via our computerized sample management system? Second, if they follow the recommendations, do they help? Hopefully, we'll have some evidence on one or both of these…
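
A rough sketch of the kind of call-note processing involved, with an invented data layout: flag the calls whose notes mention leaving a card, then compare contact on the *next* call after a card was left with the next call after an ordinary no-contact call.

```python
import re
import pandas as pd

# Hypothetical call history: one row per call, in call order within case.
calls = pd.DataFrame({
    "case_id":  [1, 1, 1, 2, 2, 3, 3, 3],
    "call_num": [1, 2, 3, 1, 2, 1, 2, 3],
    "contact":  [0, 0, 1, 0, 1, 0, 0, 0],
    "notes":    ["no answer", "left SIMY card", "R answered door",
                 "no one home, left card", "spoke with R",
                 "no answer", "no answer", "no answer"],
})

simy_pattern = re.compile(r"\b(simy|sorry i missed you|left (a )?card)\b", re.I)
calls["left_card"] = calls["notes"].str.contains(simy_pattern)

# Was the next call to the same case a contact?
calls = calls.sort_values(["case_id", "call_num"])
calls["next_contact"] = calls.groupby("case_id")["contact"].shift(-1)

noncontact_calls = calls[(calls.contact == 0) & calls.next_contact.notna()]
print(noncontact_calls.groupby("left_card")["next_contact"].mean())
```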

Presenting Results: Adaptive Design for Telephone Surveys

I'll be presenting results from the adaptive call scheduling experiment on Monday, November 2nd at the FCSM Research Conference. The results were promising, at least for the calls governed by the experimental protocol. The following table summarizes the results: The next step is to extend the experimental protocol to the calls that were not involved with the experiment (mainly refusal conversion calls), and to attempt this with a face-to-face survey.

New Measures for Monitoring the Quality of Survey Data

Many surveys work to a sample size or response rate requirement. The contract specifies a target response rate, and the survey organization works hard to meet that target. In this context, the logical thing for the survey organization to do is to focus on the easiest cases to interview. The underlying assumption of this approach is that a higher response rate leads to a lower risk of bias. Theoretically, this need not be true. Empirically, there have been a number of recent studies where it is not true (see Groves and Peytcheva, POQ 2008). So what are we supposed to do? The search is on for alternative indicators. Bob Groves convened a meeting in Ann Arbor to discuss the issue two years ago. The result was a short list of indicators that might be used to evaluate the quality of survey data (see the October 2007 issue of Survey Practice: http://www.surveypractice.org/). Now these new measures are starting to appear! Barry Schouten, Fannie Cobben, and Jelke Bethlehem ha…

How can we estimate "e"?

AAPOR defines response rates that include an adjustment factor for cases whose eligibility is still unknown at the end of the survey. They call the factor "e". Typically, people use the eligibility rate from the part of the sample where this variable (eligible = yes/no) is observed. This estimate is sometimes called the CASRO estimate of e. But in a telephone survey, this estimate of "e" is likely to be biased upwards for the unknown part of the sample. Many of the cases that are never contacted are not households; they are simply numbers that ring when dialed but are not assigned to a household. These cases never contribute to the observed estimate of "e". A paper in POQ (Brick and Montaquila, 2002) described an alternative method of estimating e using a survival model. This lowers estimates of e relative to the CASRO method, but it's still upwardly biased since many of the noncontacts could never be contacted. I like the survival method since it's clos…
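
To fix ideas, the CASRO-style calculation is just the eligibility rate among the resolved cases. Here is a minimal sketch with invented disposition counts (the survival-model alternative is not reproduced here):

```python
# Hypothetical end-of-study disposition counts for a telephone sample.
interviews       = 800
eligible_nonresp = 400    # eligible, but refused or never interviewed
ineligible       = 2800   # known ineligible (businesses, nonworking numbers, ...)
unknown          = 1000   # never resolved (e.g., ring-no-answer)

# CASRO-style e: the eligibility rate among cases whose eligibility was resolved.
eligible = interviews + eligible_nonresp
e_casro = eligible / (eligible + ineligible)

# An AAPOR-style response rate that applies e to the unknown-eligibility cases.
rr = interviews / (eligible + e_casro * unknown)
print(f"e (CASRO) = {e_casro:.3f}, response rate = {rr:.3f}")
# The concern above: many of the unknowns are numbers that ring but are not
# assigned to a household, so e_casro likely overstates how many of them are
# really eligible households.
```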

Operationalizing Experimental Design

I had a useful conversation with project managers about the call scheduling experiment for a face-to-face survey. My proposal had been to randomize at the line level, so that interviewers would have both experimental and control cases in their samples. The project managers felt that this might lead to inefficient trips. In other words, interviewers might follow the recommendations and then ignore cases without a recommendation, or travel to recommended cases that are far apart while not visiting closer cases that have no recommendation. The experiment is certainly cleaner if the randomization occurs at the line level rather than the interviewer level, but I certainly wouldn't want to create inefficiencies -- one goal is to improve efficiency (another is to increase our ability to target cases). I thought about training interviewers to use the recommendation as one piece of information while planning their trips, but maybe that wouldn't work. I'll…
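
For what it's worth, the mechanics of line-level randomization within each interviewer's workload are straightforward; a sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)

# Hypothetical sample lines, already assigned to interviewer workloads.
lines = pd.DataFrame({
    "line_id": range(1, 101),
    "interviewer_id": rng.integers(1, 11, size=100),
})

# Randomize at the line level, but within each interviewer's workload,
# so every interviewer carries a mix of experimental and control cases.
lines["arm"] = "control"
for interviewer, idx in lines.groupby("interviewer_id").groups.items():
    idx = list(idx)
    rng.shuffle(idx)
    lines.loc[idx[: len(idx) // 2], "arm"] = "experimental"

print(lines.groupby(["interviewer_id", "arm"]).size().unstack(fill_value=0))
```

Whether the trip-efficiency concern outweighs the cleaner design is the real question; the randomization itself isn't the hard part.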

Calling Experiment for a Face-to-Face Survey?

I've been working on the experiment with calling strategies in a telephone survey. This was the obvious place to start, since the call scheduling there is done by a computerized algorithm. But I work on a lot of face-to-face surveys where the interviewer decides when to place a call. Other research has shown that interviewers vary in their ability to successfully schedule calls. Can we help them with this problem? I'd like to try our calling experiment on a face-to-face survey. How? By delivering a statistically derived recommendation to the interviewer about when to call each sampled unit. On one face-to-face survey, we've successfully changed interviewer behavior by delivering recommendations about which cases to call first. I'm wondering if we can extend these results by suggesting specific times to call.

New Calling Experiment

Since the results of the experiment on call scheduling were good (with the experimental method having a slight edge over the current protocol), I've been allowed to test the experimental method against other contenders. The experimental method, described in a prior post, uses the predicted value (MLE) of the contact probability across the four windows. This month, I'm testing it against another method that uses the upper confidence bound (UCB) of the predicted probability, which quite often implies assigning a different calling window than the MLE method. The UCB method is designed to attack our uncertainty about a case by favoring windows we know less about. Lai ("Adaptive Allocation and the Multi-Armed Bandit Problem," 1987) proposed the method. Other than the fact that our context (calling households to complete surveys) is a relatively short process (i.e., few pulls on the multi-armed bandit), the multi-armed bandit analogy fits quite well. In my dissertation, I d…
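
A small sketch of the difference between the two allocation rules. The real methods use model-based predicted probabilities; here I use made-up per-window call counts for one household with a simple Beta smoother, just to show how the MLE rule and the UCB rule can pick different windows.

```python
import numpy as np

# Made-up history for one household: calls placed and contacts achieved
# in each of the four call windows.
calls    = np.array([9, 6, 2, 1])
contacts = np.array([4, 2, 0, 0])

# Smoothed contact-probability estimates (Beta(1, 1) prior avoids 0/1 estimates).
p_hat = (contacts + 1) / (calls + 2)

# Approximate standard error of each estimate.
se = np.sqrt(p_hat * (1 - p_hat) / (calls + 2))

z = 1.64                        # width of the upper confidence bound
ucb = p_hat + z * se

print("p_hat:", np.round(p_hat, 2))   # MLE rule: argmax of p_hat
print("ucb:  ", np.round(ucb, 2))     # UCB rule: argmax of the upper bound
print("MLE picks window", p_hat.argmax() + 1,
      "; UCB picks window", ucb.argmax() + 1)
```

With these numbers the MLE rule sticks with the well-tried window 1, while the UCB rule tries window 4, which has barely been called and so carries the most uncertainty.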

Mired in Myopia?

Reinforcement Learning (RL) deals with multi-step decision processes. One strategy for making decisions in a multi-step environment is to always choose the option that maximizes your immediate payoff. In RL, this strategy is called "myopic" (or greedy) since it never looks beyond the payoff for the current action. The problem is that this strategy might produce a smaller total payoff at the end of the process. If we look at the process as a whole, we may identify a sequence of actions that produces a higher overall reward while not maximizing the reward for each individual action. This all relates to an experiment that I'm running on contact strategies. The experiment controls all calls other than appointments and refusal conversion attempts. The overall contact rate was 11.6% for the experimental protocol and 9.0% for the control group. The difference is statistically significant. But establishing contact is only an intermediate outcome. The final outcome of this multi-step process is…
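
A toy illustration of the myopia problem (invented numbers, not survey data): in this tiny two-step process, the action with the better immediate payoff leads to a worse total payoff.

```python
# Two-step decision problem. Each action gives an immediate reward and leads
# to a follow-up reward. (Think: "contact" is the immediate reward, the
# completed interview is the final one.)
options = {
    "push_for_contact_now": {"immediate": 1.0, "followup": 0.2},
    "set_up_better_later":  {"immediate": 0.4, "followup": 1.5},
}

myopic     = max(options, key=lambda a: options[a]["immediate"])
farsighted = max(options, key=lambda a: sum(options[a].values()))

for name, choice in [("myopic", myopic), ("total-reward", farsighted)]:
    total = sum(options[choice].values())
    print(f"{name:>12} policy chooses {choice!r}: total reward {total:.1f}")
```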

An Experimental Adaptive Contact Strategy

I'm running an experiment on contact methods in a telephone survey. I'm going to present the results of the experiment at the FCSM conference in November. Here's the basic idea. Multi-level models are fit daily, with the household as a grouping factor. The models provide household-specific estimates of the probability of contact in each of four call windows. The predictor variables in this model are the geographic context variables available for an RDD sample. Let $\mathbf{X}_{ij}$ denote a $k_j \times 1$ vector of these variables for the $i^{th}$ household and $j^{th}$ call. The data records are calls; there may be zero, one, or multiple calls to a household in each window. The outcome variable is an indicator for whether contact was achieved on the call, denoted $R_{ijl}$ for the $i^{th}$ household on the $j^{th}$ call in the $l^{th}$ window. Then for each of the four call windows $l$, a separate model is fit where each household is assu…
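
A deliberately simplified sketch of the modeling step: here a separate plain logistic regression per call window (omitting the household-level random effects of the actual multilevel models), fit to simulated call records with invented context variables, producing a predicted contact probability for every household in every window.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Household-level context (in the real survey: geographic variables on an RDD frame).
n_hh = 800
households = pd.DataFrame({
    "pct_owner_occupied": rng.uniform(0, 1, n_hh),
    "urban": rng.integers(0, 2, n_hh),
}, index=pd.RangeIndex(n_hh, name="household"))

# Simulated call records: one row per call, tagged with the call window.
n_calls = 4000
calls = households.loc[rng.integers(0, n_hh, n_calls)].reset_index()
calls["window"] = rng.integers(1, 5, n_calls)
logit = (-1 + 0.8 * calls.pct_owner_occupied - 0.5 * calls.urban
         + 0.3 * (calls.window == 4))        # evenings a bit better, say
calls["contact"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_cols = ["pct_owner_occupied", "urban"]

# Fit a separate model per window, then predict a contact probability
# for every household in every window.
pred = pd.DataFrame(index=households.index)
for w in range(1, 5):
    sub = calls[calls.window == w]
    model = LogisticRegression().fit(sub[X_cols], sub["contact"])
    pred[f"window_{w}"] = model.predict_proba(households[X_cols])[:, 1]

# Recommended window for each household: highest predicted contact probability.
print(pred.idxmax(axis=1).value_counts())
```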

First Post

I'm setting up this blog in order to post about my ongoing research, as well as on ideas for future research. I'm hoping to blog weekly to start (Thursdays being a good day to post a blog). I'll start tomorrow.