Survey Methods Musings


Showing posts with the label "e"

Imputation of "e" as an extension of a survival model approach

The genesis of the idea for imputing "e" came from my process for estimating the fraction of missing information for an ongoing survey. I had to impute eligibility for cases each day so that I could impute survey values for the subset of eligible cases (including those with imputed eligibility). I thought, "Hey, I'm already imputing 'e.' I just need to set it up that way." Along the way, I had to compare the method to the life table product-limit approach advocated by Brick et al. (POQ, 2002). I found a very nifty article by Efron (JASA, 1988) that compares life table methods to logistic regression. Essentially, in the discrete-time case, the life table model produces the same results as a logistic regression model with a dummy variable for each time point. Efron then parameterizes the model with fewer parameters ($t$, $t^2$, and $t^3$, I believe) and shows how this compares to the life table product-limit nonparametric estimate. This artic...
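Here is a minimal sketch of the discrete-time equivalence described above, written in plain Python with made-up data. Each case is expanded into one row per call attempt. The life table hazard at call t is the number of cases resolved at t divided by the number still at risk at t. A logistic regression with one dummy per call number, fit by Newton-Raphson, recovers exactly those hazards. The function names and the toy `cases` list are hypothetical; they do not come from Efron's article. Replacing the dummy columns with an intercept plus $t$, $t^2$, and $t^3$ gives the smoother parameterization, but that needs more distinct call numbers than this toy example has.

```python
import math

def person_period(cases):
    """Expand (last_call, resolved) pairs into one (t, y) row per call attempt.

    y = 1 only on the call at which a case was resolved; a censored case
    (resolved = 0) contributes y = 0 on every call it received.
    """
    rows = []
    for last_call, resolved in cases:
        for t in range(1, last_call + 1):
            rows.append((t, 1 if (t == last_call and resolved) else 0))
    return rows

def life_table_hazards(rows, max_t):
    """Life table hazard at each call t: resolved at t / still at risk at t."""
    at_risk = [0] * (max_t + 1)
    events = [0] * (max_t + 1)
    for t, y in rows:
        at_risk[t] += 1
        events[t] += y
    return [events[t] / at_risk[t] for t in range(1, max_t + 1)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data: (last call number, 1 if eligibility was resolved on it, 0 if censored)
cases = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (2, 1),
         (3, 1), (3, 0), (3, 0), (3, 1)]
rows = person_period(cases)
hazards = life_table_hazards(rows, max_t=3)

# Saturated model: one dummy column per call number, no intercept.
X = [[1.0 if t == s else 0.0 for s in (1, 2, 3)] for t, _ in rows]
y = [outcome for _, outcome in rows]
beta = fit_logistic(X, y)
fitted = [1.0 / (1.0 + math.exp(-b)) for b in beta]

# Product-limit estimate of the share of cases still unresolved after call t.
survival = []
s = 1.0
for h in hazards:
    s *= 1.0 - h
    survival.append(s)

# both lines print [0.2, 0.2857, 0.5]
print([round(h, 4) for h in hazards])
print([round(p, 4) for p in fitted])
```

The saturated model's likelihood factors into one term per call number. That is why its fitted probabilities reproduce the life table hazards exactly.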

Imputation of "e"

I'm finishing up a presentation that I'll be giving at AAPOR (Saturday, May 15th at 2:15) on using imputation methods to estimate "e." I posted on this topic a while ago, and I want to share one of the graphics that I developed for the presentation. I start from a very simple model: a logistic regression that predicts eligibility using the natural logarithm of the last call number as the only predictor. That model generates the following distribution of imputed eligible cases. The blue line shows the eligibility rates for the cases whose eligibility status is known. The blue dashed line shows the model's predicted eligibility. The green line shows the eligibility rate for the cases whose eligibility flag is imputed; this is the line used to estimate "e."
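Here is a minimal sketch of that very simple model, using hypothetical data and function names. It fits a two-parameter logistic regression of eligibility on the natural log of the last call number to the cases with known eligibility, using Newton-Raphson. It then makes one Bernoulli draw per unresolved case from that case's predicted probability. The mean of the draws plays the role of "e." For brevity, the sketch uses the point estimates of the coefficients. A proper multiple imputation would also draw the coefficients to reflect their uncertainty.

```python
import math
import random

def fit_logit_2(x, y, iters=50):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient, 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict(b, calls):
    """Predicted eligibility for a case whose last call number is `calls`."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * math.log(calls))))

def impute_e(known, unknown_calls, seed=0):
    """known: (last_call, eligible) pairs; unknown_calls: last call numbers of unresolved cases."""
    b = fit_logit_2([math.log(c) for c, _ in known], [e for _, e in known])
    rng = random.Random(seed)
    draws = [1 if rng.random() < predict(b, c) else 0 for c in unknown_calls]
    expected = sum(predict(b, c) for c in unknown_calls) / len(unknown_calls)
    return sum(draws) / len(draws), expected, b

# Hypothetical data: eligibility tends to fall as the number of calls rises.
known = [(1, 1), (1, 1), (1, 1), (1, 0), (2, 1), (2, 1), (2, 0), (3, 1), (3, 0),
         (4, 0), (5, 0), (5, 1), (8, 0), (8, 0), (10, 0)]
unknown_calls = [12, 15, 20, 20]
e_imputed, e_expected, coefs = impute_e(known, unknown_calls)
observed_rate = sum(e for _, e in known) / len(known)
print(observed_rate, e_expected, e_imputed)
```

The unresolved cases typically carry higher call numbers than the resolved ones. When the fitted slope is negative, their imputed eligibility therefore falls below the observed rate.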

Call Scheduling Issue

One of the issues that I'm facing in my experiment with call scheduling on the telephone survey is the decision to truncate effort. Typically, we have a policy that says something like: call a case 12 times across 3 different call windows (6 in one, 4 in another, and 2 in the last). Those calls must occur on 12 different days. If those calls are made and none of them achieves contact (including contact with an answering machine), we assume that further effort will not produce any result, and we finalize the case as a Noncontact. We call this our "grid" procedure, since the paper coversheets that we used to use tracked the procedure in a grid. Each such case counts against AAPOR RR2, and a portion (the famous "e") of each counts against AAPOR RR4. My algorithm did not take this procedure into account. If the model favored the same window every day, the requirements of the grid would never be met. That sounds to me like a failure to explore other policies sufficiently, but it could happen. In an...
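Here is a minimal sketch of the grid rule written as a check that a scheduling algorithm could call before finalizing a case. The window names are hypothetical, and so is the assignment of the 6/4/2 quotas to particular windows, because the post does not say which window gets which quota. The sketch makes two further assumptions. First, any contact, including an answering machine, rules out the grid. Second, each qualifying call must fall on a day not already used by another qualifying call, with days assigned greedily in date order.

```python
from collections import Counter

# Hypothetical window names; the 6/4/2 split comes from the post, the assignment does not.
GRID_QUOTAS = {"weekday_evening": 6, "weekend": 4, "weekday_day": 2}

def grid_complete(calls, quotas=GRID_QUOTAS):
    """calls: (day, window, contact) tuples for one case.

    True means the case may be finalized as a grid Noncontact: no call
    achieved contact, and every window quota was filled by calls on
    distinct days (greedy assignment in date order).
    """
    if any(contact for _, _, contact in calls):
        return False
    filled = Counter()
    used_days = set()
    for day, window, _ in sorted(calls):
        if day in used_days or filled[window] >= quotas.get(window, 0):
            continue  # same day already counted, or this window's quota is full
        filled[window] += 1
        used_days.add(day)
    return all(filled[w] >= q for w, q in quotas.items())

# A scheduler that always picks the same window never completes the grid.
one_window = [(d, "weekday_evening", False) for d in range(1, 31)]
balanced = ([(d, "weekday_evening", False) for d in range(1, 7)]
            + [(d, "weekend", False) for d in range(7, 11)]
            + [(d, "weekday_day", False) for d in range(11, 13)])
print(grid_complete(one_window))  # False
print(grid_complete(balanced))    # True
```

One way to handle this in the scheduling algorithm would be to restrict the candidate windows once a quota is full, so that the grid can always be completed.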

More on imputing "e"...

I've actually already done a lot of work on imputing eligibility. For my dissertation, I used the fraction of missing information as a measure of data quality and applied it to survey data collections. To use this measure, I had to impute for item and unit nonresponse, including the eligibility of cases that were not yet screened. Both surveys that I used had low eligibility rates: one was an area probability sample with an eligibility rate of about 0.59, and the other was an RDD survey with many nonsample cases. As a result, I had to impute eligibility for this work. An article on this subject has been accepted by POQ and is forthcoming. The chart shown below uses data from the area probability survey. It shows the distribution of eligibility rates that incorporate imputations for the missing values. The eligibility rate for the observed cases is the red line. The imputed estimates appear to be generally higher than the observed v...
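Here is a minimal sketch of how the fraction of missing information can be computed for an eligibility rate, using Rubin's combining rules across M completed (observed plus imputed) data sets. It assumes simple random sampling, so the within-imputation variance of a rate q is q(1 - q)/n. It also uses the large-sample approximation (1 + 1/M)B/T and omits the degrees-of-freedom adjustment. The function name and the data are hypothetical, and the sketch illustrates the general idea rather than the exact procedure used for the dissertation.

```python
from statistics import mean, variance

def combine_eligibility_rates(completed):
    """Rubin's rules for an eligibility rate.

    completed: list of M >= 2 completed-data 0/1 eligibility vectors,
    each combining the observed values with one set of imputations.
    Returns (combined rate, total variance, fraction of missing information).
    """
    m = len(completed)
    rates = [sum(v) / len(v) for v in completed]
    # Within-imputation variance W, assuming simple random sampling.
    within = mean(q * (1 - q) / len(v) for q, v in zip(rates, completed))
    # Between-imputation variance B (divisor M - 1).
    between = variance(rates)
    total = within + (1 + 1 / m) * between
    fmi = (1 + 1 / m) * between / total
    return mean(rates), total, fmi

# Hypothetical: 4 cases; the first two are observed, the last two imputed, M = 3.
completed = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 1, 1]]
rate, total, fmi = combine_eligibility_rates(completed)
print(rate, total, fmi)
```

When the imputations agree, B is zero and the fraction of missing information is zero. The more the imputed eligibility varies across data sets, the larger the fraction becomes.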

How can we estimate "e"?

AAPOR defines response rates that include an adjustment factor, which they call "e," for cases that have unknown eligibility at the end of the survey. Typically, people use the eligibility rate from the part of the sample where eligibility (yes/no) is observed. This estimate is sometimes called the CASRO estimate of e. But in a telephone survey, this estimate of "e" is likely to be biased upwards for the unknown part of the sample. Many of the cases that are never contacted are not households. They are simply numbers that ring when dialed but are not assigned to a household. Because their eligibility is never resolved, these cases never enter the estimate of "e." A paper in POQ (Brick and Montaquila, 2002) described an alternative method of estimating e based on a survival model. This lowers estimates of e relative to the CASRO method, but the estimate is still upwardly biased, since many of the noncontacts could never be contacted. I like the survival method since it's clos...
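The sketch below illustrates how "e" enters AAPOR's RR2 and RR4, using the disposition groups from AAPOR's Standard Definitions:

- completes (I)
- partials (P)
- refusals and break-offs (R)
- noncontacts (NC)
- other eligible nonresponse (O)
- unknown-eligibility cases (UH, UO)

The disposition counts and the alternative value e = 0.30 are made up for illustration. Any estimate of e below the CASRO value, such as one from a survival model or from imputation, raises RR4. RR4 equals RR2 when e = 1.

```python
def casro_e(eligible, ineligible):
    """CASRO estimate: eligibility rate among cases whose eligibility was determined."""
    return eligible / (eligible + ineligible)

def rr2(I, P, R, NC, O, UH, UO):
    """AAPOR RR2: every unknown-eligibility case stays in the denominator."""
    return (I + P) / ((I + P) + (R + NC + O) + (UH + UO))

def rr4(I, P, R, NC, O, UH, UO, e):
    """AAPOR RR4: only the fraction e of unknown-eligibility cases is counted."""
    return (I + P) / ((I + P) + (R + NC + O) + e * (UH + UO))

# Hypothetical final disposition counts for a telephone survey.
counts = dict(I=400, P=20, R=250, NC=80, O=10, UH=300, UO=40)
ineligible = 700
eligible = counts["I"] + counts["P"] + counts["R"] + counts["NC"] + counts["O"]
e_casro = casro_e(eligible, ineligible)

print(f"CASRO e = {e_casro:.3f}")
print(f"RR2 = {rr2(**counts):.3f}")
print(f"RR4 (CASRO e) = {rr4(**counts, e=e_casro):.3f}")
print(f"RR4 (e = 0.30) = {rr4(**counts, e=0.30):.3f}")
```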