Tuesday, May 25, 2010

Seasonal effects

I've been running an experiment on a relatively small survey (300 RDD interviews per month). Since the survey is small, I need to run the experiment over many months to accumulate enough data.

One unintended consequence of this long field period is that I observe fluctuations over the course of the year that may indicate seasonal effects. April is the most striking example. In every other month, the experimental method produced higher contact rates than the control. But not April. In April, the control group did better.

I have at least two hypotheses about why:

1. April is one of the toughest months for contacting households. Something about the experimental method interacts with that seasonal effect to produce lower contact rates. Seems unlikely.

2. Sampling error. If you run the experiment in enough months, one of them will come up a loser. More likely.
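The sampling error hypothesis is easy to check with a quick simulation. The sketch below uses made-up numbers (a true contact rate of 45% for the experiment vs. 40% for the control, 150 cases per arm per month; none of these figures come from the actual survey) and asks how often at least one month out of twelve "flips" by chance alone.

```python
import random

random.seed(42)

# Hypothetical numbers: the experimental method truly contacts 45% of
# cases vs. 40% for the control, with 150 cases per arm per month.
P_EXP, P_CTRL, N, MONTHS, SIMS = 0.45, 0.40, 150, 12, 1_000

def contacts(p, n):
    """Count contacted cases out of n, each contacted with probability p."""
    return sum(random.random() < p for _ in range(n))

# Fraction of simulated years with at least one month where the control
# beats the experiment purely by sampling error
flips = sum(
    any(contacts(P_CTRL, N) > contacts(P_EXP, N) for _ in range(MONTHS))
    for _ in range(SIMS)
)
print(f"P(at least one losing month per year) ≈ {flips / SIMS:.2f}")
```

With effect sizes and sample sizes anywhere near this range, a losing month in any given year is more likely than not, which is consistent with hypothesis 2.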

Tuesday, May 18, 2010

Imputation of "e" as an extension of a survival model approach

The genesis of the idea for imputing "e" came from my process for estimating the fraction of missing information for an ongoing survey. I had to impute eligibility for cases each day so that I could impute survey values for the subset of eligible cases (including those with imputed eligibility). I thought, "hey, I'm already imputing 'e.' I just need to set it up that way."
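The daily two-step process can be sketched roughly as follows. This is a deliberately simplified toy version, not the actual production code: eligibility is imputed from the marginal eligibility rate among resolved cases (rather than a model), and survey values are filled in by a simple hot deck.

```python
import random

random.seed(1)

# Toy daily snapshot: resolved cases have known eligibility e (1/0);
# unresolved cases have e missing (None). Survey value y is observed
# only for eligible respondents. All numbers are illustrative.
cases = (
    [{"e": 1, "y": random.gauss(50, 10)} for _ in range(40)]  # eligible respondents
    + [{"e": 0, "y": None} for _ in range(30)]                # known ineligible
    + [{"e": None, "y": None} for _ in range(30)]             # eligibility unknown
)

# Step 1: impute e for unresolved cases from the resolved eligibility rate
resolved = [c for c in cases if c["e"] is not None]
p_elig = sum(c["e"] for c in resolved) / len(resolved)
for c in cases:
    if c["e"] is None:
        c["e"] = int(random.random() < p_elig)

# Step 2: impute y for every eligible case lacking it (hot deck from donors),
# including cases whose eligibility was itself imputed in step 1
donors = [c["y"] for c in cases if c["e"] == 1 and c["y"] is not None]
for c in cases:
    if c["e"] == 1 and c["y"] is None:
        c["y"] = random.choice(donors)

est = sum(c["y"] for c in cases if c["e"] == 1) / sum(c["e"] for c in cases)
print(f"Eligibility rate among resolved: {p_elig:.2f}; mean among eligibles: {est:.1f}")
```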

Along the way, I had to compare the method to the life table product-limit approach advocated by Brick et al. (POQ, 2002). I found a very nifty article by Efron (JASA, 1988) that compares life table methods to logistic regression. Essentially, for the discrete time case, the life table model produces the same results as if we had a logistic regression model with a dummy variable for each time point. Efron then parameterized the model with fewer parameters ($t$, $t^2$, and $t^3$, I believe) and showed how this compares to the life table product-limit nonparametric estimate.
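Efron's equivalence can be verified numerically on made-up call-outcome data. The key fact is that the saturated logistic model (one dummy per call number, no other terms) has a closed-form MLE, $\beta_t = \text{logit}(h_t)$, where $h_t$ is the empirical discrete hazard, so its fitted survival curve matches the product-limit estimate exactly.

```python
import math

# Toy outcome data: (number of calls made, 1 if contact on the last call).
# Never-contacted cases have done = 0. Numbers are purely illustrative.
data = [(1, 1), (1, 0), (2, 1), (2, 1), (2, 0), (3, 0), (3, 1), (4, 1), (5, 0), (5, 1)]
max_t = max(t for t, _ in data)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

surv_lt, surv_logit = [], []
s_lt = s_lg = 1.0
for t in range(1, max_t + 1):
    at_risk = sum(1 for calls, _ in data if calls >= t)
    events = sum(1 for calls, done in data if calls == t and done)

    # Life-table (product-limit) discrete hazard at call t
    h = events / at_risk
    s_lt *= 1 - h
    surv_lt.append(s_lt)

    # Saturated logistic model (one dummy per call number): the MLE is
    # beta_t = logit(h), so the fitted hazard is the empirical h again
    beta_t = math.log(h / (1 - h))
    s_lg *= 1 - sigmoid(beta_t)
    surv_logit.append(s_lg)

print([round(s, 3) for s in surv_lt])
print([round(s, 3) for s in surv_logit])
```

The two survival curves agree to machine precision, which is the discrete-time case of Efron's point.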

This article helped me think about how my method related to the life table method and was a nice starting point for the analysis. I could start with a model that includes a dummy variable for every call number. Then I could simplify and use some transform of the number of calls (I chose the log transform) and even add parameters.
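A minimal sketch of the smoothed version, using the same kind of toy call data as above: expand each case into a person-period file (one row per call attempt) and fit a two-parameter discrete-time hazard, $\text{logit}\, h(t) = \beta_0 + \beta_1 \log(t)$, by Newton-Raphson. This is an illustration of the modeling strategy, not the actual survey's model.

```python
import math

# Toy data: (number of calls made, 1 if contact on the last call)
data = [(1, 1), (1, 0), (2, 1), (2, 1), (2, 0), (3, 0), (3, 1), (4, 1), (5, 0), (5, 1)]

# Person-period expansion: one row (t, y) per call attempt,
# with y = 1 only at the call where contact happened
rows = [(t, int(t == calls and done)) for calls, done in data for t in range(1, calls + 1)]

# Fit logit h(t) = b0 + b1 * log(t) by Newton-Raphson:
# accumulate the gradient and the 2x2 observed information by hand
b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for t, y in rows:
        x = math.log(t)
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
        w = p * (1 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (-h01 * g0 + h00 * g1) / det

print(f"logit h(t) = {b0:.2f} + {b1:.2f} * log(t)")
```

Swapping the log(t) term for dummies recovers the saturated (life table) fit; adding terms like $\log(t)^2$ moves between the two extremes.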