A few years ago, I ran several experiments with a new call-scheduling algorithm. You can read about it here. The algorithm required classifying cases according to which call window would be the best one for contacting them. There were four call windows, and for each sampled number I ranked the windows from highest to lowest estimated probability of contact.
The model was estimated using data from prior waves of the survey (cross-sectional samples) and the current data. For a paper that will be coming out soon, I looked at how often these classifications changed when the model was estimated with the final data rather than the interim data. The following table shows the difference between the two rankings:
Change in Ranking | Percent
0                 | 84.5
1                 | 14.1
2                 | 1.4
3                 | 0.1
It looks like the rankings didn't change much: about 85% stayed the same, and another 14% changed by only one rank.
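For anyone curious how a comparison like this might be tabulated, here is a minimal sketch in Python using simulated probabilities rather than the actual survey data. The variable names, the simulated data, and the assumption that "change in ranking" means the final-model rank of the window the interim model called best are all mine, not taken from the original analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in data: estimated contact probabilities for four
# call windows under an interim model and a final model. In the real
# analysis these would come from the survey's estimated models.
n_cases, n_windows = 1000, 4
interim_probs = rng.uniform(size=(n_cases, n_windows))
final_probs = np.clip(
    interim_probs + rng.normal(0, 0.05, size=(n_cases, n_windows)), 0, 1
)

def window_ranks(probs):
    """Rank call windows per case: 1 = highest estimated probability of contact."""
    return (-probs).argsort(axis=1).argsort(axis=1) + 1

interim_ranks = window_ranks(interim_probs)
final_ranks = window_ranks(final_probs)

# For each case, take the window the interim model called best and see
# where the final model ranks that same window.
best_interim = interim_ranks.argmin(axis=1)
final_rank_of_interim_best = final_ranks[np.arange(n_cases), best_interim]

# Change in ranking: 0 means the interim "best" window is still ranked
# first under the final model, 1 means it slipped one place, and so on.
change = final_rank_of_interim_best - 1
print(pd.Series(change).value_counts(normalize=True).sort_index().mul(100).round(1))
```

With real data, the printed percentages would correspond to the rows of the table above.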
What is difficult to know is what difference these classification errors might make to the outcomes of experiments. In this example, the final model could never have been available for use during the experiment, so some interim error is unavoidable. Still, it would be nice to have some assessment of the likely harm from these misclassification errors.