Friday, January 29, 2016

Training for Paradata

Paradata are messy data. I've been working with paradata for a number of years, and find that there are all kinds of issues. The data aren't always designed with the analyst in mind. They are usually a by-product of a process. The interviewers aren't focused (and rightly so) on generating high-quality paradata. In many situations, they sacrifice the quality of the paradata in order to obtain an interview.

The good thing about paradata is that analysis of paradata is usually done in order to inform specific decisions. How should we design the next survey? What is the problem with this survey? The analysis is effective if the decisions seem correct in retrospect, that is, if the predictions generated by the analysis lead to good decisions.

If students were interested in learning about paradata analysis, then I would suggest that they gain exposure to methods in statistics, machine learning, operations research, and the emerging category of "data science." It seems like exposure to methods from these areas would strengthen a person's ability to manage the messy data, find patterns in the data, and inform decisions based on the results. While we're certainly making strides forward in our ability to work with these data, a new generation with the right training will be able to carry this work further.

Friday, January 22, 2016

WSS Mini-Conference on Paradata

Next week, after the big storm, the Washington Statistical Society is sponsoring a mini-conference: "Benefits and Challenges in Using Paradata."

The program is available online. This will be a nice opportunity to meet and talk with folks working on similar problems. We are few in number. It's good to take advantage of these opportunities.

I'm going to be speaking about the problems of working with incoming streams of paradata. I can propose some solutions, but we need to get better at this.