I found this interesting article about using survey data to uncover missing data in big data. The big data are electronic medical records, which are cheap to analyze, but have important gaps. These folks used a survey to assess the gaps.
I heard another interesting episode of the Data Skeptic podcast (episode 121). They were discussing how a classifier could be assessed. Many machine learning models are so complex that a human being can't really interpret what the model has learned. This can lead to problems. They gave an example where they had a bunch of posts from two discussion boards, one atheist and the other Christian, and tried to classify each post as coming from one board or the other. There was one poster, named Keith, who posted heavily on the Christian board. Sadly, the model learned that if the person posting was named Keith, then the post was Christian. The problem is that this isn't very useful for prediction. It's an artifact of the input data. Even cross-validation wouldn't eliminate this problem, because Keith's posts show up in both the training and the test folds. A human being can see the issue, but a model can't. In any event, the proposed solution was to build interpretable models in local areas of t...
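Here is a rough sketch of how the "Keith" artifact can look fine under cross-validation and still fail on new posters. Everything in it is an assumption made for illustration: the toy posts are invented, and the one-token "stump" classifier is a stand-in I picked for simplicity, not the model discussed in the episode.

```python
from collections import Counter

# Invented toy data: (tokens, board). "keith" stands in for a prolific
# poster's name leaking into the text of his posts.
christian = [(["keith", "think", "people"], "christian")] * 6 \
          + [(["think", "people"], "christian")] * 2
atheist = [(["think", "people"], "atheist")] * 8
posts = christian + atheist

def train_stump(data):
    """Pick the single token whose presence best separates the two boards."""
    seen = {"christian": Counter(), "atheist": Counter()}
    for tokens, board in data:
        seen[board].update(set(tokens))  # count posts containing each token
    vocab = sorted(set(seen["christian"]) | set(seen["atheist"]))
    token = max(vocab, key=lambda t: abs(seen["christian"][t] - seen["atheist"][t]))
    if_present = "christian" if seen["christian"][token] >= seen["atheist"][token] else "atheist"
    return token, if_present

def predict(model, tokens):
    token, if_present = model
    other = "atheist" if if_present == "christian" else "christian"
    return if_present if token in tokens else other

def accuracy(model, data):
    return sum(predict(model, tokens) == board for tokens, board in data) / len(data)

def cross_validate(data, k=4):
    """Plain k-fold CV; posts by the same author land in train and test."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        scores.append(accuracy(train_stump(train), test))
    return sum(scores) / k

# Posts by authors the model has never seen (no "keith" token anywhere).
new_authors = [(["think", "people"], "christian")] * 4 \
            + [(["think", "people"], "atheist")] * 4

model = train_stump(posts)
cv_acc = cross_validate(posts)
new_acc = accuracy(model, new_authors)
print("learned rule:", model)
print("cross-validated accuracy:", cv_acc)
print("accuracy on new authors:", new_acc)
```

On this toy data the stump latches onto "keith", scores 0.875 under cross-validation, and drops to 0.5 (a coin flip) on posts by new authors. Splitting folds by author rather than by post would be one way to expose the gap, though that is my suggestion and not something from the episode.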