I found a really interesting article ("Deciding what to observe next") from the field of machine learning. The authors address the problem of building a regression model using data from a "data stream," that is, data that keep arriving over time. The example they use is daily measurements of weather at different locations, but monitoring paradata during data collection may have this flavor as well.
They use statistical techniques that I've seen before -- the Lasso for model selection and the EM algorithm for dealing with "missing" data. The missing data in this case are variables that you choose not to observe at certain points.
The neat thing is that their method continues to explore data that are judged to be "not useful" (i.e. not included in the model) at certain points.
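To make the idea concrete for myself, here is a rough sketch in Python. It is not the authors' algorithm: I use a plain coordinate-descent Lasso to pick variables, and a simple random rule (each excluded variable gets observed with some probability) stands in for their exploration step. The EM step for the unobserved values is left out entirely. The names `lasso_cd`, `choose_variables`, `alpha`, and `explore_prob` are my own, and the data are simulated.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Partial residual with variable j's contribution added back in
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding sets small coefficients exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return beta

def choose_variables(beta, explore_prob, rng):
    """Decide which variables to observe next (hypothetical rule).

    Variables in the model (nonzero coefficient) are always observed.
    Each excluded variable is still observed with probability
    explore_prob, so it has a chance to re-enter the model later.
    """
    selected = beta != 0
    explore = rng.random(beta.size) < explore_prob
    return selected | explore

# Simulated data: only the first two of six variables matter
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

beta = lasso_cd(X, y, alpha=0.1)
observe = choose_variables(beta, explore_prob=0.2, rng=rng)
print("coefficients:", np.round(beta, 2))
print("observe next:", observe)
```

Setting `explore_prob` to 0 would mean a variable that drops out of the model is never measured again, which is exactly the trap the exploration step is meant to avoid.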