Hi folks, When using unsupervised learning algorithms (like K-Means) we need to save the predicted labels (cluster IDs) for the training data back into the event data store. Ideally, we want to automatically save bulk predictions for the training data right after the model is created, while the RDD/DataFrame of all that data is still resident in Spark memory. It seems complex & inefficient to develop a whole separate process that (re)selects all that training data and then iteratively POSTs to `/queries.json` to get every prediction…
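For concreteness, here's a rough sketch of the kind of thing I'm imagining, using Spark MLlib's `KMeansModel` and writing `$set` events back through the event server's `/events.json` endpoint. The function name, the `"item"` entity type, and the `eventServerUrl`/`accessKey` parameters are all placeholders, not anything that exists in a template today:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper: label each training record with its cluster ID and
// POST the labels back to the event server in bulk.
def bulkSavePredictions(
  model: KMeansModel,
  trainingData: RDD[(String, Vector)], // (entityId, feature vector)
  eventServerUrl: String,              // e.g. "http://localhost:7070" (placeholder)
  accessKey: String): Unit = {

  // Predict cluster IDs while the training data is still in Spark memory.
  val labeled: RDD[(String, Int)] =
    trainingData.mapValues(v => model.predict(v))

  // Write back from the executors so nothing funnels through the driver.
  labeled.foreachPartition { records =>
    records.foreach { case (entityId, clusterId) =>
      // A $set event attaches the cluster ID as a property on the entity.
      val body =
        s"""{"event":"$$set","entityType":"item","entityId":"$entityId","properties":{"cluster":$clusterId}}"""
      val conn = new URL(s"$eventServerUrl/events.json?accessKey=$accessKey")
        .openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setDoOutput(true)
      conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))
      conn.getResponseCode // 201 on success; a real version should check/retry
      conn.disconnect()
    }
  }
}
```

Even this still round-trips one HTTP request per record, just in parallel across executors; grouping records and POSTing to the batch endpoint (`/batch/events.json`, if I remember right) would cut the overhead further. But it feels like something the framework could own rather than every template reinventing it.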
Would adding a `bulk_save_predictions()` function like the sketch above to the persistent model's #save method be the right place to save predictions back into the event data store? How do you folks label the training data from an unsupervised algorithm? Any suggestions for making bulk predictions that mesh with PredictionIO's workflow?

Mars Hall
Customer Facing Architect
Salesforce App Cloud / Heroku
San Francisco, California
