This is addressed in https://issues.apache.org/jira/browse/SPARK-4789. In the new pipeline API, we can simply output two columns: one for the best predicted class and the other for the probabilities or confidence scores of each class.
-Xiangrui
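To make the "two columns" idea concrete for the Naive Bayes case asked about below, here is a minimal plain-Python sketch (not Spark code) of what the prediction and probability outputs would contain for one example. It assumes a multinomial Naive Bayes model stored in log space as a log-prior vector and a per-class matrix of log conditional probabilities; the names `predict_with_probability`, `confident_label`, `log_prior`, `log_theta`, and the 0.8 threshold are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple


def predict_with_probability(
    log_prior: Sequence[float],
    log_theta: Sequence[Sequence[float]],
    features: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return (best class index, per-class posterior probabilities).

    Assumes a multinomial Naive Bayes model kept in log space:
    log_prior[c] = log P(c), log_theta[c][j] = log P(feature j | c).
    """
    # Unnormalized log posterior: log P(c) + sum_j x_j * log P(feature j | c)
    scores = [
        lp + sum(x * lt for x, lt in zip(features, row))
        for lp, row in zip(log_prior, log_theta)
    ]
    # Normalize with the log-sum-exp trick so exp() does not underflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The "prediction" column is the argmax of the "probability" column.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def confident_label(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Hypothetical helper: trust the predicted label only if its
    probability reaches the threshold; otherwise return None."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


# Toy two-class, two-feature model (made-up numbers).
log_prior = [math.log(0.5), math.log(0.5)]
log_theta = [
    [math.log(0.8), math.log(0.2)],
    [math.log(0.3), math.log(0.7)],
]
label, probs = predict_with_probability(log_prior, log_theta, [3.0, 1.0])
print(label, probs)
print(confident_label(probs, 0.8))
```

Returning the full probability vector rather than only the winning score keeps both use cases from the original question open: thresholding on the top probability, and weighting several classifiers' per-class probabilities when combining them.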
On Tue, Jan 6, 2015 at 11:43 AM, Jianguo Li <flyingfromch...@gmail.com> wrote:
> Hi,
>
> A while ago, somebody asked about getting a confidence value of a prediction
> with MLlib's implementation of Naive Bayes's classification.
>
> I was wondering if there is any plan in the near future for the predict
> function to return both a label and a confidence/probability? Or could the
> private variables in the various machine learning models be exposed so we
> could write our own functions which return both?
>
> Having a confidence/probability could be very useful in real application.
> For one thing, you can choose to trust the predicted label only if it has a
> high confidence level. Also, if you want to combine the results from
> multiple classifiers, the confidence/probability could be used as some kind
> of weight for combining.
>
> Thanks,
>
> Jianguo