Thank you, Peter. I just want to be sure: even if I use the "classification" setting, does GBT use regression trees and not classification trees?
I know the difference between the two (theoretically) is only in the loss and impurity functions. So if it does use classification trees, doing what you proposed would give the classification itself.

Also, looking at the Scala API, I found that each Node holds a Predict object, which contains the "probability of the label (classification only)" ( https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.mllib.tree.model.Predict ). This is what I called confidence.

So, to sum up my questions and confusion:
1. Does GBT use classification trees when set to classification, or does it always use regression trees?
2. If it uses classification trees, how could I efficiently get to the confidence = Node.Predict.prob?

Thanks again,
Michael

On Mon, Apr 13, 2015 at 10:13 AM, pprett [via Apache Spark User List] <ml-node+s1001560n22470...@n3.nabble.com> wrote:

> Hi Mike,
>
> Gradient Boosted Trees (or gradient boosted regression trees) dont store
> probabilities in each leaf node but rather model a continuous function
> which is then transformed via a logistic sigmoid (ie. like in a Logistic
> Regression model).
> If you are just interested in a confidence, you can use this continuous
> function directly: its just the (weighted) sum of the predictions of the
> individual regression trees. Use the absolute value for confidence and the
> sign to determine which class label.
> Here is an example:
>
> def score(features: Vector): Double = {
>   val treePredictions = gbdt.trees.map(_.predict(features))
>   blas.ddot(gbdt.numTrees, treePredictions, 1, gbdt.treeWeights, 1)
> }
>
> If you are rather interested in probabilities, just pass the function
> value to a logistic sigmoid.
>
> best,
> Peter
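
For reference, here is a minimal sketch of the steps in Peter's reply: take the weighted sum of the tree predictions, read the label off its sign and the confidence off its absolute value, and pass it through a logistic sigmoid for a probability. It deliberately avoids Spark so it runs on its own. The object name, the helper names, and the sample arrays are hypothetical; the arrays stand in for gbdt.trees.map(_.predict(features)) and gbdt.treeWeights. The plain sigmoid is an assumption taken from the reply: whether the margin needs rescaling first depends on how the loss is defined in your Spark version.

```scala
object GbtConfidenceSketch {
  // Weighted sum of the per-tree predictions: the raw score (margin).
  def margin(treePredictions: Array[Double], treeWeights: Array[Double]): Double = {
    require(treePredictions.length == treeWeights.length, "need one weight per tree")
    treePredictions.zip(treeWeights).map { case (p, w) => p * w }.sum
  }

  // Logistic sigmoid: maps a real-valued margin into (0, 1).
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Sign of the margin picks the class label; its absolute value is the confidence.
  def labelAndConfidence(m: Double): (Double, Double) =
    (if (m > 0) 1.0 else 0.0, math.abs(m))

  def main(args: Array[String]): Unit = {
    // Hypothetical numbers, not output from a real model.
    val preds = Array(0.8, -0.2, 0.5)
    val weights = Array(1.0, 0.1, 0.1)
    val m = margin(preds, weights)
    val (label, confidence) = labelAndConfidence(m)
    println(s"margin=$m label=$label confidence=$confidence probability=${sigmoid(m)}")
  }
}
```

With a real model, only the two input arrays would change; the rest works on plain doubles.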