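Hi Mathieu,

If you use the new ml package to train a RandomForestClassificationModel, you can get the feature importances directly from the model (model.featureImportances). Then you can convert the prediction DataFrame to an RDD of (score, label) pairs and feed it into the mllib BinaryClassificationMetrics (not BinaryClassificationEvaluator, which only reports areaUnderROC and areaUnderPR) to get the ROC curve. You can refer to the following code snippet: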
    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.linalg.Vector // on Spark 2.x use org.apache.spark.ml.linalg.Vector
    import org.apache.spark.sql.Row

    // trainingData / testData: DataFrames with the default "label" and "features" columns
    val rf = new RandomForestClassifier()
    val model = rf.fit(trainingData)

    // Feature importance, available on the ml model
    val importances = model.featureImportances

    // Pair the score for the positive class (index 1) with the label
    val predictions = model.transform(testData)
    val scoreAndLabels = predictions
      .select(model.getRawPredictionCol, model.getLabelCol)
      .rdd
      .map {
        case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
        case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
      }

    // ROC curve from mllib: RDD of (false positive rate, true positive rate)
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    metrics.roc()

Thanks
Yanbo

2016-06-15 7:13 GMT-07:00 matd <matd...@gmail.com>:
> Hi ml folks !
>
> I'm using a Random Forest for a binary classification.
> I'm interested in getting both the ROC *curve* and the feature importance
> from the trained model.
>
> If I'm not missing something obvious, the ROC curve is only available in
> the old mllib world, via BinaryClassificationMetrics. In the new ml package,
> only the areaUnderROC and areaUnderPR are available through
> BinaryClassificationEvaluator.
>
> The feature importance is only available in ml package, through
> RandomForestClassificationModel.
>
> Any idea to get both ?
>
> Mathieu
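To make concrete what the ROC curve from metrics.roc() contains, here is a minimal plain-Scala sketch (no Spark involved; the names RocSketch, roc and trapezoidArea are hypothetical). It sweeps the decision threshold over each distinct score, records one (false positive rate, true positive rate) point per threshold starting from (0.0, 0.0), and integrates the points with the trapezoidal rule. It is an illustration under those assumptions, not Spark's own implementation, and it assumes label 1.0 marks the positive class and both classes are present.

```scala
// Plain-Scala illustration (no Spark): ROC points and trapezoidal AUC
// from (score, label) pairs. Assumes label 1.0 = positive and that
// both classes are present (otherwise a rate divides by zero).
object RocSketch {
  /** (FPR, TPR) points, one per distinct score threshold, in order of
    * descending score, starting at (0.0, 0.0) and ending at (1.0, 1.0). */
  def roc(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val totalPos = scoreAndLabels.count(_._2 == 1.0).toDouble
    val totalNeg = scoreAndLabels.size - totalPos
    // Tied scores form a single threshold, so group them together.
    val byScoreDesc = scoreAndLabels.groupBy(_._1).toSeq.sortBy(-_._1)
    // Cumulative (false positives, true positives) as the threshold drops.
    val counts = byScoreDesc.scanLeft((0.0, 0.0)) { case ((fp, tp), (_, group)) =>
      (fp + group.count(_._2 != 1.0), tp + group.count(_._2 == 1.0))
    }
    counts.map { case (fp, tp) => (fp / totalNeg, tp / totalPos) }
  }

  /** Area under a piecewise-linear curve given (x, y) points sorted by x. */
  def trapezoidArea(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq((x1, y1), (x2, y2)) =>
      (x2 - x1) * (y1 + y2) / 2.0
    }.sum

  def main(args: Array[String]): Unit = {
    val data = Seq((0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.6, 1.0), (0.2, 0.0))
    val curve = roc(data)
    curve.foreach { case (fpr, tpr) => println(f"FPR=$fpr%.2f TPR=$tpr%.2f") }
    println(f"AUC=${trapezoidArea(curve)}%.4f")
  }
}
```

For the five sample points, 5 of the 6 (positive, negative) pairs are ranked correctly, so the trapezoidal area comes out as 5/6, about 0.8333. Spark documents roc() as returning the same kind of (FPR, TPR) pairs with (0.0, 0.0) prepended and (1.0, 1.0) appended, so the points can be collected and plotted in the same way.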