Hi Mathieu,

If you train a RandomForestClassificationModel with the new ml package, you
can get the feature importances from the model. You can then convert the
prediction result to an RDD and feed it into the mllib
BinaryClassificationMetrics to get the ROC curve. You can refer to the
following code snippet:

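import org.apache.spark.ml.classification.RandomForestClassifier
// Spark 2.x; on Spark 1.x import org.apache.spark.mllib.linalg.Vector instead
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

// trainingData and testData are assumed to be DataFrames with label and
// features columns
val rf = new RandomForestClassifier()
val model = rf.fit(trainingData)

val predictions = model.transform(testData)

// Pair the score for class 1 with the true label
val scoreAndLabels =
  predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
    case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
  }
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
// RDD of (false positive rate, true positive rate) points
metrics.roc()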
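
To read both results back out, something like the following could work. It is
only a sketch: printFeatureImportances and printRocCurve are hypothetical
helper names (not part of Spark), and it assumes the model and metrics values
built in the snippet above.

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Hypothetical helper (not part of Spark): prints features ranked by importance.
def printFeatureImportances(model: RandomForestClassificationModel): Unit =
  model.featureImportances.toArray.zipWithIndex
    .sortBy { case (importance, _) => -importance }
    .foreach { case (importance, index) =>
      println(f"feature $index%d: $importance%.4f")
    }

// Hypothetical helper (not part of Spark): prints the ROC curve points.
def printRocCurve(metrics: BinaryClassificationMetrics): Unit = {
  // roc() returns an RDD of (false positive rate, true positive rate) pairs;
  // collect() assumes the curve is small enough to fit on the driver.
  metrics.roc().collect().foreach { case (fpr, tpr) =>
    println(f"FPR=$fpr%.4f TPR=$tpr%.4f")
  }
  println(s"Area under ROC = ${metrics.areaUnderROC()}")
}
```

You would then call printFeatureImportances(model) and printRocCurve(metrics).
If the curve has too many points to collect, BinaryClassificationMetrics also
accepts a numBins argument in its constructor to down-sample it.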


Thanks
Yanbo

2016-06-15 7:13 GMT-07:00 matd <matd...@gmail.com>:

> Hi ml folks!
>
> I'm using a Random Forest for a binary classification.
> I'm interested in getting both the ROC *curve* and the feature importance
> from the trained model.
>
> If I'm not missing something obvious, the ROC curve is only available in
> the old mllib world, via BinaryClassificationMetrics. In the new ml package,
> only the areaUnderROC and areaUnderPR are available through
> BinaryClassificationEvaluator.
>
> The feature importance is only available in ml package, through
> RandomForestClassificationModel.
>
> Any idea how to get both?
>
> Mathieu
