Thanks, Nick.

This worked for me.

    val evaluator = new BinaryClassificationEvaluator().
                    setLabelCol("label").
                    setRawPredictionCol("ModelProbability").
                    setMetricName("areaUnderROC")

    val auROC = evaluator.evaluate(testResults)
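(For anyone hitting the same scala.MatchError with the RDD route: in Spark 2.x, DataFrames store org.apache.spark.ml.linalg.Vector, so a pattern match written against the old org.apache.spark.mllib.linalg.Vector falls through at runtime. The imports aren't shown in the thread, so that's a guess at the cause, but the failure mode can be sketched without Spark at all:)

```scala
// Stand-ins for the two vector types; no Spark required.
sealed trait Vec
case class MlVec(values: Array[Double]) extends Vec    // plays the role of ml.linalg.Vector
case class MllibVec(values: Array[Double]) extends Vec // plays the role of mllib.linalg.Vector

// What the DataFrame row actually holds: (label, ml-style vector)
val row: (Double, Vec) = (0.0, MlVec(Array(0.73, 0.27)))

// Matching against the "wrong import" type throws scala.MatchError at runtime:
val wrongType: Option[(Double, Double)] =
  try {
    row match { case (l, MllibVec(v)) => Some((v(1), l)) }
  } catch { case _: MatchError => None }

// Matching against the type actually stored succeeds:
val ok: (Double, Double) = row match { case (l, MlVec(v)) => (v(1), l) }
// ok is (0.27, 0.0) -- (probability of class 1, label)
```

Switching the import to org.apache.spark.ml.linalg.Vector (or skipping the RDD conversion entirely, as above) should make the match succeed.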

On Mon, Nov 14, 2016 at 4:00 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> Typically you pass in the result of a model transform to the evaluator.
>
> So:
> val model = estimator.fit(data)
> val auc = evaluator.evaluate(model.transform(testData))
>
> Check the Scala API docs for details: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>
> On Mon, 14 Nov 2016 at 20:02 Bhaarat Sharma <bhaara...@gmail.com> wrote:
>
> Can you please suggest how I can use BinaryClassificationEvaluator? I
> tried:
>
> scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>
> scala>  val evaluator = new BinaryClassificationEvaluator()
> evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator =
> binEval_0d57372b7579
>
> Try 1:
>
> scala> evaluator.evaluate(testScoreAndLabel.rdd)
> <console>:105: error: type mismatch;
>  found   : org.apache.spark.rdd.RDD[(Double, Double)]
>  required: org.apache.spark.sql.Dataset[_]
>        evaluator.evaluate(testScoreAndLabel.rdd)
>
> Try 2:
>
> scala> evaluator.evaluate(testScoreAndLabel)
> java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
>   at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)
>
> Try 3:
>
> scala> evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
> org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given
> input columns: [_1, _2];
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>
>
> On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
> DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
> doubles from the test score and label DF.
>
> But you may prefer to just use spark.ml evaluators, which work with
> DataFrames. Try BinaryClassificationEvaluator.
>
> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma <bhaara...@gmail.com> wrote:
>
> I am getting scala.MatchError in the code below. I'm not able to see why
> this would be happening. I am using Spark 2.0.1
>
> scala> testResults.columns
> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, 
> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, 
> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, 
> rawPrediction, ModelProbability, ModelPrediction)
>
> scala> testResults.select("Label","ModelProbability").take(1)
> res542: Array[org.apache.spark.sql.Row] = 
> Array([0.0,[0.737304818744076,0.262695181255924]])
>
> scala> val testScoreAndLabel = testResults.
>      | select("Label","ModelProbability").
>      | map { case Row(l:Double, p:Vector) => (p(1), l) }
> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: 
> double, _2: double]
>
> scala> testScoreAndLabel
> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: 
> double]
>
> scala> testScoreAndLabel.columns
> res540: Array[String] = Array(_1, _2)
>
> scala> val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = 
> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>
> The code below gives the error
>
> val auROC = testMetrics.areaUnderROC() //this line gives the error
>
> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] 
> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>
>
>
