Thanks, Nick. This worked for me:
    val evaluator = new BinaryClassificationEvaluator().
      setLabelCol("label").
      setRawPredictionCol("ModelProbability").
      setMetricName("areaUnderROC")

    val auROC = evaluator.evaluate(testResults)

On Mon, Nov 14, 2016 at 4:00 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Typically you pass in the result of a model transform to the evaluator.
>
> So:
>
>     val model = estimator.fit(data)
>     val auc = evaluator.evaluate(model.transform(testData))
>
> Check the Scala API docs for some details:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>
> On Mon, 14 Nov 2016 at 20:02 Bhaarat Sharma <bhaara...@gmail.com> wrote:
>
>> Can you please suggest how I can use BinaryClassificationEvaluator? I tried:
>>
>>     scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>>     import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
>>
>>     scala> val evaluator = new BinaryClassificationEvaluator()
>>     evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator = binEval_0d57372b7579
>>
>> Try 1:
>>
>>     scala> evaluator.evaluate(testScoreAndLabel.rdd)
>>     <console>:105: error: type mismatch;
>>      found   : org.apache.spark.rdd.RDD[(Double, Double)]
>>      required: org.apache.spark.sql.Dataset[_]
>>            evaluator.evaluate(testScoreAndLabel.rdd)
>>
>> Try 2:
>>
>>     scala> evaluator.evaluate(testScoreAndLabel)
>>     java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
>>       at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)
>>
>> Try 3:
>>
>>     scala> evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
>>     org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given input columns: [_1, _2];
>>       at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>> On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>
>>> DataFrame.rdd returns an RDD[Row].
>>> You'll need to use map to extract the doubles from the test score and label DF.
>>>
>>> But you may prefer to just use the spark.ml evaluators, which work with DataFrames. Try BinaryClassificationEvaluator.
>>>
>>> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma <bhaara...@gmail.com> wrote:
>>>
>>>> I am getting scala.MatchError in the code below. I'm not able to see why this would be happening. I am using Spark 2.0.1.
>>>>
>>>>     scala> testResults.columns
>>>>     res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, rawPrediction, ModelProbability, ModelPrediction)
>>>>
>>>>     scala> testResults.select("Label","ModelProbability").take(1)
>>>>     res542: Array[org.apache.spark.sql.Row] = Array([0.0,[0.737304818744076,0.262695181255924]])
>>>>
>>>>     scala> val testScoreAndLabel = testResults.
>>>>          |   select("Label","ModelProbability").
>>>>          |   map { case Row(l: Double, p: Vector) => (p(1), l) }
>>>>     testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: double]
>>>>
>>>>     scala> testScoreAndLabel
>>>>     res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: double]
>>>>
>>>>     scala> testScoreAndLabel.columns
>>>>     res540: Array[String] = Array(_1, _2)
>>>>
>>>>     scala> val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
>>>>     testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>>>>
>>>> The code below gives the error:
>>>>
>>>>     val auROC = testMetrics.areaUnderROC() // this line gives the error
>>>>
>>>>     Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
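[Editor's note] A likely cause of the scala.MatchError above, on Spark 2.x, is importing `org.apache.spark.mllib.linalg.Vector` while the DataFrame column actually holds an `org.apache.spark.ml.linalg.Vector` (Spark 2.0 moved the DataFrame-based API to the new `ml.linalg` classes). The `p: Vector` type test in the pattern then fails and the whole Row fails to match. A minimal, Spark-free sketch of that failure mode — `OldVector` and `NewVector` are invented stand-ins for the two Vector classes, not Spark types:

```scala
object MatchErrorDemo {
  // Stand-in for org.apache.spark.mllib.linalg.Vector (what the pattern expects).
  case class OldVector(values: Array[Double])
  // Stand-in for org.apache.spark.ml.linalg.Vector (what the column actually holds).
  case class NewVector(values: Array[Double])

  // Mirrors `case Row(l: Double, p: Vector)` compiled against the wrong Vector:
  // the match is partial, so a NewVector payload throws scala.MatchError.
  def extract(row: (Double, Any)): (Double, Double) = row match {
    case (l: Double, p: OldVector) => (p.values(1), l)
  }

  // Wrapper that surfaces the MatchError instead of crashing the demo.
  def tryExtract(row: (Double, Any)): Either[String, (Double, Double)] =
    try Right(extract(row))
    catch {
      case _: MatchError => Left("scala.MatchError on " + row._2.getClass.getSimpleName)
    }

  def main(args: Array[String]): Unit = {
    println(tryExtract((0.0, NewVector(Array(0.737, 0.263)))))  // fails to match
    println(tryExtract((0.0, OldVector(Array(0.737, 0.263)))))  // matches fine
  }
}
```

The fix in the real code would be matching on the Vector class the column actually stores (or, as suggested above, skipping the map entirely and handing the DataFrame to BinaryClassificationEvaluator).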
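[Editor's note] It may also help to see why passing a probability column as `rawPredictionCol` works: areaUnderROC only depends on how the scores *rank* positives against negatives, not on their absolute values. A small, self-contained sketch (plain Scala, no Spark; `AucSketch` is an illustrative name, not a Spark API) of the quantity being computed — the fraction of positive/negative pairs where the positive is scored higher, with ties counted as one half:

```scala
object AucSketch {
  // Naive O(P * N) pairwise definition of AUC over (score, label) pairs,
  // where label is 1.0 for positives and 0.0 for negatives.
  def areaUnderROC(scoreAndLabel: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabel.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabel.collect { case (s, l) if l == 0.0 => s }
    val wins = for (p <- pos; n <- neg)
      yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (pos.size * neg.size)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.3, 1.0), (0.2, 0.0))
    // One of the six positive/negative pairs is mis-ranked, so AUC = 5/6.
    println(areaUnderROC(data))
  }
}
```

Since any strictly monotone transform of the scores leaves this ranking unchanged, a probability column and a raw-margin column give the same AUC, which is why `setRawPredictionCol("ModelProbability")` in the working snippet at the top of the thread is fine.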