Ryan Claussen created SPARK-16857:
-------------------------------------

             Summary: CrossValidator and KMeans throws IllegalArgumentException
                 Key: SPARK-16857
                 URL: https://issues.apache.org/jira/browse/SPARK-16857
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 1.6.1
         Environment: spark-jobserver docker image. Spark 1.6.1 on Ubuntu, Hadoop 2.4
            Reporter: Ryan Claussen


I am attempting to use CrossValidator to train a KMeans model. When I attempt to 
fit the data, Spark throws the IllegalArgumentException shown below, because the 
KMeans algorithm writes an Integer into the prediction column instead of a Double. 
Before I go too far: is using CrossValidator with KMeans supported?

Here's the exception:
{quote}
java.lang.IllegalArgumentException: requirement failed: Column prediction must be of type DoubleType but was actually IntegerType.
 at scala.Predef$.require(Predef.scala:233)
 at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
 at org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator.evaluate(MulticlassClassificationEvaluator.scala:74)
 at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:109)
 at org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:99)
 at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:99)
 at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.generateKMeans(SparkModelJob.scala:202)
 at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:62)
 at com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:39)
 at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)
 at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
{quote}

Here is the code I'm using to set up my cross validator. As the stack trace 
above indicates, it fails at the fit step, when the evaluator checks the type 
of the prediction column.
{quote}
...
    val mpc = new KMeans().setK(2).setFeaturesCol("indexedFeatures")
    val labelConverter = new IndexToString()
      .setInputCol("prediction")
      .setOutputCol("predictedLabel")
      .setLabels(labelIndexer.labels)
    val pipeline = new Pipeline()
      .setStages(Array(labelIndexer, featureIndexer, mpc, labelConverter))

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("approvedIndex")
      .setPredictionCol("prediction")

    val paramGrid = new ParamGridBuilder()
      .addGrid(mpc.maxIter, Array(100, 200, 500))
      .build()
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(evaluator)
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)
    val cvModel = cv.fit(trainingData)
{quote}
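
One possible way around the type check, sketched here as an untested assumption rather than a confirmed fix, is to add a stage that copies the Integer prediction column into a Double column and to point the evaluator at that copy. The sketch uses SQLTransformer, which is available from Spark 1.6.0; the names castPrediction, doubleEvaluator, and predictionDouble are hypothetical and do not appear in the original job. Note that KMeans cluster ids are arbitrary, so even with matching types, comparing them against the indexed labels may not give a meaningful metric.

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.SQLTransformer

// Hypothetical extra stage: keeps every existing column and adds a Double
// copy of the Integer "prediction" column that KMeans writes.
// "predictionDouble" is an assumed column name.
val castPrediction = new SQLTransformer()
  .setStatement("SELECT *, CAST(prediction AS DOUBLE) AS predictionDouble FROM __THIS__")

// The stage would sit after mpc in the pipeline from the report, e.g.
//   Array(labelIndexer, featureIndexer, mpc, castPrediction, labelConverter)
// and the evaluator would read the Double column instead of "prediction".
val doubleEvaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("approvedIndex")
  .setPredictionCol("predictionDouble")
```

With these two values, the CrossValidator would be configured with doubleEvaluator in place of the original evaluator; everything else in the setup stays as reported.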


