[ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
holdenk resolved SPARK-24489.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 3.0.0

Thanks for working on this; I've merged the fix into master :)

> No check for invalid input type of weight data in ml.PowerIterationClustering
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-24489
>                 URL: https://issues.apache.org/jira/browse/SPARK-24489
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: shahid
>            Assignee: shahid
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The test case below results in the following failure. Currently, ml.PIC
> does not check the data type of the weight column. We should check that the
> weight column has a valid data type.
> {code:java}
> test("invalid input types for weight") {
>   // The "weight" column holds strings instead of numeric values.
>   val invalidWeightData = spark.createDataFrame(Seq(
>     (0L, 1L, "a"),
>     (2L, 3L, "b")
>   )).toDF("src", "dst", "weight")
>   val pic = new PowerIterationClustering()
>     .setWeightCol("weight")
>   val result = pic.assignClusters(invalidWeightData)
> }
> {code}
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, executor driver): scala.MatchError: [0,1,null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>   at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
>   at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
>   at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> {code}
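The issue asks for a check on the data type of the weight column. Below is a minimal sketch of that kind of check, assuming `spark-sql` is on the classpath. `WeightColumnCheck` and `checkWeightColumn` are hypothetical names used only for illustration; this is not the code that was merged for SPARK-24489. The sketch relies only on the public `org.apache.spark.sql.types` API. It validates the schema on the driver, so a string-typed weight column is rejected with a clear `IllegalArgumentException`. Without such a check, the problem surfaces later as a `scala.MatchError` inside an executor task, as in the trace above.

```scala
import org.apache.spark.sql.types._

object WeightColumnCheck {
  // Hypothetical helper (not a Spark API): fail fast if the weight column
  // is missing or is not a numeric type.
  def checkWeightColumn(schema: StructType, weightCol: String): Unit = {
    // StructType.apply(name) throws IllegalArgumentException if the column is absent.
    val dataType = schema(weightCol).dataType
    // IntegerType, LongType, DoubleType, etc. all extend NumericType; StringType does not.
    require(dataType.isInstanceOf[NumericType],
      s"Column $weightCol must be of numeric type but was actually of type " +
        s"${dataType.catalogString}.")
  }

  def main(args: Array[String]): Unit = {
    // Same column layout as the test case in the issue, with string weights.
    val invalid = new StructType()
      .add("src", LongType)
      .add("dst", LongType)
      .add("weight", StringType)
    // A schema with a numeric weight column, which should pass the check.
    val valid = new StructType()
      .add("src", LongType)
      .add("dst", LongType)
      .add("weight", DoubleType)

    checkWeightColumn(valid, "weight")
    try {
      checkWeightColumn(invalid, "weight")
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Running `main` prints the `require` failure message for the string-typed schema, while the `DoubleType` schema passes silently. Checking the schema up front keeps the error on the driver, where it can name the offending column, instead of leaving it to the per-row pattern match in a task.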