[jira] [Resolved] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering

holdenk (JIRA) Mon, 07 Jan 2019 09:18:19 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


holdenk resolved SPARK-24489.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Thanks for working on this, I've merged the fix into master :)

> No check for invalid input type of weight data in ml.PowerIterationClustering
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-24489
>                 URL: https://issues.apache.org/jira/browse/SPARK-24489
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: shahid
>            Assignee: shahid
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The test case below produces the following failure. Currently, ml.PIC does 
> not check the data type of the weight column. We should validate that the 
> weight column has a valid data type.
> {code:java}
>   test("invalid input types for weight") {
>     val invalidWeightData = spark.createDataFrame(Seq(
>       (0L, 1L, "a"),
>       (2L, 3L, "b")
>     )).toDF("src", "dst", "weight")
>     val pic = new PowerIterationClustering()
>       .setWeightCol("weight")
>     val result = pic.assignClusters(invalidWeightData)
>   }
> {code}
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, executor 
> driver): scala.MatchError: [0,1,null] (of class 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>       at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
>       at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
>       at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>       at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>       at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
>       at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
>       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> {code}
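
For illustration, the kind of check the description asks for could look roughly like the sketch below. It inspects the schema on the driver and fails fast with a readable message, rather than letting a scala.MatchError surface inside a task. The object name WeightColumnCheck and the helper requireNumericWeight are hypothetical and are not the code merged for this issue. The sketch also assumes that "valid" means any Spark SQL numeric type.

```scala
import scala.util.Try

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.NumericType

object WeightColumnCheck {

  // Hypothetical helper (an assumption, not the merged fix): reject a weight
  // column whose declared type is not numeric before any job is launched.
  def requireNumericWeight(df: DataFrame, weightCol: String): Unit = {
    // StructType.apply(name) throws IllegalArgumentException if the column is missing.
    val actualType = df.schema(weightCol).dataType
    require(
      actualType.isInstanceOf[NumericType],
      s"Column $weightCol must be of a numeric type but was actually of type " +
        s"${actualType.catalogString}.")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("weight-column-check")
      .getOrCreate()

    // Same shape as the failing test in the description: string weights.
    val invalidWeightData = spark.createDataFrame(Seq(
      (0L, 1L, "a"),
      (2L, 3L, "b")
    )).toDF("src", "dst", "weight")

    // A well-formed counterpart with double weights.
    val validWeightData = spark.createDataFrame(Seq(
      (0L, 1L, 1.0),
      (2L, 3L, 0.5)
    )).toDF("src", "dst", "weight")

    // The string column is rejected with an IllegalArgumentException.
    println(Try(requireNumericWeight(invalidWeightData, "weight")))
    // The double column passes the check.
    println(Try(requireNumericWeight(validWeightData, "weight")))

    spark.stop()
  }
}
```

Calling such a helper at the start of assignClusters would turn the stack trace above into a single IllegalArgumentException that names the offending column and its type.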



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
