[ https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang resolved SPARK-16064.
---------------------------------
    Resolution: Not A Problem

> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
>                 Key: SPARK-16064
>                 URL: https://issues.apache.org/jira/browse/SPARK-16064
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Zhang Mengqi
>            Assignee: Yanbo Liang
>            Priority: Minor
>
> This case happens when users run GLM with SparkR; the same dataset runs
> GLM well in native R.
> When users fit the model using glm with the poisson family, it raises an
> assertion error caused by NA produced by the reweight function.
>
> 16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
>       at scala.Predef$.assert(Predef.scala:170)
>       at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
>       at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
>       at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
>       at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
>       at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>       at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
>       at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>       at scala.collection.Abstra
>
> P.S. The dataset is about city ride flow between several planning areas in
> Singapore.
> ride_flow_exp <- glm(flow~Origin+Destination+distance,ride_flow,family = poisson(link = "log"))
> SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int, distance:double]
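
For readers unfamiliar with the reweight step named in the issue title, the following is a minimal Python sketch, not Spark's code, of how the working weight for a Poisson model with log link is commonly computed in iteratively reweighted least squares, and how every weight can collapse to zero when the linear predictor is very negative. The function names and the sample values are hypothetical. Whether this is what happened with the reporter's dataset is not established by the issue, which was resolved as Not A Problem.

```python
import math


def poisson_log_reweight(y, eta, prior_weight=1.0):
    """One IRLS reweight step for a single observation (Poisson, log link).

    Returns (working_response, working_weight). Hypothetical helper for
    illustration only; it is not taken from Spark.
    """
    mu = math.exp(eta)  # inverse of the log link
    if mu == 0.0:
        # exp() underflowed: (y - mu) / mu is undefined, so the working
        # response is reported as NaN and the working weight is zero.
        return float("nan"), 0.0
    z = eta + (y - mu) / mu   # working response: eta + (y - mu) * g'(mu)
    w = prior_weight * mu     # working weight: prior / (g'(mu)^2 * Var(mu))
    return z, w


def sum_of_weights(labels, etas):
    """Total working weight over all observations."""
    return sum(poisson_log_reweight(y, eta)[1] for y, eta in zip(labels, etas))


labels = [3.0, 0.0, 7.0]                            # made-up counts
ok_total = sum_of_weights(labels, [0.0, 0.0, 0.0])  # every mu is 1
bad_total = sum_of_weights(labels, [-800.0] * 3)    # exp(-800) underflows to 0.0
print(ok_total, bad_total)
```

In the second call every working weight is zero, so the total is zero as well. A total of zero is the condition that the message "Sum of weights cannot be zero" in the trace above refers to.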