Hi all,

I've tried the GLM (Generalized Linear Model) in Spark 2.0.0-preview and have encountered some unexpected problems.

• First problem: I tested a GLM with the "poisson" family on a very small dataset using SparkR 2.0.0. The same dataset fits successfully with a "poisson" family GLM in plain R, but SparkR showed the error below, and I have no idea where it comes from.
16/06/13 14:10:58 WARN WeightedLeastSquares: regParam is zero, which might cause numerical instability and overfitting.
16/06/13 14:10:58 ERROR Executor: Exception in task 0.0 in stage 28.0 (TID 28)
java.lang.IllegalArgumentException: requirement failed: The response variable of Poisson family should be positive, but got 0.0

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P.png>

• Second problem: When I run the same dataset that ran successfully on Spark 1.6.0, Spark 2.0.0 generates the error below.

ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Currently, GeneralizedLinearRegression only supports number of features <= 4096. Found 7664 in the input dataset.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P2.png>

This is the R code:

    model <- glm(flow ~ Origin + Destination, data = distance_flow, family = gaussian(link = "identity"))

Is this because Spark 2.0.0 does not support datasets as large as the previous version did?
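
For anyone trying to reproduce this, both error messages can be checked against the data in plain R before calling SparkR. The snippet below is only a sketch under stated assumptions: `toy` is a made-up stand-in for distance_flow (not the real data), and it assumes SparkR expands the Origin and Destination factors into dummy columns roughly the way base R's model.matrix does, which may not match Spark's encoding exactly.

```r
# Hypothetical toy data standing in for distance_flow (not the real dataset).
toy <- data.frame(
  flow        = c(0, 3, 5, 2, 0, 7),
  Origin      = factor(c("A", "A", "B", "B", "C", "C")),
  Destination = factor(c("B", "C", "A", "C", "A", "B"))
)

# Check 1: how many responses are exactly zero?
# The Poisson error above complained about a response of 0.0.
n_zero <- sum(toy$flow == 0)

# Check 2: how many columns does the formula expand to?
# Each factor contributes (number of levels - 1) dummy columns in base R,
# so two factors with many levels can easily exceed a limit such as 4096.
X <- model.matrix(flow ~ Origin + Destination, data = toy)
n_features <- ncol(X) - 1  # drop the intercept column

cat("zero responses:   ", n_zero, "\n")
cat("expanded features:", n_features, "\n")
```

Running the same two checks on the real distance_flow data should show whether the response contains zeros and whether Origin and Destination together expand to something close to the 7664 features reported in the error.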