[ https://issues.apache.org/jira/browse/SPARK-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909052#comment-16909052 ]
eugen yushin commented on SPARK-28742:
--------------------------------------

Looks like the issue is a difference in logic between LocalRelation (used for DataFrames built with {{toDF}}) and LogicalRDD (used for a DataFrame created from an RDD). Compare the plans:

{code:java}
val df2 = Seq("1").toDF("c1")
df.explain(true)
df2.explain(true)
{code}

> StackOverflowError when using otherwise(col()) in a loop
> --------------------------------------------------------
>
>                 Key: SPARK-28742
>                 URL: https://issues.apache.org/jira/browse/SPARK-28742
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 2.4.3
>            Reporter: Ivan Tsukanov
>            Priority: Major
>
> The following code
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> (1 to 9).foldLeft(df) { case (acc, _) =>
>   val res = acc.withColumn("c1", column)
>   res.take(1)
>   res
> }
> {code}
> fails with
> {code:java}
> java.lang.StackOverflowError
>   at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:395)
> ...{code}
> The problem is probably that Spark generates an unreasonably large physical plan:
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> val result = (1 to 9).foldLeft(df) { case (acc, _) =>
>   acc.withColumn("c1", column)
> }
> result.explain()
> {code}
> It shows a plan 18936 symbols long:
> {code:java}
> == Physical Plan ==
> *(1) Project [CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE ....
> 18936 symbols
> +- Scan ExistingRDD[c1#1]
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
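A possible workaround (a sketch only, not verified against 2.4.x): because each {{withColumn("c1", column)}} substitutes the previous CASE WHEN expression into the new one, the expression nests once per iteration. Truncating the lineage inside the loop with {{Dataset.localCheckpoint()}} (available since Spark 2.3) replaces the accumulated plan with a scan of the materialized partitions, so the plan stays shallow:

{code:java}
// Sketch: same loop as in the description, but the logical plan is
// truncated each iteration, so the CASE WHEN expressions do not nest.
val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
val result = (1 to 9).foldLeft(df) { case (acc, _) =>
  // localCheckpoint() eagerly materializes the partitions and swaps the
  // plan for a LogicalRDD-style scan, discarding the nested expression.
  acc.withColumn("c1", column).localCheckpoint()
}
result.explain()
{code}

This trades extra materialization cost per iteration for a bounded plan size; {{checkpoint()}} with a configured checkpoint directory would work similarly.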