HeartSaVioR opened a new pull request, #37161: URL: https://github.com/apache/spark/pull/37161
### What changes were proposed in this pull request?

This PR proposes to include the origin logical plan in LogicalRDD when the LogicalRDD is built from a DataFrame's RDD. Once the origin logical plan is available, LogicalRDD produces its stats from the origin logical plan rather than from the default.

This PR also applies the change to ForeachBatchSink, which appears to be the only such case in the current codebase.

### Why are the changes needed?

The origin logical plan can be useful for several use cases, including:

1. Connecting the two split logical plans into one. Consider the foreachBatch sink: the origin logical plan represents the plan for the streaming query, and the logical plan for the new Dataset represents the plan for the batch query in the user function.
2. Inheriting plan stats from the origin logical plan.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New UT.
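
The stats fallback described in the PR description can be sketched with a small toy model. This is not Spark's actual code: `Stats`, `Plan`, `KnownPlan`, `ToyLogicalRDD`, the `originPlan` field, and the default size value are hypothetical stand-ins. They only illustrate the idea that a plan node wrapping an RDD prefers the stats of the plan it came from, when that plan is known.

```scala
// Toy model only: these are NOT Spark's classes. Stats, Plan, KnownPlan,
// ToyLogicalRDD and originPlan are hypothetical names used for illustration.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

sealed trait Plan {
  def stats: Stats
}

// A plan node whose statistics are known, e.g. the plan of one streaming micro-batch.
final case class KnownPlan(name: String, stats: Stats) extends Plan

object ToyLogicalRDD {
  // Assumed pessimistic default for an RDD about which nothing is known.
  val DefaultStats: Stats = Stats(sizeInBytes = BigInt(Long.MaxValue))
}

// A plan node wrapping an opaque RDD. If the RDD was taken from a DataFrame,
// the plan that produced it can be carried along as originPlan.
final case class ToyLogicalRDD(originPlan: Option[Plan] = None) extends Plan {
  // Prefer the origin plan's stats; otherwise fall back to the default.
  def stats: Stats = originPlan.map(_.stats).getOrElse(ToyLogicalRDD.DefaultStats)
}

object OriginPlanDemo {
  def main(args: Array[String]): Unit = {
    val microBatch = KnownPlan("micro-batch", Stats(BigInt(1024), Some(BigInt(10))))

    // Built from a DataFrame's RDD: the origin plan is known, so its stats are inherited.
    val withOrigin = ToyLogicalRDD(Some(microBatch))
    // Built from a bare RDD: no origin plan, so the default stats apply.
    val withoutOrigin = ToyLogicalRDD()

    println(withOrigin.stats)
    println(withoutOrigin.stats)
  }
}
```

Modelling the origin plan as an `Option` keeps the existing behaviour for RDDs with no known origin, which matches the PR's statement that there is no user-facing change.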