[ 
https://issues.apache.org/jira/browse/SPARK-31137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

S Daniel Zafar resolved SPARK-31137.
------------------------------------
    Resolution: Won't Do

Moving to the Databricks internal board.

> Opportunity to simplify execution plan when passing empty dataframes to 
> subtract()
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-31137
>                 URL: https://issues.apache.org/jira/browse/SPARK-31137
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.5
>            Reporter: S Daniel Zafar
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Execution plans are similar when passing an empty versus a non-empty DataFrame 
> to PySpark's subtract() call.
> {code:java}
> df.subtract(regDf){code}
> yields the same physical plan as:
> {code:java}
> df.subtract(emptyDf){code}
>  Since the operation (EXCEPT DISTINCT in Spark SQL) requires a sort on both 
> DataFrames, skipping that work when the incoming DataFrame is empty could 
> yield a significant speed-up.
>  
> Should be a quick fix for a seasoned committer.
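
Until the optimizer handles this case, a caller-side guard is one possible workaround. The sketch below is only an illustration, not the change requested in this issue: the helper name subtract_skip_empty is hypothetical, and it assumes that checking emptiness with take(1) is cheaper than the sort it avoids (take(1) itself runs a small Spark job). When the right-hand side is empty it falls back to distinct(), because EXCEPT DISTINCT still de-duplicates the left-hand side.

```python
def subtract_skip_empty(left, right):
    """Hypothetical helper: avoid the full EXCEPT DISTINCT plan when `right` is empty.

    `left` and `right` are expected to be pyspark.sql.DataFrame objects; only
    the DataFrame methods take(), distinct() and subtract() are used.
    """
    # take(1) returns a list with at most one Row; an empty list means the
    # DataFrame has no rows. Note that this triggers a (small) Spark job.
    if not right.take(1):
        # Nothing to subtract, but EXCEPT DISTINCT semantics still require
        # the result to contain each left-hand row only once.
        return left.distinct()
    # Non-empty right-hand side: use the regular subtract() call.
    return left.subtract(right)
```

With the names from the description, this would be called as subtract_skip_empty(df, emptyDf) in place of df.subtract(emptyDf).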



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
