cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734741387
The final stage can be just `df.collect()` where there is no target table.
To be clear, we can't make any guarantee about the internal shuffles, as it's
not reliable (depends
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734715842
> why? The user set `spark.sql.shuffle.partitions`

Then we can't do AQE at all, as it changes the number of partitions...
`spark.sql.shuffle.partitions` is a config for
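The point that AQE "changes the number of partitions" can be illustrated with a minimal plain-Python sketch (this is not Spark's actual implementation; the function name and the greedy merge rule are illustrative only): even if the shuffle initially produces `spark.sql.shuffle.partitions` partitions, AQE-style coalescing merges small ones toward an advisory target size, so the final count differs.

```python
# Illustrative sketch (NOT Spark's code): greedy coalescing of shuffle
# partitions. Adjacent partitions are merged until each merged bin
# reaches the advisory target size; the last bin may be smaller.

def coalesce_partitions(sizes, advisory_target):
    """Merge adjacent shuffle-partition sizes into bins of at least
    `advisory_target` bytes each (toy model of AQE coalescing)."""
    merged, current = [], 0
    for s in sizes:
        current += s
        if current >= advisory_target:
            merged.append(current)
            current = 0
    if current > 0:
        merged.append(current)
    return merged

# Say spark.sql.shuffle.partitions = 8, so the shuffle writes 8
# partitions with these (toy) byte sizes; coalescing shrinks that to 3:
sizes = [10, 5, 3, 40, 2, 1, 50, 9]
print(coalesce_partitions(sizes, advisory_target=20))  # → [58, 53, 9]
```

This is why the config is treated as an initial hint rather than a hard guarantee: respecting it exactly would rule out coalescing altogether.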
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734692905
then it's not a user-specified partitioning and we don't need to respect it.
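The distinction between user-specified and planner-inserted partitioning can be sketched as follows. The enum values mirror Spark's `ShuffleOrigin` names, but the decision function is an illustrative simplification, not Spark's logic:

```python
# Hypothetical sketch of the distinction under discussion: shuffles the
# planner inserts itself (ENSURE_REQUIREMENTS) may be freely optimized
# by AQE, while a user-specified repartition(n) pins the partition
# count. Enum names mirror Spark's ShuffleOrigin; logic is illustrative.

from enum import Enum

class ShuffleOrigin(Enum):
    ENSURE_REQUIREMENTS = "ensure_requirements"  # planner-inserted shuffle
    REPARTITION_BY_NUM = "repartition_by_num"    # user called repartition(n)
    REPARTITION_BY_COL = "repartition_by_col"    # user called repartition(col)

def can_change_num_partitions(origin):
    # Only a numeric, user-specified repartition fixes the partition
    # count; other shuffles are fair game for AQE to coalesce.
    return origin is not ShuffleOrigin.REPARTITION_BY_NUM
```

Under this model, a shuffle introduced only to satisfy a distribution requirement does not carry a user-specified partitioning, so AQE need not respect its partition count.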
This is an automated message from
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734669644
is `repartition` involved in your case?
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-733680528
It's unfortunate that we can't retain the shuffle origin info when we
optimize out the repartition shuffle. The approach here looks like a good
compromise.
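Why the shuffle origin info gets lost can be sketched with a toy plan model (illustrative data structures, not Spark's plan nodes): when a rule removes a redundant repartition exchange in favor of the outer, planner-inserted one, the surviving node carries the planner's origin and the user's intent disappears.

```python
# Toy model (NOT Spark's plan nodes) of optimizing out a repartition
# shuffle: Exchange(Exchange(x)) collapses to the outer Exchange, so
# only the outer node's origin survives the rewrite.

from dataclasses import dataclass

@dataclass
class Exchange:
    origin: str   # e.g. "REPARTITION_BY_NUM" or "ENSURE_REQUIREMENTS"
    child: object

def remove_redundant_exchange(plan):
    # A shuffle directly on top of another shuffle makes the inner one
    # redundant; keep the outer exchange and drop the inner.
    if isinstance(plan, Exchange) and isinstance(plan.child, Exchange):
        return Exchange(plan.origin, plan.child.child)
    return plan

leaf = "scan"
user_repartition = Exchange("REPARTITION_BY_NUM", leaf)
planner_shuffle = Exchange("ENSURE_REQUIREMENTS", user_repartition)
optimized = remove_redundant_exchange(planner_shuffle)
print(optimized.origin)  # → ENSURE_REQUIREMENTS: the user's origin is gone
```

Retaining the inner origin on the surviving node would require the rewrite rule to merge origin info explicitly, which is the trade-off the compromise above avoids.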