cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734741387
The final stage can be just `df.collect()` where there is no target table.
To be clear, we can't make any guarantee about the internal shuffles, as it's
not reliable (depends
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734715842
> why? The user set `spark.sql.shuffle.partitions`

Then we can't do AQE at all, as it changes the number of partitions...
`spark.sql.shuffle.partitions` is a config for
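The point that AQE "changes the number of partitions" can be illustrated with a minimal plain-Python sketch (this is not Spark's actual implementation; the function name and the greedy merge rule are illustrative only): even if the shuffle initially produces `spark.sql.shuffle.partitions` partitions, AQE-style coalescing merges small ones toward an advisory target size, so the final count differs.

```python
# Illustrative sketch (NOT Spark's code): greedy coalescing of shuffle
# partitions. Adjacent partitions are merged until each merged bin
# reaches the advisory target size; the last bin may be smaller.

def coalesce_partitions(sizes, advisory_target):
    """Merge adjacent shuffle-partition sizes into bins of at least
    `advisory_target` bytes each (toy model of AQE coalescing)."""
    merged, current = [], 0
    for s in sizes:
        current += s
        if current >= advisory_target:
            merged.append(current)
            current = 0
    if current > 0:
        merged.append(current)
    return merged

# Say spark.sql.shuffle.partitions = 8, so the shuffle writes 8
# partitions with these (toy) byte sizes; coalescing shrinks that to 3:
sizes = [10, 5, 3, 40, 2, 1, 50, 9]
print(coalesce_partitions(sizes, advisory_target=20))  # → [58, 53, 9]
```

This is why the config is treated as an initial hint rather than a hard guarantee: respecting it exactly would rule out coalescing altogether.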
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734692905
then it's not a user-specified partitioning and we don't need to respect it.
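The distinction between user-specified and planner-inserted partitioning can be sketched as follows. The enum values mirror Spark's `ShuffleOrigin` names, but the decision function is an illustrative simplification, not Spark's logic:

```python
# Hypothetical sketch of the distinction under discussion: shuffles the
# planner inserts itself (ENSURE_REQUIREMENTS) may be freely optimized
# by AQE, while a user-specified repartition(n) pins the partition
# count. Enum names mirror Spark's ShuffleOrigin; logic is illustrative.

from enum import Enum

class ShuffleOrigin(Enum):
    ENSURE_REQUIREMENTS = "ensure_requirements"  # planner-inserted shuffle
    REPARTITION_BY_NUM = "repartition_by_num"    # user called repartition(n)
    REPARTITION_BY_COL = "repartition_by_col"    # user called repartition(col)

def can_change_num_partitions(origin):
    # Only a numeric, user-specified repartition fixes the partition
    # count; other shuffles are fair game for AQE to coalesce.
    return origin is not ShuffleOrigin.REPARTITION_BY_NUM
```

Under this model, a shuffle introduced only to satisfy a distribution requirement does not carry a user-specified partitioning, so AQE need not respect its partition count.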
This is an automated message from
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-734669644
is `repartition` involved in your case?
cloud-fan commented on pull request #30494:
URL: https://github.com/apache/spark/pull/30494#issuecomment-733680528
It's unfortunate that we can't retain the shuffle origin info when we
optimize out the repartition shuffle. The approach here looks like a good
compromise.
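Why the shuffle origin info gets lost can be sketched with a toy plan model (illustrative data structures, not Spark's plan nodes): when a rule removes a redundant repartition exchange in favor of the outer, planner-inserted one, the surviving node carries the planner's origin and the user's intent disappears.

```python
# Toy model (NOT Spark's plan nodes) of optimizing out a repartition
# shuffle: Exchange(Exchange(x)) collapses to the outer Exchange, so
# only the outer node's origin survives the rewrite.

from dataclasses import dataclass

@dataclass
class Exchange:
    origin: str   # e.g. "REPARTITION_BY_NUM" or "ENSURE_REQUIREMENTS"
    child: object

def remove_redundant_exchange(plan):
    # A shuffle directly on top of another shuffle makes the inner one
    # redundant; keep the outer exchange and drop the inner.
    if isinstance(plan, Exchange) and isinstance(plan.child, Exchange):
        return Exchange(plan.origin, plan.child.child)
    return plan

leaf = "scan"
user_repartition = Exchange("REPARTITION_BY_NUM", leaf)
planner_shuffle = Exchange("ENSURE_REQUIREMENTS", user_repartition)
optimized = remove_redundant_exchange(planner_shuffle)
print(optimized.origin)  # → ENSURE_REQUIREMENTS: the user's origin is gone
```

Retaining the inner origin on the surviving node would require the rewrite rule to merge origin info explicitly, which is the trade-off the compromise above avoids.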