[GitHub] spark issue #20372: [SPARK-23249] [SQL] Improved block merging logic for par...

2018-02-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20372 It sounds like we fixed a "bug" and made the actual partition size closer to the expected one, but caused another "bug". 2 speculations: 1. The expected partition size can't maximum

2018-02-14 Thread glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 No worries. Can you shed some more light on the performance regressions? Are the benchmark code/results public for me to peruse? If not, could you post a high-level summary? I'd love to

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20372 This PR was merged into RC3 of Spark 2.3. We should not merge fixes like this into Spark 2.3. The performance regression was observed in RC3 compared with RC2. We did not investigate

2018-02-14 Thread glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 @gatorsmile can you link the ticket about the perf regression? I imagine you would be seeing perf regressions in cases where partition counts are less than total cluster capacity, as this has

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20372 Reverted from 2.3 and master.

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20372 We saw a performance regression in Spark 2.3 caused by this change. Let me revert it now; please resubmit the PR for more reviews.

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20372 Thanks, merging to master/2.3!

2018-01-31 Thread glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 What are the remaining steps to get this merged? Just checking that I don't need to do anything else on my end.