imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-657297080
Thanks @c21!
> Re POC - I feel the overall approach looks good to me. But IMO we
should do the coalesce/divide in a physical plan rule, not a logical plan rule.
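As a rough model of what "coalesce" means here, the sketch below assumes that bucket ids are a non-negative modulo of the join-key hash and that the smaller bucket count M evenly divides the larger count N. Under that assumption `h % N % M == h % M`, so bucket `i` of the N-bucket table can be merged into partition `i % M` and joined against bucket `i % M` of the M-bucket table without a shuffle. The helper names are hypothetical (not Spark APIs), and `h` is a plain integer standing in for the key hash; Spark's real hash function is not modeled.

```python
from typing import Dict, List


def bucket_id(h: int, num_buckets: int) -> int:
    # Non-negative modulo (like SQL pmod); h stands in for the join-key hash.
    return h % num_buckets


def coalesce_buckets(num_buckets: int, target: int) -> Dict[int, List[int]]:
    """Map each of `target` coalesced partitions to the bucket ids it reads."""
    if target <= 0 or num_buckets % target != 0:
        raise ValueError("target must evenly divide num_buckets")
    groups: Dict[int, List[int]] = {p: [] for p in range(target)}
    for b in range(num_buckets):
        # Every row in bucket b satisfies h % target == b % target.
        groups[b % target].append(b)
    return groups


if __name__ == "__main__":
    print(coalesce_buckets(8, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
    # Sanity check of the invariant that coalescing relies on.
    assert all(bucket_id(bucket_id(h, 8), 4) == bucket_id(h, 4) for h in range(-100, 100))
```

With 8 buckets coalesced to 4, each coalesced partition reads two bucket files, so the join runs with 4 tasks instead of 8; that reduction is where the parallelism concern comes from.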
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-657170656
> (3). We are seeing in production that coalescing might hurt parallelism if
the number of buckets is too small. Another way to avoid shuffle and sort is to
split/divide the ta
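The quoted comment is cut off, so the following is only one plausible reading of the split/divide alternative, written as a hypothetical Python model rather than Spark code. It keeps the larger bucket count N as the number of join tasks: the task for target bucket `t` re-reads bucket `t % M` of the M-bucket table and keeps only rows whose hash satisfies `h % N == t`. As before, M is assumed to divide N, `h` stands in for the join-key hash, and the function names are illustrative.

```python
from typing import List


def divided_targets(bucket: int, small: int, large: int) -> List[int]:
    """Target bucket ids (out of `large`) served by one bucket of the `small`-bucket table."""
    if small <= 0 or large % small != 0:
        raise ValueError("small must evenly divide large")
    # Bucket j of the small side holds exactly the rows of large-side buckets j, j + small, ...
    return [bucket + k * small for k in range(large // small)]


def rows_for_task(hashes: List[int], target: int, large: int) -> List[int]:
    """Rows one task keeps after re-reading a small-side bucket and filtering to `target`."""
    return [h for h in hashes if h % large == target]


if __name__ == "__main__":
    small_bucket_1 = [1, 5, 9, 13]  # hashes with h % 4 == 1
    for t in divided_targets(1, 4, 8):
        print(t, rows_for_task(small_bucket_1, t, 8))
    # 1 [1, 9]
    # 5 [5, 13]
```

The trade-off in this model is that each small-side bucket is scanned N/M times, spending extra I/O to keep N-way parallelism instead of dropping to M tasks.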
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-646476903
retest this please
This is an automated message from the Apache Git Service.
To respond to the message, please
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-645825392
retest this please
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-643582357
retest this please
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-643570195
retest this please
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-643564022
retest this please
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-643047878
Here are some numbers from joining two tables (store_sales from TPC-DS, scale
factor 100) and running `count` on the result. It was run on 8 executors (8 cores each) and
generated about 47GB of shuf
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-642194428
> it will be good to see benchmark numbers of a typical bucket join that can
benefit from this patch.
I will try to get some numbers this week.