cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-826239123
After more thinking, I'm wondering if this is the right direction to go.
Clearly, falling back to SMJ wastes the partially built hash map.
If one partition is a bit
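A toy illustration of the waste described above. This is a minimal sketch, not Spark code. It assumes a row-count budget as a stand-in for Spark's memory limit, and `build_hash_map_with_budget` is a hypothetical helper, not a Spark API. The build side is inserted into a hash map until the budget is hit, and every insert done up to that point is thrown away when the partition gives up and falls back:

```python
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

Row = Tuple[Any, ...]


def build_hash_map_with_budget(
    build_rows: List[Row],
    key: Callable[[Row], Hashable],
    max_rows: int,
) -> Tuple[Optional[Dict[Hashable, List[Row]]], int]:
    """Build a key -> rows hash map for one partition, giving up past max_rows.

    Returns (hash_map, rows_inserted). hash_map is None when the budget was
    exceeded; in that case all rows_inserted inserts were wasted work.
    """
    # Assumption: a row-count budget stands in for Spark's memory-based limit.
    hash_map: Dict[Hashable, List[Row]] = {}
    inserted = 0
    for row in build_rows:
        if inserted >= max_rows:
            # Budget exceeded: the partial map is dropped and the caller
            # has to fall back (e.g. to a sort-based join).
            return None, inserted
        hash_map.setdefault(key(row), []).append(row)
        inserted += 1
    return hash_map, inserted
```

With 10 build rows and `max_rows=8`, the function returns `(None, 8)`: eight inserts were performed and then discarded. The closer the partition is to the budget, the more work is lost on fallback.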
cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-824536267
retest this please
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-823919561
> We enabled shuffled hash join by default with this feature. In our environment, roughly 25% of sort merge join queries are now running with shuffled hash join after
cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-823357554
I'm a bit worried about this solution:
1. Sorting the stream side at runtime may slow the query down, because that sort is not covered by whole-stage codegen.
2. Unlike SMJ,
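To make point 1 concrete, here is a toy single-partition sort-merge join in Python. It is a minimal sketch, not Spark's implementation, and it has no notion of whole-stage codegen. `sort_merge_join` and the tuple-row/key-function representation are hypothetical. It shows the extra runtime work a sort-based fallback implies: the unsorted stream side (and, in this sketch, the build side as well) must be fully sorted before the first joined row can be produced.

```python
from itertools import groupby
from typing import Any, Callable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def sort_merge_join(
    stream_rows: List[Row],
    build_rows: List[Row],
    key: Callable[[Row], Any],
) -> Iterator[Tuple[Row, Row]]:
    """Inner-join two unsorted partitions by sorting both sides, then merging."""
    # The runtime cost in question: full sorts before any output is emitted.
    stream_groups = groupby(sorted(stream_rows, key=key), key=key)
    build_groups = groupby(sorted(build_rows, key=key), key=key)
    s = next(stream_groups, None)
    b = next(build_groups, None)
    while s is not None and b is not None:
        if s[0] < b[0]:
            s = next(stream_groups, None)
        elif s[0] > b[0]:
            b = next(build_groups, None)
        else:
            # Materialize the matching build group; groupby invalidates it
            # once build_groups advances.
            build_matches = list(b[1])
            for srow in s[1]:
                for brow in build_matches:
                    yield srow, brow
            s = next(stream_groups, None)
            b = next(build_groups, None)
```

The merge itself is a single linear pass. The `sorted` calls are where the extra time goes, which is the cost point 1 is concerned about when that sort runs outside generated code.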