[GitHub] [spark] agrawaldevesh commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-11 Thread GitBox

agrawaldevesh commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-672275450 Hi Cheng, I am wondering if you might have a perf test handy to compare this new implementation against your old approach? After going through the code and following along,

[GitHub] [spark] agrawaldevesh commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-10 Thread GitBox

agrawaldevesh commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-671527296 > ah good point about one key multi value. How about we use a standard hash set and use `(keyIndex, value_index)` as the key? Yeah I think what I was suggesting would ha
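
For readers following the quoted suggestion, here is a minimal sketch of tracking matched build-side rows with a standard hash set keyed by `(key_index, value_index)`, so that one key with several values is handled row by row. This is an illustration only, not the code in the pull request: the function name `full_outer_hash_join`, the in-memory list inputs, and the way `key_index` is assigned (a dense id per distinct build key) are all assumptions, and null join keys are not treated specially.

```python
from collections import defaultdict


def full_outer_hash_join(stream_rows, build_rows):
    """Join two lists of (key, value) pairs; unmatched sides are padded with None."""
    # Build side: group values by key. key_index is a hypothetical dense id
    # standing in for the position of the key inside the hash relation.
    key_index = {}
    values_by_key = defaultdict(list)
    for key, value in build_rows:
        key_index.setdefault(key, len(key_index))
        values_by_key[key].append(value)

    # Set of (key_index, value_index) pairs that matched at least one stream row.
    matched = set()
    out = []

    # Probe phase: emit matches, or a stream row padded with None.
    for key, stream_value in stream_rows:
        if key in values_by_key:
            for value_index, build_value in enumerate(values_by_key[key]):
                matched.add((key_index[key], value_index))
                out.append((key, stream_value, build_value))
        else:
            out.append((key, stream_value, None))

    # Second pass over the build side: emit rows that never matched.
    for key, values in values_by_key.items():
        for value_index, build_value in enumerate(values):
            if (key_index[key], value_index) not in matched:
                out.append((key, None, build_value))
    return out
```

The set grows with the number of matched build rows rather than with the size of the build side, which is the trade-off against a fixed-size per-row marker in this sketch.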