[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-15 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-674457786 @agrawaldevesh - sorry for a separate irrelevant ping. It seems that `DecommissionWorkerSuite` (added in https://github.com/apache/spark/pull/29014) was kind of flaky where

[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-10 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-671723199 ~~@cloud-fan - sorry if I miss anything, could you elaborate more of~~ ~~> We don't need to get the value index. We can calculate it by ourselves~~ ~~How do we

[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-10 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-671542777 @cloud-fan, @agrawaldevesh and @viirya - if we go with , I think - 1. we probably still need one new abstract method for `HashedRelation`, which can be e.g.

[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-10 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-671497515 @cloud-fan, @viirya - I just thought a bit more about it, and I think we need some tracking per row, but not per key. When doing full outer join, not only the join
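The excerpt above is truncated, so the following is only an illustration of why matches might need to be tracked per build row rather than per key. It is a minimal Python sketch, not Spark's `HashedRelation` or the PR's implementation. It assumes the join may carry an extra non-equi condition that is checked after the key lookup. The function name `full_outer_hash_join` and the dict-based row representation are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def full_outer_hash_join(
    stream: List[Row],
    build: List[Row],
    key: str,
    # Hypothetical extra (non-equi) join condition; defaults to "always true".
    condition: Callable[[Row, Row], bool] = lambda s, b: True,
) -> List[Tuple[Optional[Row], Optional[Row]]]:
    # Build phase: hash build rows by join key, storing each row's position so
    # matches can be recorded per row rather than per key.
    table: Dict[Any, List[int]] = defaultdict(list)
    for idx, row in enumerate(build):
        table[row[key]].append(idx)
    matched = [False] * len(build)  # one flag per build row

    out: List[Tuple[Optional[Row], Optional[Row]]] = []
    # Probe phase: a key lookup only yields candidates; the extra condition
    # decides which build rows under that key actually match.
    for s in stream:
        k = s[key]
        candidates = table.get(k, []) if k is not None else []  # NULL keys never match
        hit = False
        for idx in candidates:
            b = build[idx]
            if condition(s, b):
                out.append((s, b))
                matched[idx] = True
                hit = True
        if not hit:
            out.append((s, None))  # unmatched stream row, build side null-padded
    # Final phase: emit every build row that was never marked as matched.
    for idx, b in enumerate(build):
        if not matched[idx]:
            out.append((None, b))
    return out
```

For example, with one stream row `{"k": 1, "v": 10}`, build rows `{"k": 1, "w": 5}` and `{"k": 1, "w": 50}`, and the condition `b["w"] > s["v"]`, only the second build row matches. A single per-key flag for `k = 1` would suppress the first build row, which a full outer join must still emit with a null-padded stream side.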

[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670632581 @agrawaldevesh - thank you for the warm welcome, and excited to discuss and collaborate again here! > I am curious if the approach of storing the 'matched rows' out of