c21 edited a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-674457786
@agrawaldevesh - sorry for the separate, unrelated ping. It seems that
`DecommissionWorkerSuite` (added in https://github.com/apache/spark/pull/29014)
was kind of flaky where
c21 edited a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-671723199
~~@cloud-fan - sorry if I missed anything, could you elaborate more on~~
~~> We don't need to get the value index. We can calculate it by ourselves~~
~~How do we
c21 edited a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-671542777
@cloud-fan, @agrawaldevesh and @viirya - if we go with , I think -
1. we probably still need one new abstract method for `HashedRelation`, which
can be e.g.
c21 edited a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-671497515
@cloud-fan, @viirya -
I just thought a bit more about it, and I think we need some tracking per
row, not per key. When doing a full outer join, not only the join
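The per-row (rather than per-key) tracking requirement can be illustrated with a small sketch. This is a hypothetical, language-neutral Python illustration of the idea, not Spark's actual `HashedRelation` code: because one build-side key can map to several rows, a full outer join has to remember which individual build rows matched, so the unmatched ones can later be emitted padded with nulls.

```python
# Hypothetical sketch of per-row matched tracking in a full outer hash join.
# Names (full_outer_hash_join, build_key, stream_key) are illustrative only.

def full_outer_hash_join(build_rows, stream_rows, build_key, stream_key):
    # Build a multimap: key -> list of (row_index, row). Several rows may
    # share one key, which is exactly why per-key tracking is not enough.
    table = {}
    for i, row in enumerate(build_rows):
        table.setdefault(build_key(row), []).append((i, row))

    matched = [False] * len(build_rows)  # one flag per build ROW
    out = []

    # Probe with the stream side; emit matches and mark matched build rows.
    for srow in stream_rows:
        hits = table.get(stream_key(srow), [])
        if hits:
            for i, brow in hits:
                out.append((brow, srow))
                matched[i] = True
        else:
            out.append((None, srow))  # stream row with no build-side match

    # Finally emit build rows that never matched, padded with None.
    for i, brow in enumerate(build_rows):
        if not matched[i]:
            out.append((brow, None))
    return out
```

With build rows `[(1, 'a'), (1, 'b'), (2, 'c')]` and stream rows `[(1, 'x'), (3, 'y')]` (joining on the first element), both rows under key `1` are marked matched while `(2, 'c')` is emitted as `((2, 'c'), None)`, which a per-key flag alone could not distinguish if only some rows under a key had matched.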
c21 edited a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-670632581
@agrawaldevesh - thank you for the warm welcome; excited to discuss and
collaborate again here!
> I am curious if the approach of storing the 'matched rows' out of