Re: graphx Joining two VertexPartitions with different indexes is slow.

Koert Kuipers Sun, 06 Jul 2014 13:34:15 -0700

Probably a dumb question, but why is reference equality used for the
indexes?



On Sun, Jul 6, 2014 at 12:43 AM, Ankur Dave <ankurd...@gmail.com> wrote:

> When joining two VertexRDDs with identical indexes, GraphX can use a fast
> code path (a zip join without any hash lookups). However, the check for
> identical indexes is performed using reference equality.
>
> Without caching, two copies of the index are created. Although the two
> indexes are structurally identical, they fail reference equality, and so
> GraphX mistakenly uses the slow path involving a hash lookup per joined
> element.
>
> I'm working on a patch <https://github.com/apache/spark/pull/1297> that
> attempts an optimistic zip join with per-element fallback to hash lookups,
> which would improve this situation.
>
> Ankur <http://www.ankurdave.com/>
>
>
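The following minimal Scala sketch contrasts the two join strategies described above: a fast path that only fires when both partitions share the same index object (`eq`), and an optimistic zip join that compares ids position by position and falls back to a hash lookup only where they disagree. Everything in it is hypothetical. `SimpleIndex`, `Part`, `JoinSketch`, `joinByReference`, and `optimisticJoin` are illustrative names, and the index is modeled as a plain id array plus a `mutable.HashMap`. This is not GraphX's real `VertexPartition` code or the code in the linked pull request; it only mimics the behavior the thread describes. Note that the `eq` check is O(1), while comparing two indexes structurally means walking every id.

```scala
import scala.collection.mutable

// Hypothetical simplified index (not GraphX's class): vertex ids in a fixed
// order, plus a hash map from id to position used by the slow path.
final class SimpleIndex(val ids: Array[Long]) {
  val positions: mutable.HashMap[Long, Int] = {
    val m = mutable.HashMap.empty[Long, Int]
    ids.indices.foreach(i => m.update(ids(i), i))
    m
  }
}

// Hypothetical partition: values(i) belongs to vertex index.ids(i).
final case class Part(index: SimpleIndex, values: Array[Double])

object JoinSketch {
  // Counts slow-path hash lookups, for illustration only.
  var hashLookups = 0

  private def lookup(p: Part, id: Long): Option[Double] = {
    hashLookups += 1
    p.index.positions.get(id).map(pos => p.values(pos))
  }

  // Zip join only when both sides hold the *same* index object; a
  // structurally identical copy fails `eq` and takes the hash-lookup path.
  def joinByReference(a: Part, b: Part): Array[Option[Double]] =
    if (a.index eq b.index)
      Array.tabulate[Option[Double]](a.values.length)(i => Some(b.values(i)))
    else
      a.index.ids.map(id => lookup(b, id))

  // Optimistic zip join: use position i directly when both indexes agree
  // there, and fall back to a per-element hash lookup when they don't.
  def optimisticJoin(a: Part, b: Part): Array[Option[Double]] =
    Array.tabulate[Option[Double]](a.index.ids.length) { i =>
      val id = a.index.ids(i)
      if (i < b.index.ids.length && b.index.ids(i) == id) Some(b.values(i))
      else lookup(b, id)
    }
}
```

With two structurally identical but distinct index objects, which is the uncached situation described above, `joinByReference` performs one hash lookup per element, while `optimisticJoin` performs none. It only pays for lookups at positions where the ids actually differ.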