Probably a dumb question, but why is reference equality used for the indexes?

On Sun, Jul 6, 2014 at 12:43 AM, Ankur Dave <ankurd...@gmail.com> wrote:
> When joining two VertexRDDs with identical indexes, GraphX can use a fast
> code path (a zip join without any hash lookups). However, the check for
> identical indexes is performed using reference equality.
>
> Without caching, two copies of the index are created. Although the two
> indexes are structurally identical, they fail reference equality, and so
> GraphX mistakenly uses the slow path involving a hash lookup per joined
> element.
>
> I'm working on a patch <https://github.com/apache/spark/pull/1297> that
> attempts an optimistic zip join with per-element fallback to hash lookups,
> which would improve this situation.
>
> Ankur <http://www.ankurdave.com/>
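
To make the quoted explanation concrete for myself, here is a toy sketch in plain Python. It is not GraphX code: `join`, `shared`, and `rebuilt` are made-up names, and a Python list stands in for the index. It only illustrates how an identity check sends a structurally identical but separately built index down the hash-lookup path. In general an identity check is constant time, whereas comparing contents has to touch every element; whether that is the reason for the choice in GraphX is my assumption, not something stated in this thread.

```python
# Toy model of the two join paths described above; not GraphX code.

def join(index_a, values_a, index_b, values_b):
    """Join two (index, values) pairs and report which path was taken.

    index_* is a list of vertex ids; position i holds the id for values_*[i].
    Returns (path_used, [(vertex_id, (value_a, value_b)), ...]).
    """
    if index_a is index_b:
        # Fast path: both sides share the very same index object,
        # so positions line up and a plain zip is enough.
        return "zip", list(zip(index_a, zip(values_a, values_b)))
    # Slow path: build a hash map for side b and look up every id from side a.
    lookup = dict(zip(index_b, values_b))
    joined = [(vid, (va, lookup[vid]))
              for vid, va in zip(index_a, values_a) if vid in lookup]
    return "hash", joined


shared = [1, 2, 3]
rebuilt = list(shared)  # same contents, but a different object (like an uncached recompute)

path_same, out_same = join(shared, ["a", "b", "c"], shared, [10, 20, 30])
path_rebuilt, out_rebuilt = join(shared, ["a", "b", "c"], rebuilt, [10, 20, 30])
```

Both calls produce the same joined result, but only the first one takes the zip path; the second pays for a hash lookup per element even though `rebuilt == shared`.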