Hi Haopu, My understanding is that the hashtable on both left and right side is used for including null values in result in an efficient manner. If hash table is only built on one side, let's say left side and we perform a left outer join, for each row in left side, a scan over the right side is needed to make sure that no matching tuples for that row on left side.
Hope this helps! Liquan On Mon, Sep 29, 2014 at 8:36 PM, Haopu Wang <hw...@qilinsoft.com> wrote: > I take a look at HashOuterJoin and it's building a Hashtable for both > sides. > > This consumes quite a lot of memory when the partition is big. And it > doesn't reduce the iteration on streamed relation, right? > > Thanks! > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Liquan Pei Department of Physics University of Massachusetts Amherst