Hi Peter,
   Looks like mapjoin does not work with outer join so streamtable is
instead a possible approach. You would stream the larger table through the
smaller one:

 can you see whether the following helps your perf issue?

select /*+ streamtable(message) */ f.uuid  from message m right outer join
feed f on m.uuid = f.uuid where m.uuid = null;




2013/5/5 Peter Chu <pete....@outlook.com>

> Thanks, Stephen,
>
> I do not quite understand what you mean by Stream, specifically "Stream
> the 400M message records through the in-memory maps".
> Can you please elaborate.
>
> Also, can you use MAPJOIN on left outer join?
>
> Peter
>
> ------------------------------
> Date: Sun, 5 May 2013 21:44:37 -0700
>
> Subject: Re: Hive QL - NOT IN, NOT EXIST
> From: java...@gmail.com
> To: user@hive.apache.org
>
>
>
> @Peter  Does the query plan demonstrate that the 3Meg row table is being
> map-joined and the 400M table streamed through? That is what you want: but
> you might either need to fiddle with hints to get it to happen
>
> Details:
>     Read uuids s of feed into  in-memory map on all nodes (mapjoin)
>     Stream the 400M message records through the in-memory maps, copying
> id's from the "all feed uuids"  map to a  "matched feed uuid's map for
> entries that have matches in the messages
>
>      Note: this way the 400M rows are only read once on the cluster.
>
> You can see whether hive can manage this or if you write a custom m/r job
> to do it.
>
>
>
> 2013/5/5 Peter Chu <pete....@outlook.com>
>
> It works but it takes a very long time because the subqueries in NOT IN
> contains 400 million rows (the message table in the example) and the feed
> table contains 3 million rows.
>
> SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message);
>
> > Date: Sun, 5 May 2013 20:25:15 -0700
> > From: michaelma...@yahoo.com
> > Subject: Re: Hive QL - NOT IN, NOT EXIST
> > To: user@hive.apache.org
>
> >
> >
> > --- On Sun, 5/5/13, Peter Chu <pete....@outlook.com> wrote:
> >
> > > I am wondering if there is any way to do this without resorting to
> > > using left outer join and finding nulls.
> >
> > I have found this to be an acceptable substitute. Is it not working for
> you?
> >
>
>
>

Reply via email to