Hi Peter,
It looks like mapjoin does not work with outer joins, so streamtable is
a possible approach instead. You would stream the larger table through the
smaller one.
Can you see whether the following helps your perf issue?
select /*+ streamtable(m) */ f.uuid from message m right outer
-1 (non-binding)
My apologies, but HIVE-4505 is a regression that IMHO should be addressed.
thanks
Prasad
On Tue, Apr 30, 2013 at 5:18 PM, Ashutosh Chauhan hashut...@apache.org wrote:
Hey all,
Based on feedback from folks, I have respun the release candidate, RC1.
Please take a look. It
That is the idea I had in mind, but in our scenario we have little control
over how the Avro data is written, so we cannot easily encode tabs and
newlines as other characters.
Since Avro data can be pumped into the warehouse system from many sources, and
if we have to implement this
Thanks Stephen,
Will start a cluster today to see if it helps.
Peter
Date: Mon, 6 May 2013 00:05:45 -0700
Subject: Re: Hive QL - NOT IN, NOT EXIST
From: java...@gmail.com
To: user@hive.apache.org
Hi Peter, Looks like mapjoin does not work with outer join so streamtable is
instead a possible
Not quite sure, but I think each GROUP BY will give another M/R job.
It will be done in a single M/R job no matter how many fields are in
the GROUP BY clause.
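One way to check this on your own cluster is to look at the query plan with EXPLAIN. This is only a sketch: the table page_views and its columns are hypothetical names made up for illustration, so substitute your own.

```sql
-- Hypothetical table and column names, for illustration only.
-- EXPLAIN prints the plan without running the query.
EXPLAIN
SELECT country, device, browser, COUNT(*) AS hits
FROM page_views
GROUP BY country, device, browser;
```

The STAGE DEPENDENCIES section of the output lists the stages, so you can count the map-reduce stages there and compare against the same query with fewer GROUP BY fields.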
On Mon, May 6, 2013 at 2:07 PM, Peter Chu pete@outlook.com wrote:
In Hive, I cannot perform a SELECT GROUP BY on fields not in the
Views in Hive are similar to those in any RDBMS schema.
Normally a view is created to provide a well-defined interface over
an inconsistently defined table, so that changes to the table
definition do not alter the view definition.
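As a minimal sketch of that first use case (events_raw, events_v and all column names are hypothetical, made up for illustration):

```sql
-- Hypothetical underlying table; names are illustrative only.
CREATE TABLE events_raw (
  event_id STRING,
  user_id  STRING,
  event_ts STRING,
  payload  STRING
);

-- The view names the columns explicitly, so it acts as the
-- stable interface that consumers depend on.
CREATE VIEW events_v AS
SELECT event_id, user_id, event_ts
FROM events_raw;

-- Consumers query the view rather than the table.
SELECT user_id, COUNT(*) AS n_events
FROM events_v
GROUP BY user_id;
```

Because the view lists its columns explicitly, adding a column to events_raw later leaves the queries written against events_v unchanged.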
Another use case: suppose you have 100 columns in a