Use distribute to spread across reducers

Keith Wiley Wed, 02 Oct 2013 11:49:04 -0700

I'm trying to create a subset of a large table for testing.  The following 
approach works:


create table subset_table as
select * from large_table limit 1000

...but it only uses one reducer.  I would like to speed up the process of 
creating a subset by distributing the work across multiple reducers.  I already 
tried explicitly setting mapred.reduce.tasks and hive.exec.reducers.max to values 
larger than 1, but in this particular case those values seem to be overridden 
by Hive's internal query-to-MapReduce conversion; it ignores those parameters.

So, I tried this:

create table subset_table as
select * from large_table limit 1000
distribute by column_name

...but that doesn't parse.  I get the following error:

OK FAILED: ParseException line 3:0 missing EOF at 'distribute' near '1000'.
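
(For reference: Hive's SELECT grammar places DISTRIBUTE BY ahead of LIMIT, which 
would explain the "missing EOF at 'distribute'" message.  A minimal sketch of 
the reordered statement, reusing the same table and column names, is shown 
here.  It should get past the parser, but whether it actually spreads the work 
across more than one reducer is an open assumption, since LIMIT may still 
funnel the final rows through a single reducer.)

create table subset_table as
select * from large_table
distribute by column_name
limit 1000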

I have tried NUMEROUS applications of parentheses, nested queries, etc.  For 
example, here's just one (amongst perhaps ten variations on a theme):

create table subset_table as
select * from (
from (
select * from large_table limit 1000
distribute by column_name
)) s

Like I said, I've tried all sorts of combinations of the elements shown above.  
So far I have not even gotten any syntax to parse, much less run.  Only the 
original query at the top will even pass the parsing stage of processing.

Any ideas?

Thanks.

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
________________________________________________________________________________
