I'm trying to create a subset of a large table for testing. The following approach works:

create table subset_table as
select * from large_table limit 1000

...but it only uses one reducer. I would like to speed up the creation of the subset by distributing the work across multiple reducers. I already tried explicitly setting mapred.reduce.tasks and hive.exec.reducers.max to values larger than 1, but in this particular case those values seem to be overridden by Hive's internal query-to-MapReduce conversion; it ignores those parameters.

So, I tried this:

create table subset_table as
select * from large_table limit 1000
distribute by column_name

...but that doesn't parse. I get the following error:

OK
FAILED: ParseException line 3:0 missing EOF at 'distribute' near '1000'

I have tried numerous arrangements of parentheses, nested queries, etc. For example, here is just one of perhaps ten variations on the theme:

create table subset_table as
select * from (
from (
select * from large_table limit 1000
distribute by column_name
)) s

As I said, I have tried all sorts of combinations of the elements shown above. So far I have not gotten any of them to parse, much less run. Only the original query at the top passes the parsing stage.

Any ideas?

Thanks.

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
________________________________________________________________________________