RE: Distinct in hive

2011-01-26 Thread Guy Doulberg
Thanks, that was it.

From: Namit Jain [mailto:nj...@fb.com]
Sent: Tuesday, January 25, 2011 7:04 PM
To: user@hive.apache.org
Subject: Re: Distinct in hive

Is there skew in the data? You may want to set the parameter hive.groupby.skewindata to true.

Thanks,
-namit

From: Guy Doulberg
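For readers following the thread, the suggestion above can be sketched roughly as follows. The table name `events` and the columns `group_key` and `item_id` are hypothetical, since the original query is not shown in the thread; only the parameter name hive.groupby.skewindata comes from the message itself.

```sql
-- Sketch only: `events`, `group_key`, and `item_id` are made-up names,
-- standing in for the poster's real table and columns.

-- Ask Hive to plan the group-by with skewed keys in mind.
SET hive.groupby.skewindata=true;

-- Count the distinct values within each group.
SELECT group_key,
       COUNT(DISTINCT item_id) AS distinct_items
FROM events
GROUP BY group_key;
```

The setting applies to the current session. Whether it helps depends on the data actually being skewed, which is what the reply asks about.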

Re: Distinct in hive

2011-01-25 Thread Namit Jain
Subject: Distinct in hive

Hey, we made a query in Hive that calculates the number of distinct values in a group by. On a small portion of the data it worked well; however, when we ran the query over a large portion of the data, we failed

Distinct in hive

2011-01-25 Thread Guy Doulberg
Hey, we made a query in Hive that calculates the number of distinct values in a group by. On a small portion of the data it worked well; however, when we ran the query over a large portion of the data, we failed because of OutOfMemory errors in some of the reducers. We wonder how the distinct operator works in HI