What is the data type of the p1 column? I've used hive with partitions containing far above 2 billion rows without having any problems like this.
On Wed, May 29, 2013 at 2:41 PM, Gabi Kazav <gabi.ka...@pursway.com> wrote: > Hi, > > > > We are working on hive DB with our Hadoop cluster. > > We now facing an issue about joining a big partition with more than 2^31 > rows. > > When the partition has more than 2147483648 rows (even 2147483649) the > output of the join is a single row. > When the partition has less than 2147483648 rows (event 2147483647) the > output is correct. > > Our test case: > > create a table with 2147483649 rows in a partition with the value : "1" , > join this table to another table with a single row,single column with the > value "1" on the partition_key. > later delete 2 rows and run the same join. > 1st : only a single row is created > 2nd : 2147483647 rows > > the query we run for test the case is: > > > > create table output_rows_over as > > select a.s1 > > from max_sint_rows a join small_table b > > on (a.p1=b.p1); > > > > on more than 2^31 rows we got the following on reducer log: > > 2013-05-27 21:51:14,186 INFO > org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1 > > On less than 2^31 rows we got the following reducer log: > > 2013-05-27 23:43:14,681 INFO > org.apache.hadoop.hive.ql.exec.FileSinkOperator: > TABLE_ID_1_ROWCOUNT:2147483647 > > > > > > Anyone faced this issue? > > Does hive has workaround for that? > > I have huge partitions I need to work on and I cannot use hive for that.. > > > > Thanks, > > > > > > Gabi Kazav > > Infrastructure Team Leader, Pursway.com > > > >