What is the data type of the p1 column?  I've used hive with
partitions containing far above 2 billion rows without having any
problems like this.

On Wed, May 29, 2013 at 2:41 PM, Gabi Kazav <gabi.ka...@pursway.com> wrote:
> Hi,
>
>
>
> We are working on hive DB with our Hadoop cluster.
>
> We now facing an issue about joining a big partition with more than 2^31
> rows.
>
> When the partition has more than 2147483648 rows (even 2147483649) the
> output of the join is a single row.
> When the partition has less than 2147483648 rows (event 2147483647) the
> output is correct.
>
> Our test case:
>
> create a table with 2147483649 rows in a partition with the value : "1" ,
> join this table to another table with a single row,single column with the
> value "1" on the partition_key.
> later delete 2 rows and run the same join.
> 1st : only a single row is created
> 2nd : 2147483647 rows
>
> the query we run for test the case is:
>
>
>
> create table output_rows_over as
>
> select a.s1
>
> from  max_sint_rows a join small_table b
>
> on (a.p1=b.p1);
>
>
>
> on more than 2^31 rows we got the following on reducer log:
>
> 2013-05-27 21:51:14,186 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
>
> On less than 2^31 rows we got the following reducer log:
>
> 2013-05-27 23:43:14,681 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator:
> TABLE_ID_1_ROWCOUNT:2147483647
>
>
>
>
>
> Anyone faced this issue?
>
> Does hive has workaround for that?
>
> I have huge partitions I need to work on and I cannot use hive for that..
>
>
>
> Thanks,
>
>
>
>
>
> Gabi Kazav
>
> Infrastructure Team Leader, Pursway.com
>
>
>
>

Reply via email to