Hi,
I have uploaded a few CSV files from Windows into Hive and configured a few
external tables on top of them. When I run a join on two of the tables,
one of the int columns
gets changed to 0. The structure of the tables is as follows:
Table-1
-------
Id (int)
--------
1
2
3

Table-2
-------
id (int)   datetime     eid (int)
--------   ----------   ---------
1          2011-02-01   3
1          2011-03-01   4
2          2011-04-01   5
4          2011-05-01   6
6          2011-06-01   7
The join query is:
select a.* from Table-2 a join Table-1 b on (a.id=b.id);
The output is:
1 2011-02-01 0
1 2011-03-01 0
2 2011-04-01 0
I checked the logs and noticed the following warning:
WARN org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct: Extra bytes
detected at the end of the row! Ignoring similar problems.
Could this be causing it?
When I set hive.auto.convert.join=true, the problem goes away, as there
is no reduce phase. The output is:
1 2011-02-01 3
1 2011-03-01 4
2 2011-04-01 5
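
For reference, the two runs being compared can be sketched as the session below. The names table1 and table2 are placeholders for Table-1 and Table-2 (unquoted Hive identifiers cannot contain a hyphen), and the sketch assumes the first run had automatic map-join conversion switched off, which matches the reduce phase seen in that run.

```sql
-- table1 / table2 are placeholder names standing in for Table-1 / Table-2.

-- Run 1: automatic map-join conversion off, so the join goes through a reduce phase.
SET hive.auto.convert.join=false;
SELECT a.* FROM table2 a JOIN table1 b ON (a.id = b.id);

-- Run 2: automatic map-join conversion on; the small table is loaded into
-- memory on the mappers and no reduce phase is needed for the join.
SET hive.auto.convert.join=true;
SELECT a.* FROM table2 a JOIN table1 b ON (a.id = b.id);
```

Running both in the same session on the same data makes it easier to confirm that the only difference between the wrong and the correct output is the join strategy.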
Could somebody please help me figure out why we get wrong results when
the join runs through the reducer?
--
Thanks