Hello All,
I am running an incremental import on a daily basis, and after each import
I find that a large amount of data is missing.
This is the script I use for the incremental import from the RDBMS to HDFS:
sqoop import -libjars
--driver com.sybase.jdbc3.jdbc.SybDriver \
--query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
--check-column Unique_value \
--incremental append \
--last-value 201401200 \
--split-by DEPT \
--fields-terminated-by ',' \
--target-dir ${TARGET_DIR}/${INC} \
--username ${SYBASE_USERNAME} \
--password ${SYBASE_PASSWORD} \
The newly inserted rows have now been imported from the RDBMS into HDFS,
but when I run
select count(*), Unique_value from EMP group by Unique_value
in both the RDBMS and Hive, the counts show a huge data loss.
1) In the RDBMS
Count(*)   Unique_value
1000       201401201
5000       201401202
10000      201401203
2) In Hive
Count(*)   Unique_value
189        201401201
421        201401202
50         201401203
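To put a number on the gap, the two result sets above can be compared per Unique_value. This is only a minimal Python sketch, assuming the counts are copied by hand from the two queries above; the names rdbms_counts, hive_counts and missing_rows are made up for illustration and are not part of Sqoop or Hive.

```python
# Counts per Unique_value, copied by hand from the two query results above.
rdbms_counts = {201401201: 1000, 201401202: 5000, 201401203: 10000}
hive_counts = {201401201: 189, 201401202: 421, 201401203: 50}


def missing_rows(source, target):
    """Return, per key, how many rows are in source but not in target."""
    return {key: count - target.get(key, 0) for key, count in source.items()}


missing = missing_rows(rdbms_counts, hive_counts)

# Print one line per Unique_value so the loss is visible key by key.
for key in sorted(missing):
    print(key, "RDBMS:", rdbms_counts[key],
          "Hive:", hive_counts.get(key, 0),
          "missing:", missing[key])

total_source = sum(rdbms_counts.values())
total_missing = sum(missing.values())
print("total missing:", total_missing, "of", total_source)
```

With the figures above, this reports 15340 of 16000 rows missing, so the loss affects every Unique_value and not just one of them.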
If I run
select Unique_value from EMP;
the result is:
201401201
201401201
201401201
201401201
201401201
.
.
201401202
.
.
and so on...
Please help and suggest why this is happening.
Many thanks in advance,
Yogesh kumar