I'm seeing an odd problem where, very rarely, Sqoop (version
1.4.7.3.1.0.0-78) appears to bring down one more record than what is
counted in the source. My Sqoop command looks like this:
{
sqoop import \
-Dmapreduce.map.memory.mb=3144 -Dmapreduce.map.java.opts=-Xmx1048m \
-Dyarn.app.mapreduce.am.log.level=DEBUG \
-Dmapreduce.map.log.level=DEBUG \
-Dmapreduce.reduce.log.level=DEBUG \
-Dmapred.job.name="Ora import table $tablename" \
-Djava.security.egd=file:///dev/urandom \
-Doraoop.timestamp.string=false \
-Dmapreduce.map.max.attempts=10 \
-Dmapreduce.task.timeout=1500000 \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect $DBCNXN --username $DBUSER --password $DBPASSWORD \
--as-parquetfile \
--target-dir $importdir \
--query "$sqoop_query" \
--split-by $splitby \
--where "1=1" \
--num-mappers 12 \
--class-name "QueryResult_$tablename" \
--delete-target-dir
} || { echo -e "\nFailed to sqoop data from source DB"; exit 255; }
The tables in question are imported as full overwrites, not
incrementally. I cannot compare the actual rows between the source and
the Sqoop output after the fact, because the source gains more rows
every day.
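Since the source keeps growing, the only comparison I can see working is
one taken at import time. Below is a rough sketch of that idea, not
something currently in the job: it pulls the record count that Sqoop
itself reports from a captured log and diffs it against a source count.
The function names and the log path are hypothetical, and the sketch
assumes the log contains a line of the form "Retrieved N records.", which
should be checked against the actual job output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of the job above.

# Print the record count reported in a captured Sqoop log.
# Assumption: the log contains a line like "... Retrieved 12345 records."
sqoop_reported_count() {
  local log_file="$1"
  # Keep only the last match in case the log holds more than one run.
  grep -oE 'Retrieved [0-9]+ records' "$log_file" | tail -n 1 | awk '{print $2}'
}

# Print imported count minus source count.
# A positive result means the import holds more rows than the source count.
count_diff() {
  local source_count="$1"
  local imported_count="$2"
  echo $(( imported_count - source_count ))
}
```

To use it, the import output would be captured with something like
`sqoop import ... 2>&1 | tee "$logfile"`, and the source count would come
from a `SELECT COUNT(*)` over the same query run immediately before the
import (for example via `sqoop eval`). The two counts are still not taken
in the same transaction, so a small difference could simply be rows that
arrived in between.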
Does anyone with more Sqoop experience know what could be happening
here? Any further debugging tips for looking into this?
* Note that of the two tables this is happening to, table A uses a
non-numeric (varchar) split-by column, and table B has a composite PK
(though my split-by column there is numeric). Both of these issues would
complicate Sqoop operations under normal circumstances, which makes this
even harder to debug, since they could be causing other problems. I am
not totally sure what such side effects might be; see
<https://community.cloudera.com/t5/Support-Questions/Sqoop-import-composite-primary-key-and-textual-primary-key/td-p/145994>
and
<https://community.cloudera.com/t5/Support-Questions/Sqoop-imported-more-records-than-source/td-p/174724>.