Hello Hadoopers.  I thought I'd share a couple of sqoop bugs we found
recently.

1)  If, for some reason, sqoop fails to move a file/directory into its
-target-dir because the file is no longer available, it issues a WARN
rather than an error.  This is very significant in batch operations.
Effectively, you might expect to sqoop data into hdfs:///data/dir/mydir/
with a timestamp of today, yet even though sqoop returns a 0 exit code,
the job may have failed.  It is a rare case, but we've experienced it on
a fairly low-volume production cluster.
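One way to guard against this in batch scripts is to verify the data
actually landed instead of trusting the exit code alone.  A minimal
sketch, assuming a daily-partitioned target dir (the JDBC URL, table
name, and path layout below are placeholders, not from our setup):

    TARGET="hdfs:///data/dir/mydir/$(date +%Y%m%d)"
    sqoop import --connect "$JDBC_URL" --table mytable \
        --target-dir "$TARGET" || exit 1

    # Even with a 0 exit code, confirm the data actually landed,
    # since the failed move only produces a WARN.
    if ! hdfs dfs -test -d "$TARGET" || \
       ! hdfs dfs -ls "$TARGET" | grep -q 'part-'; then
        echo "sqoop exited 0 but $TARGET is missing or empty" >&2
        exit 1
    fi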

2) When you sqoop with -append and you are using a "where" clause, sqoop
creates a temp dir before moving the data into its final -target-dir.
Sqoop tries to ensure the temp dir is unique by naming it with the
current timestamp, to the millisecond, plus the -table name.  The
problem is that -table is not compatible with a "where" clause, so sqoop
uses NULL for the table name.

If you could use -table with -where, this would not be a problem.
Sadly, we find temp sqoop dirs like
hdfs:///user/$username/_sqoop${time}NULL instead.  This would also not
be a problem if the timestamp were finer-grained than a millisecond.  If
you run parallel sqoop jobs, this will probably bite you at some point.
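To make the collision concrete, here is a hypothetical illustration of
the naming scheme described above (paraphrased from the behavior we
observed, not sqoop's actual source; the shell variable names and the
example username are made up):

    MS=$(date +%s%3N)   # current timestamp to the millisecond (GNU date)
    TABLE=NULL          # no usable -table name, so the literal "NULL" lands in the path
    TMPDIR="hdfs:///user/$USER/_sqoop${MS}${TABLE}"
    echo "$TMPDIR"      # e.g. hdfs:///user/etl/_sqoop1381259471234NULL
    # Two -append jobs launched within the same millisecond compute the
    # same path here, so one of them trips over the other's temp dir.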

Hortonworks is creating detailed JIRAs to get these items corrected; I
just thought I'd share in the meantime.

Hope this helps someone.
Chris
