Hello Hadoopers. I thought I'd share a couple of sqoop bugs we found recently.
1) If, for some reason, Sqoop fails to move a file or directory to its --target-dir because the file is no longer available, it logs a WARN rather than an error. This matters a great deal in batch operations: you might expect Sqoop to have landed data in hdfs:///data/dir/mydir/ with today's timestamp, yet even though Sqoop returns a 0 exit code, the job may have failed. It is a rare case, but we've experienced it on a fairly low-volume production cluster.

2) When you run Sqoop with --append and a "where" clause, Sqoop creates a temp dir before moving the data into its final --target-dir. Sqoop tries to make the temp dir name unique by combining the current timestamp, to the millisecond, with the --table name. The problem is that --table is not compatible with a "where" clause, so Sqoop uses NULL for the table name. If you could use --table together with a "where" clause, this would not be a problem. Sadly, we find temp Sqoop dirs like this:

hdfs:///user/$username/_sqoop${time}NULL

It would also not be a problem if the timestamp were finer than milliseconds. If you run parallel Sqoop jobs, this will probably bite you at some point, since two jobs that start in the same millisecond end up with the same temp dir name.

Hortonworks is creating detailed JIRAs to get these items corrected; I just thought I'd share in the meantime. Hope this helps someone.

Chris
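
P.S. For anyone who wants a guard against (1) until it is fixed, here is a minimal sketch of a wrapper that does not trust Sqoop's exit code alone. It is not anything Sqoop ships; the function names (checked_import, hdfs_dir_exists) are made up for illustration, and it assumes the sqoop and hdfs commands are on the PATH. The only HDFS call it relies on is `hdfs dfs -test -d`, which exits 0 when the directory exists.

```python
import subprocess
import sys


def hdfs_dir_exists(path, run=subprocess.run):
    """Return True if `hdfs dfs -test -d <path>` exits with 0."""
    result = run(["hdfs", "dfs", "-test", "-d", path])
    return result.returncode == 0


def checked_import(sqoop_cmd, target_dir, run=subprocess.run):
    """Run a Sqoop command, then confirm the target dir is really there.

    sqoop_cmd  -- full argv list, e.g. ["sqoop", "import", ...],
                  including its own --target-dir argument
    target_dir -- the same directory, repeated here for the check
    run        -- injectable runner (hypothetical hook, handy for testing)

    Returns 0 only when Sqoop exited 0 AND the directory exists.
    """
    result = run(list(sqoop_cmd))
    if result.returncode != 0:
        # Sqoop itself reported failure; pass its exit code through.
        return result.returncode
    if not hdfs_dir_exists(target_dir, run=run):
        # The case from (1): exit code 0, but the data never arrived.
        print("sqoop exited 0 but %s is missing" % target_dir,
              file=sys.stderr)
        return 1
    return 0
```

A batch script would call checked_import(...) and pass the result to sys.exit(). Note that an existence check is only meaningful when each run writes to a fresh directory (for example one named after today's date). With --append the directory is already there, so you would need something stronger, such as comparing the file count before and after.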