Hi Jason,

Rather than using the JDBC interface for transferring data, direct mode
delegates the transfer to the native utilities provided by the database
vendor. In the case of MySQL, the mysqldump and mysqlimport utilities are
used to retrieve data from the database server and to move data back in.
Using native utilities greatly improves performance, as they are optimized
to provide the best possible transfer speed while putting less load on the
database server.

That said, this faster transfer comes with several limitations. In the case
of MySQL, every node hosting a TaskTracker service needs to have both the
mysqldump and mysqlimport utilities installed. Another limitation of direct
mode is that not all parameters are supported: since the native utilities
usually produce text output, binary formats such as SequenceFile or Avro
won't work. Parameters that customize escape characters, type mapping,
column and row delimiters, or the NULL substitution string might also not
be supported in all cases.
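For reference, a direct-mode export invocation looks roughly like the
sketch below (the connect string, credentials, table name, and HDFS path
are placeholders, not values from your job):

  sqoop export \
      --connect jdbc:mysql://mysql.example.com/testdb \
      --username sqoop_user \
      --password sqoop_pass \
      --table example_table \
      --export-dir /user/hive/warehouse/example_table \
      --direct \
      --input-fields-terminated-by '\001'

Here --input-fields-terminated-by '\001' matches Hive's default field
delimiter (Ctrl-A); if your table uses a different delimiter or custom
escaping, keep the caveats above in mind, since direct mode may not honor
all of those options.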
Can you share your entire Sqoop command and the contents of failed task
attempt attempt_201403180842_0202_m_000002_1?

Thanks,
Kate

On Thu, Mar 20, 2014 at 8:24 AM, Jason Rosenberg <[email protected]> wrote:
> Thoughts anyone?
>
> Thanks,
>
> Jason
>
>
> On Tue, Mar 18, 2014 at 2:23 PM, Jason Rosenberg <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm wondering if there is expected performance increases with using the
>> --direct flag for exporting from hive to mysql. If so, how much speedup?
>>
>> Also, I've been getting lock contention errors during export, and I'm
>> wondering if these are less likely using --direct mode? E.g. I'm getting
>> these sorts of exceptions on the sqoop console:
>>
>> 14/03/18 14:44:15 INFO mapred.JobClient: Task Id :
>> attempt_201403180842_0202_m_000002_1, Status : FAILED
>> java.io.IOException: Can't export data, please check failed map task logs
>>         at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
>>         at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>>         at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.io.IOException: java.sql.BatchUpdateException: Deadlock
>> found when trying to get lock; try restarting transaction
>>         at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWr
>>
>>
>> Thanks,
>>
>> Jason
>
>
