Ping!
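
In the meantime, here's the --direct variant I have in mind, in case anyone
can sanity-check it. This is only a sketch (untested): it reuses the same
shell variables and positional args as the sqoop_table function quoted
below, and per Kathleen's note the delimiter and null-substitution options
may not be honored on the MySQL direct path, so I've dropped most of them.
I'm also assuming update mode isn't available via mysqlimport, so this
would be insert-only:

  sqoop_table_direct () {
    # Sketch only: same args as sqoop_table below ($1 = hive table dir,
    # $2 = mysql table, $4 = column list). Assumes mysqlimport is
    # installed on every TaskTracker node, and that insert-only export
    # is acceptable (I don't believe --update-key/--update-mode work
    # with --direct).
    sqoop export \
      --direct \
      --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
      --username $MYSQL_USER \
      --export-dir /user/hive/warehouse/$HIVE_DATABASE.db/$1 \
      --table $2 \
      --columns $4 \
      --input-fields-terminated-by "\t" \
      --num-mappers 8
  }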
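
Separately, the deadlocks would make sense to me if the JDBC path is
issuing concurrent INSERT ... ON DUPLICATE KEY UPDATE batches (which is
what I understand --update-mode allowinsert generates for MySQL), since
those can contend on the unique index. If --direct doesn't pan out, one
thing I may try is shrinking the transaction size on the existing export
so each mapper holds locks for a shorter window. Again, just a sketch; the
sqoop.export.* property names are as I understand them from the 1.4.x
docs, so they'd need verifying against our version:

  sqoop_table_small_txn () {
    # Hypothetical tuning of the current JDBC export: same args and
    # options as sqoop_table below (minus --batch/--outdir), but with
    # one statement per transaction and fewer mappers so locks are held
    # for shorter windows, at some cost in throughput.
    sqoop export \
      -D sqoop.export.records.per.statement=100 \
      -D sqoop.export.statements.per.transaction=1 \
      --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
      --username $MYSQL_USER \
      --export-dir /user/hive/warehouse/$HIVE_DATABASE.db/$1 \
      --table $2 \
      --update-key $3 \
      --update-mode allowinsert \
      --columns $4 \
      --input-fields-terminated-by "\t" \
      --input-null-string '\\N' \
      --input-null-non-string '\\N' \
      --num-mappers 4
  }
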
On Mon, Mar 24, 2014 at 11:50 AM, Jason Rosenberg <[email protected]> wrote:
> Hi Kathleen, et al,
>
> Any thoughts on my parameters above?  Would --direct mode be likely to
> work with them?
>
> Thanks,
>
> Jason
>
>
> On Thu, Mar 20, 2014 at 3:56 PM, Jason Rosenberg <[email protected]> wrote:
>
>> Hi Kathleen,
>>
>> The sqoop command looks like this (a bash function that takes a few args
>> naming the source/dest table names, the update key, and the dest columns):
>>
>> sqoop_table () {
>>   sqoop export \
>>     --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
>>     --username $MYSQL_USER \
>>     --outdir $LOCAL_TEMP \
>>     --export-dir /user/hive/warehouse/$HIVE_DATABASE.db/$1 \
>>     --table $2 \
>>     --update-key $3 \
>>     --update-mode allowinsert \
>>     --columns $4 \
>>     --input-fields-terminated-by "\t" \
>>     --input-null-string '\\N' \
>>     --input-null-non-string '\\N' \
>>     --batch \
>>     --num-mappers 8
>> }
>>
>> The lock contention exceptions seem to be retried, up to a point.  If
>> they happen too often, the hadoop job tracker eventually decides to kill
>> the job.  Even when the job succeeds, there are usually a few of these
>> exceptions.  I originally had many more mappers configured, but here I've
>> reduced it to 8, and this seems to help (although it's not a guarantee).
>> Reducing num-mappers also makes it less likely that the target mysql db
>> machine will get overloaded (e.g. it was getting maxed out on cpu usage
>> with 64 mappers, etc.).
>>
>> The number of records being sqooped is on the order of 1-5M at a time.
>>
>> It sounds like I should try --direct mode (but I'm not sure whether the
>> null substitution and delimiter settings we're using will work with it?).
>>
>> Jason
>>
>>
>> On Thu, Mar 20, 2014 at 12:46 PM, Kathleen Ting <[email protected]> wrote:
>>
>>> Hi Jason,
>>>
>>> Rather than using the JDBC interface for transferring data, the direct
>>> mode delegates the job of transferring data to the native utilities
>>> provided by the database vendor. In the case of MySQL, the mysqldump
>>> and mysqlimport utilities will be used for retrieving data from the
>>> database server or moving data back. Using native utilities will
>>> greatly improve performance, as they are optimized to provide the best
>>> possible transfer speed while putting less burden on the database
>>> server. That said, there are several limitations that come with this
>>> faster transfer. In the case of MySQL, each node hosting a TaskTracker
>>> service needs to have both the mysqldump and mysqlimport utilities
>>> installed. Another limitation of the direct mode is that not all
>>> parameters are supported. As the native utilities usually produce text
>>> output, binary formats like SequenceFile or Avro won't work. Also,
>>> parameters that customize the escape characters, type mapping, column
>>> and row delimiters, or the NULL substitution string might not be
>>> supported in all cases.
>>>
>>> Can you share your entire Sqoop command and the contents of failed
>>> task attempt attempt_201403180842_0202_m_000002_1?
>>>
>>> Thanks,
>>> Kate
>>>
>>> On Thu, Mar 20, 2014 at 8:24 AM, Jason Rosenberg <[email protected]> wrote:
>>> > Thoughts anyone?
>>> >
>>> > Thanks,
>>> >
>>> > Jason
>>> >
>>> >
>>> > On Tue, Mar 18, 2014 at 2:23 PM, Jason Rosenberg <[email protected]> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm wondering whether there are expected performance increases from
>>> >> using the --direct flag for exporting from hive to mysql.  If so, how
>>> >> much speedup?
>>> >>
>>> >> Also, I've been getting lock contention errors during export, and I'm
>>> >> wondering whether these are less likely in --direct mode.  E.g. I'm
>>> >> getting these sorts of exceptions on the sqoop console:
>>> >>
>>> >> 14/03/18 14:44:15 INFO mapred.JobClient: Task Id :
>>> >> attempt_201403180842_0202_m_000002_1, Status : FAILED
>>> >> java.io.IOException: Can't export data, please check failed map task logs
>>> >>     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
>>> >>     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
>>> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>>> >>     at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
>>> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>> >>     at java.security.AccessController.doPrivileged(Native Method)
>>> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> >>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> >> Caused by: java.io.IOException: java.sql.BatchUpdateException: Deadlock
>>> >> found when trying to get lock; try restarting transaction
>>> >>     at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWr
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Jason
