Hi Kathleen, et al.,

Any thoughts on my parameters above? Is this likely to work in --direct mode?
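In case it's useful, here is the direct-mode variant I have in mind: just my existing export function (quoted below) with --direct added. This is only a sketch; I'm not sure every flag survives the switch to the native mysqlimport path (the delimiter and null settings in particular are what I'm asking about), and I've left out the update-key flags since I doubt upserts are supported there:

sqoop_table_direct () {
  # Sketch only: same as my JDBC-based version, plus --direct.
  # Per Kathleen's note, this needs mysqlimport installed on every
  # node running a TaskTracker.
  sqoop export \
    --direct \
    --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
    --username $MYSQL_USER \
    --export-dir /user/hive/warehouse/$HIVE_DATABASE.db/$1 \
    --table $2 \
    --columns $4 \
    --input-fields-terminated-by "\t" \
    --input-null-string '\\N' \
    --input-null-non-string '\\N' \
    --num-mappers 8
}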
Thanks,

Jason

On Thu, Mar 20, 2014 at 3:56 PM, Jason Rosenberg <[email protected]> wrote:

> Hi Kathleen,
>
> The sqoop command looks like this (a bash function that takes a few args
> naming the source/dest table names, the update key, and the dest columns):
>
> sqoop_table () {
>   sqoop export \
>     --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
>     --username $MYSQL_USER \
>     --outdir $LOCAL_TEMP \
>     --export-dir /user/hive/warehouse/$HIVE_DATABASE.db/$1 \
>     --table $2 \
>     --update-key $3 \
>     --update-mode allowinsert \
>     --columns $4 \
>     --input-fields-terminated-by "\t" \
>     --input-null-string '\\N' \
>     --input-null-non-string '\\N' \
>     --batch \
>     --num-mappers 8
> }
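>
> For example, a typical invocation looks like this (the table, key, and
> column names here are just placeholders):
>
>   sqoop_table visits visits_by_day visit_id "visit_id,user_id,visit_time"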
>
> The lock contention exceptions seem to be retried, up to a point. If they
> happen too often, the hadoop job tracker eventually decides to kill the
> job. Even when the job succeeds, there are usually a few of these
> exceptions. I originally had many more mappers configured, but here I've
> reduced it to 8, and this seems to help (although it's not a guarantee).
> Reducing num-mappers also makes it less likely that the target mysql db
> machine will get overloaded (e.g. it was getting maxed out on cpu usage
> with 64 mappers).
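>
> One thing I haven't tried yet is shrinking each transaction so that locks
> are held for less time. If I'm reading the Sqoop docs right, these two
> properties control the statement/transaction batching (the numbers below
> are untested guesses, and the -D args have to come right after "export"):
>
>   sqoop export \
>     -Dsqoop.export.records.per.statement=100 \
>     -Dsqoop.export.statements.per.transaction=1 \
>     --connect jdbc:mysql://$MYSQL_HOST/$MYSQL_DATABASE \
>     ...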
>
> The number of records being sqooped is on the order of 1-5M at a time.
>
> Sounds like I should try --direct mode (though I'm not sure whether the
> null substitutions and delimiters we're using will work with it?).
>
> Jason
>
> On Thu, Mar 20, 2014 at 12:46 PM, Kathleen Ting <[email protected]> wrote:
>
>> Hi Jason,
>>
>> Rather than using the JDBC interface for transferring data, direct
>> mode delegates the job of transferring data to the native utilities
>> provided by the database vendor. In the case of MySQL, mysqldump
>> and mysqlimport are used for retrieving data from the database
>> server and for moving data back. Using the native utilities greatly
>> improves performance, as they are optimized to provide the best
>> possible transfer speed while putting less of a burden on the database
>> server. That said, there are several limitations that come with this
>> faster transfer. In the case of MySQL, each node hosting a TaskTracker
>> service needs to have both the mysqldump and mysqlimport utilities
>> installed. Another limitation of direct mode is that not all
>> parameters are supported. As the native utilities usually produce text
>> output, binary formats like SequenceFile or Avro won't work. Also,
>> parameters that customize the escape characters, type mapping, column
>> and row delimiters, or the NULL substitution string might not be
>> supported in all cases.
>>
>> Can you share your entire Sqoop command and the contents of failed
>> task attempt attempt_201403180842_0202_m_000002_1?
>>
>> Thanks,
>> Kate
>>
>> On Thu, Mar 20, 2014 at 8:24 AM, Jason Rosenberg <[email protected]> wrote:
>>
>>> Thoughts anyone?
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>> On Tue, Mar 18, 2014 at 2:23 PM, Jason Rosenberg <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm wondering if there are expected performance gains from using the
>>>> --direct flag for exporting from hive to mysql. If so, how much speedup?
>>>>
>>>> Also, I've been getting lock contention errors during export, and I'm
>>>> wondering if these are less likely in --direct mode? E.g., I'm getting
>>>> these sorts of exceptions on the sqoop console:
>>>>
>>>> 14/03/18 14:44:15 INFO mapred.JobClient: Task Id : attempt_201403180842_0202_m_000002_1, Status : FAILED
>>>> java.io.IOException: Can't export data, please check failed map task logs
>>>>     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
>>>>     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>>>>     at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>>> Caused by: java.io.IOException: java.sql.BatchUpdateException: Deadlock
>>>> found when trying to get lock; try restarting transaction
>>>>     at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWr
>>>>
>>>> Thanks,
>>>>
>>>> Jason