You were right, though the logs link in the ResourceManager would disappear and I couldn't get to the logs once the container died. Going through the NodeManager that was executing the actual task let me look at the logs, and it turns out the mysqlimport output is printed there:
2014-09-15 17:56:34,025 INFO [main] org.apache.sqoop.mapreduce.MySQLExportMapper: Checkpointing current export.
2014-09-15 17:56:34,025 INFO [main] org.apache.sqoop.mapreduce.MySQLExportMapper: Waiting for mysqlimport to complete
2014-09-15 17:56:34,032 INFO [Thread-34] org.apache.sqoop.mapreduce.MySQLExportMapper: mysqlimport: Error: 1205, Lock wait timeout exceeded; try restarting transaction, when using table: <table>

I set log4j via /etc/hadoop/conf/log4j.properties.
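For reference, two things that should make this less painful next time. I have not verified either on this exact CDH 5.1.2 / sqoop 1.4.4 setup, so treat the commands below as a sketch rather than a recipe: with YARN log aggregation enabled, the container logs can be pulled after the job dies, and the map-task log level can be raised per job instead of editing the cluster-wide log4j.properties.

# Pull the aggregated container logs after the attempt/container is gone
# (requires yarn.log-aggregation-enable=true; <application-id> is the
# application_... id matching the failed attempt).
yarn logs -applicationId <application-id> > sqoop-export-containers.log

# Raise the map-task log level for a single job; the -D generic option
# must come before the sqoop-specific arguments.
sqoop export -Dmapreduce.map.log.level=DEBUG \
  --direct \
  --connect <host> \
  --table <table> \
  --export-dir /user/hive/warehouse/<table> \
  --num-mappers 8 \
  --username <username> \
  --password <password> \
  --input-fields-terminated-by ',' \
  --verbose

If the DEBUG level takes effect, the "Starting mysqlimport with arguments:" lines from MySQLExportMapper should show up in the task logs directly.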

On Mon, Sep 15, 2014 at 11:32 AM, pratik khadloya <[email protected]> wrote:
> Btw, Christian, can you tell me how you fed the custom log4j config to
> sqoop? I am interested in debugging the MySQLManager class.
>
> Thanks,
> ~Pratik
>
> On Mon, Sep 15, 2014 at 10:39 AM, pratik khadloya <[email protected]>
> wrote:
>>
>> You're right, they are import-only arguments; I misread your original
>> question.
>> I am surprised that there are no logs in the JT. You should be able to
>> see the logs for attempt attempt_1410271365435_0034_m_000000_0 and also
>> see which machine ran that map job. You can click on the machine name;
>> then, at the bottom left, there is a "Local logs" link which you can
>> click to finally see the local mapper logs for that task tracker.
>>
>> The general URL to get directly to those logs is:
>> http://<task-tracker-machine-name>:50060/tasktracker.jsp
>>
>> I suspect the loading command may be failing due to a column mismatch
>> or a delimiter problem.
>>
>> ~Pratik
>>
>> On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> The jobtracker logs are all empty. The --split-by and --boundary-query
>>> options are sqoop import-only arguments AFAICT. The split size, as in
>>> the size of the file that is loaded into MySQL, is about 32MB.
>>>
>>> The sqoop export job I posted _does_ get data into MySQL; it just
>>> stops after a while (due to load, presumably), so running just one
>>> query against MySQL works just fine and does not reproduce the error.
>>>
>>> The key is that I need some way to get more information on the exact
>>> error mysqlimport hits.
>>>
>>> Kind regards,
>>>
>>> Christian
>>>
>>> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <[email protected]>
>>> wrote:
>>> > Is there any reason given for the termination in the jobtracker
>>> > logs? Also, I see that you have not specified any --split-by and/or
>>> > --boundary-query option. Does sqoop take time to determine the
>>> > splits? If yes, then specifying these settings might help.
>>> >
>>> > Also, check what the split sizes are; you may be running into data
>>> > skew depending on the splitting column used (generally the primary
>>> > key of the table).
>>> > The query is printed in the sqoop logs; try running the same
>>> > directly on mysql and see how mysql responds.
>>> >
>>> > ~Pratik
>>> >
>>> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk
>>> > <[email protected]> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm trying to run a sqoop export for a large dataset (at least 1B
>>> >> rows) with the following sqoop export call:
>>> >>
>>> >> sqoop export --direct \
>>> >>   --connect <host> \
>>> >>   --table <table> \
>>> >>   --export-dir /user/hive/warehouse/<table> \
>>> >>   --num-mappers 8 \
>>> >>   --username <username> \
>>> >>   --password <password> \
>>> >>   --input-fields-terminated-by ',' \
>>> >>   --verbose
>>> >>
>>> >> Behind the scenes, I've found that sqoop export does what you'd
>>> >> expect it to: it farms out the work to (num-mappers) different
>>> >> nodes with a NodeManager role, gets about 32MB worth of HDFS data
>>> >> into a temp file on each of the nodes, and sends it along to
>>> >> mysqlimport, which generates a LOAD DATA LOCAL INFILE for the
>>> >> tempfile into the MySQL table.
>>> >>
>>> >> The following error occurs depending on the level of parallelism
>>> >> used (via num-mappers); that is, 2 mappers don't trigger it but 10
>>> >> definitely will:
>>> >>
>>> >> 14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
>>> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
>>> >> Error: java.io.IOException: mysqlimport terminated with error code 1
>>> >>   at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
>>> >>   at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
>>> >>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>> >>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>> >>   at java.security.AccessController.doPrivileged(Native Method)
>>> >>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>> >>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>> >>
>>> >> I understand there is some limit to the level of parallelism that
>>> >> can be achieved in the job -- mysqld can get tied up processing too
>>> >> many things at once, etc. -- but I'd like to know how to turn on
>>> >> debugging for the org.apache.sqoop.mapreduce.MySQLExportMapper
>>> >> class so that I can actually see the mysqlimport error.
>>> >>
>>> >> Reading through the following code[0] (not sure if this is the
>>> >> relevant version, BTW), I see that a logger is set up that should
>>> >> be giving a lot of information[1] about the mysqlimport calls, but
>>> >> I don't seem to be getting any of this fun in my logs.
>>> >>
>>> >> [0] https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
>>> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
>>> >>
>>> >> Additional info:
>>> >>
>>> >> I have log4j.properties set up in the following basic way:
>>> >>
>>> >> log4j.rootLogger=${root.logger}
>>> >> root.logger=INFO,console
>>> >>
>>> >> log4j.logger.org.apache.hadoop.mapred=TRACE
>>> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
>>> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
>>> >>
>>> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
>>> >> log4j.appender.console.target=System.err
>>> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
>>> >>
>>> >> What I have found is that the `max_allowed_packet` setting in MySQL
>>> >> seems to affect this behaviour somewhat, but I'd rather get more
>>> >> information about the actual error than attempt to tweak a setting
>>> >> "blind".
>>> >>
>>> >> Relevant versioning:
>>> >>
>>> >> Cloudera Hadoop Distribution (5.1.2)
>>> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
>>> >> sqoop version: 1.4.4
>>> >>
>>> >> Kind regards,
>>> >>
>>> >> Christian Verkerk
>>> >>
>>> >> --
>>> >> Christian Verkerk
>>> >> Software Engineer, Tubular Labs
>>> >> [email protected]
>>> >
>>> >
>>>
>>>
>>> --
>>> Christian Verkerk
>>> Software Engineer, Tubular Labs
>>> [email protected]
>>
>>
>
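P.S. Error 1205 is MySQL's lock wait timeout (ER_LOCK_WAIT_TIMEOUT), so the server side is worth a look as well. A rough sketch of the relevant checks, with placeholder host/credentials and an arbitrary timeout value rather than anything tuned for this particular server:

# How long InnoDB waits on row locks before giving up (default 50s), and
# how large a packet the server accepts (relevant to the LOAD DATA batches).
mysql -h <host> -u <username> -p -e "SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';"
mysql -h <host> -u <username> -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"

# What the transactions/locks look like while the export mappers are running.
mysql -h <host> -u <username> -p -e "SHOW ENGINE INNODB STATUS\G"

# If lock waits are the bottleneck, either lower --num-mappers or raise the
# timeout (seconds; GLOBAL only affects sessions opened after the change).
mysql -h <host> -u <username> -p -e "SET GLOBAL innodb_lock_wait_timeout = 120;"

Dropping --num-mappers until the 1205s stop is the blunt fix; the variables above at least show how much headroom the server has before tweaking anything "blind".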
