Hi Sebastian,

I am referring to the INFO messages that are printed to the console when Nutch
1.14 runs in distributed mode. For example:

Injecting seed URLs
/mnt/nutch/runtime/deploy/bin/nutch inject /user/hadoop/crawlDIR/crawldb
17/07/29 06:51:18 INFO crawl.Injector: Injector: starting at 2017-07-29
17/07/29 06:51:18 INFO crawl.Injector: Injector: crawlDb:
17/07/29 06:51:18 INFO crawl.Injector: Injector: urlDir: seed.txt
17/07/29 06:51:18 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
17/07/29 06:51:19 INFO client.RMProxy: Connecting to ResourceManager at
17/07/29 06:51:20 INFO input.FileInputFormat: Total input paths to process : 0
17/07/29 06:51:20 INFO input.FileInputFormat: Total input paths to process : 1
17/07/29 06:51:20 INFO mapreduce.Job: Running job: job_1500749038440_0003
17/07/29 06:51:28 INFO mapreduce.Job: Job job_1500749038440_0003 running in uber mode : false
17/07/29 06:51:28 INFO mapreduce.Job:  map 0% reduce 0%
17/07/29 06:51:33 INFO mapreduce.Job:  map 100% reduce 0%
17/07/29 06:51:38 INFO mapreduce.Job:  map 100% reduce 4%
17/07/29 06:51:40 INFO mapreduce.Job:  map 100% reduce 6%
17/07/29 06:51:41 INFO mapreduce.Job:  map 100% reduce 49%
17/07/29 06:51:42 INFO mapreduce.Job:  map 100% reduce 66%
17/07/29 06:51:43 INFO mapreduce.Job:  map 100% reduce 87%
17/07/29 06:51:44 INFO mapreduce.Job:  map 100% reduce 100%

I am running Nutch on an EMR cluster. I checked the log directories, and the
messages I see in the console do not appear anywhere else.

One more thing I noticed is the output of the following command:

*ps -ef | grep nutch*

hadoop    21616  18344  2 06:59 pts/1    00:00:09
/usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/java -Xmx1000m -server
-XX:OnOutOfMemoryError=kill -9 %p *-Dhadoop.log.dir=/usr/lib/hadoop/logs*
*-Dhadoop.log.file=hadoop.log* -Dhadoop.home.dir=/usr/lib/hadoop
-Dhadoop.id.str= *-Dhadoop.root.logger=INFO,console*
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
-Dhadoop.security.logger=INFO,NullAppender -Dsun.net.inetaddr.ttl=30
org.apache.nutch.fetcher.Fetcher -D mapreduce.map.java.opts=-Xmx2304m -D
mapreduce.map.memory.mb=2880 -D mapreduce.reduce.java.opts=-Xmx4608m -D
mapreduce.reduce.memory.mb=5760 -D mapreduce.job.reduces=12 -D
mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D
mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180
/user/hadoop/crawlDIR/segments/20170729065841 -noParsing -threads 100

The root logger passed to the running process is console
(-Dhadoop.root.logger=INFO,console). How do I change it so that the output
goes to the log file rotated by log4j?

I tried modifying the conf/log4j.properties file to use the DRFA appender
instead of the cmdstdout logger, but that did not help either.
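
One thing that may be worth trying, as an untested sketch: the stock Hadoop 2.x
launcher scripts build -Dhadoop.root.logger from the HADOOP_ROOT_LOGGER
environment variable and fall back to INFO,console when it is unset. Assuming
the EMR scripts behave the same way, and assuming the log4j.properties that the
client JVM loads first defines a DRFA appender (the one shipped with Hadoop
does), exporting the variable before calling bin/nutch might send the
client-side messages to the file named by hadoop.log.dir and hadoop.log.file.
This would only affect the job client output shown above; the task logs would
still live in YARN.

```shell
# Sketch only: assumes the Hadoop launcher scripts honour HADOOP_ROOT_LOGGER
# (stock Hadoop 2.x falls back to INFO,console when it is unset) and that a
# DRFA appender is defined in the log4j.properties the client JVM loads.
export HADOOP_ROOT_LOGGER="INFO,DRFA"

# Print the value so it can be compared with -Dhadoop.root.logger in ps output.
echo "hadoop.root.logger=${HADOOP_ROOT_LOGGER}"

# Then re-run the same bin/nutch command from this shell and check
# ps -ef | grep nutch again to see whether the logger setting changed.
```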

Any help would be appreciated.


On Mon, Jul 24, 2017 at 12:52 AM, Sebastian Nagel <
wastl.na...@googlemail.com> wrote:

> Hi Srini,
> in distributed mode the bulk of Nutch's log output is kept in the Hadoop
> task logs.
> Whether, how long, and where these logs are kept depends on the
> configuration of your Hadoop cluster. You can easily find tutorials and
> examples of how to configure this if you search for "hadoop task logs".
> Be careful: the Nutch logs are usually huge. The easiest way to get them
> for a job is to run the following command on the master node:
>   yarn logs -applicationId <app_id>
> Best,
> Sebastian
> On 07/21/2017 10:09 PM, Srinivasan Ramaswamy wrote:
> > Hi
> >
> > I am running Nutch in distributed mode. I would like to see all Nutch logs
> > written to files. I only see the console output. Can I see the same
> > information logged to some log files?
> >
> > When I run Nutch in local mode I do see the logs in the runtime/local/logs
> > directory. But when I run Nutch in distributed mode, I don't see them
> > anywhere except the console.
> >
> > Can anyone help me with the settings that I need to change?
> >
> > Thanks
> > Srini
> >
