Hi Jason,

These are only the top-level job logs; to see what is actually going wrong, we need the logs of the failed reducer task attempts themselves.

Markus
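Assuming YARN log aggregation is enabled on your cluster, something like the following should retrieve the reducer attempt logs (the application id is derived from the job id in your output; the local log path shown in the comment is illustrative and depends on your NodeManager configuration):

```shell
# Aggregated container logs for the whole application
# (job_1453403905213_0001 corresponds to application_1453403905213_0001):
/root/hadoop-2.4.0/bin/yarn logs -applicationId application_1453403905213_0001

# If log aggregation is disabled, check each NodeManager's local
# userlogs directory for the failed attempts instead, e.g.:
#   $HADOOP_HOME/logs/userlogs/application_1453403905213_0001/container_*/syslog
```

The stderr/syslog of one of the failed `_r_00000x` attempts should contain the full exception behind the bare "Error: INSTANCE" message.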
-----Original message-----
> From: Jason S <jason.stu...@gmail.com>
> Sent: Thursday 21st January 2016 20:35
> To: user@nutch.apache.org
> Subject: Indexing Nutch 1.11 indexing Fails
>
> Hi,
>
> I am having a problem indexing segments in Nutch 1.11 on Hadoop.
>
> The cluster seems to be configured correctly and every part of the crawl
> process is working flawlessly; however, this is my first attempt at Hadoop
> 2, so perhaps my memory settings aren't perfect. I'm also not sure where
> to look in the log files for more information.
>
> The same data can be indexed with Nutch in local mode, so I don't think it
> is a problem with the Solr configuration, and I have had Nutch 1.0.9 with
> Hadoop 1.2.1 on this same cluster and everything worked OK.
>
> Please let me know if I can send more information. I have spent several
> days working on this with no success or clue why it is happening.
>
> Thanks in advance,
>
> Jason
>
> ### Command ###
>
> /root/hadoop-2.4.0/bin/hadoop jar \
>     /root/src/apache-nutch-1.11/build/apache-nutch-1.11.job \
>     org.apache.nutch.indexer.IndexingJob crawl/crawldb -linkdb crawl/linkdb \
>     crawl/segments/20160121113335
>
> ### Error ###
>
> 16/01/21 14:20:47 INFO mapreduce.Job:  map 100% reduce 19%
> 16/01/21 14:20:48 INFO mapreduce.Job:  map 100% reduce 26%
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:49 INFO mapreduce.Job:  map 100% reduce 0%
> 16/01/21 14:20:54 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:55 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:56 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:00 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:01 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:02 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:07 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:11 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:15 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:16 INFO mapreduce.Job:  map 100% reduce 100%
> 16/01/21 14:21:16 INFO mapreduce.Job: Job job_1453403905213_0001 failed with state FAILED due to: Task failed task_1453403905213_0001_r_000004
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 16/01/21 14:21:16 INFO mapreduce.Job: Counters: 39
>         File System Counters
>                 FILE: Number of bytes read=0
>                 FILE: Number of bytes written=5578886
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=2277523
>                 HDFS: Number of bytes written=0
>                 HDFS: Number of read operations=80
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=0
>         Job Counters
>                 Failed reduce tasks=15
>                 Killed reduce tasks=2
>                 Launched map tasks=20
>                 Launched reduce tasks=17
>                 Data-local map tasks=19
>                 Rack-local map tasks=1
>                 Total time spent by all maps in occupied slots (ms)=334664
>                 Total time spent by all reduces in occupied slots (ms)=548199
>                 Total time spent by all map tasks (ms)=167332
>                 Total time spent by all reduce tasks (ms)=182733
>                 Total vcore-seconds taken by all map tasks=167332
>                 Total vcore-seconds taken by all reduce tasks=182733
>                 Total megabyte-seconds taken by all map tasks=257021952
>                 Total megabyte-seconds taken by all reduce tasks=561355776
>         Map-Reduce Framework
>                 Map input records=18083
>                 Map output records=18083
>                 Map output bytes=3140643
>                 Map output materialized bytes=3178436
>                 Input split bytes=2812
>                 Combine input records=0
>                 Spilled Records=18083
>                 Failed Shuffles=0
>                 Merged Map outputs=0
>                 GC time elapsed (ms)=1182
>                 CPU time spent (ms)=56070
>                 Physical memory (bytes) snapshot=6087245824
>                 Virtual memory (bytes) snapshot=34655649792
>                 Total committed heap usage (bytes)=5412749312
>         File Input Format Counters
>                 Bytes Read=2274711
> 16/01/21 14:21:16 ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>