Hi Jason,

These are only the top-level job logs; to see what is actually going wrong, we need the logs of the failed reducer task attempts themselves.

Markus
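Assuming YARN log aggregation is enabled on your cluster, something like the following should retrieve the reducer attempt logs (the application id is derived from the job id in your output; the local log path shown in the comment is illustrative and depends on your NodeManager configuration):

```shell
# Aggregated container logs for the whole application
# (job_1453403905213_0001 corresponds to application_1453403905213_0001):
/root/hadoop-2.4.0/bin/yarn logs -applicationId application_1453403905213_0001

# If log aggregation is disabled, check each NodeManager's local
# userlogs directory for the failed attempts instead, e.g.:
#   $HADOOP_HOME/logs/userlogs/application_1453403905213_0001/container_*/syslog
```

The stderr/syslog of one of the failed `_r_00000x` attempts should contain the full exception behind the bare "Error: INSTANCE" message.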
-----Original message-----
> From: Jason S <jason.stu...@gmail.com>
> Sent: Thursday 21st January 2016 20:35
> To: user@nutch.apache.org
> Subject: Indexing Nutch 1.11 indexing Fails
>
> Hi,
>
> I am having a problem indexing segments in Nutch 1.11 on Hadoop.
>
> The cluster seems to be configured correctly and every part of the crawl
> process is working flawlessly; however, this is my first attempt at Hadoop
> 2, so perhaps my memory settings aren't perfect. I'm also not sure where
> to look in the log files for more information.
>
> The same data can be indexed with Nutch in local mode, so I don't think it
> is a problem with the Solr configuration, and I have had Nutch 1.0.9 with
> Hadoop 1.2.1 on this same cluster and everything worked OK.
>
> Please let me know if I can send more information. I have spent several
> days working on this with no success or clue why it is happening.
>
> Thanks in advance,
>
> Jason
>
> ### Command ###
>
> /root/hadoop-2.4.0/bin/hadoop jar \
>     /root/src/apache-nutch-1.11/build/apache-nutch-1.11.job \
>     org.apache.nutch.indexer.IndexingJob crawl/crawldb -linkdb crawl/linkdb \
>     crawl/segments/20160121113335
>
> ### Error ###
>
> 16/01/21 14:20:47 INFO mapreduce.Job:  map 100% reduce 19%
> 16/01/21 14:20:48 INFO mapreduce.Job:  map 100% reduce 26%
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:49 INFO mapreduce.Job:  map 100% reduce 0%
> 16/01/21 14:20:54 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:55 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:20:56 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:00 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:01 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:02 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:07 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_0, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:11 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_2, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:15 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_1, Status : FAILED
> Error: INSTANCE
> 16/01/21 14:21:16 INFO mapreduce.Job:  map 100% reduce 100%
> 16/01/21 14:21:16 INFO mapreduce.Job: Job job_1453403905213_0001 failed with state FAILED due to: Task failed task_1453403905213_0001_r_000004
> Job failed as tasks failed. failedMaps:0 failedReduces:1
>
> 16/01/21 14:21:16 INFO mapreduce.Job: Counters: 39
>         File System Counters
>                 FILE: Number of bytes read=0
>                 FILE: Number of bytes written=5578886
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=2277523
>                 HDFS: Number of bytes written=0
>                 HDFS: Number of read operations=80
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=0
>         Job Counters
>                 Failed reduce tasks=15
>                 Killed reduce tasks=2
>                 Launched map tasks=20
>                 Launched reduce tasks=17
>                 Data-local map tasks=19
>                 Rack-local map tasks=1
>                 Total time spent by all maps in occupied slots (ms)=334664
>                 Total time spent by all reduces in occupied slots (ms)=548199
>                 Total time spent by all map tasks (ms)=167332
>                 Total time spent by all reduce tasks (ms)=182733
>                 Total vcore-seconds taken by all map tasks=167332
>                 Total vcore-seconds taken by all reduce tasks=182733
>                 Total megabyte-seconds taken by all map tasks=257021952
>                 Total megabyte-seconds taken by all reduce tasks=561355776
>         Map-Reduce Framework
>                 Map input records=18083
>                 Map output records=18083
>                 Map output bytes=3140643
>                 Map output materialized bytes=3178436
>                 Input split bytes=2812
>                 Combine input records=0
>                 Spilled Records=18083
>                 Failed Shuffles=0
>                 Merged Map outputs=0
>                 GC time elapsed (ms)=1182
>                 CPU time spent (ms)=56070
>                 Physical memory (bytes) snapshot=6087245824
>                 Virtual memory (bytes) snapshot=34655649792
>                 Total committed heap usage (bytes)=5412749312
>         File Input Format Counters
>                 Bytes Read=2274711
> 16/01/21 14:21:16 ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>