Nutch 1.11 indexing fails

Jason S Thu, 21 Jan 2016 11:36:15 -0800

Hi,

I am having a problem indexing segments in Nutch 1.11 on Hadoop.


The cluster seems to be configured correctly, and every other part of the
crawl process works flawlessly. However, this is my first attempt at Hadoop
2, so perhaps my memory settings aren't right. I'm also not sure where to
look in the log files for more information.
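
On the question of where to look: the client output only shows a one-line summary per failed attempt, while the full exception is normally in the container logs of the failed reduce tasks. A minimal sketch follows, assuming YARN log aggregation is enabled on this cluster (yarn.log-aggregation-enable set to true, which is an assumption, not something confirmed here). The application ID is the job ID from the client output with the job_ prefix replaced by application_. The output file name is only an illustration.

```shell
#!/bin/sh
# Job ID as printed by the Hadoop client for the failed indexing job.
job_id="job_1453403905213_0001"

# YARN application IDs share the numeric part of the MapReduce job ID.
app_id="application_${job_id#job_}"
echo "Application ID: ${app_id}"

# Fetch the aggregated container logs if the yarn CLI is available.
# Assumes log aggregation is enabled; otherwise this returns nothing useful.
if command -v yarn >/dev/null 2>&1; then
    yarn logs -applicationId "${app_id}" > "${app_id}.log" \
        || echo "yarn logs failed; log aggregation may be disabled" >&2
else
    echo "yarn not found on PATH; run this on a cluster node" >&2
fi
```

If aggregation is disabled, the same per-container stdout, stderr and syslog files should instead be on the worker nodes, under the directories configured in yarn.nodemanager.log-dirs.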

The same data can be indexed with Nutch in local mode, so I don't think the
problem is with the Solr configuration. I previously ran Nutch 1.0.9 with
Hadoop 1.2.1 on this same cluster, and everything worked fine.

Please let me know if I can send more information. I have spent several
days working on this with no success and no clue as to why it is happening.

Thanks in advance,

Jason

### Command ###

/root/hadoop-2.4.0/bin/hadoop jar \
  /root/src/apache-nutch-1.11/build/apache-nutch-1.11.job \
  org.apache.nutch.indexer.IndexingJob crawl/crawldb -linkdb crawl/linkdb \
  crawl/segments/20160121113335

### Error ###

16/01/21 14:20:47 INFO mapreduce.Job:  map 100% reduce 19%
16/01/21 14:20:48 INFO mapreduce.Job:  map 100% reduce 26%
16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_0, Status : FAILED
Error: INSTANCE
16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_0, Status : FAILED
Error: INSTANCE
16/01/21 14:20:48 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_0, Status : FAILED
Error: INSTANCE
16/01/21 14:20:49 INFO mapreduce.Job:  map 100% reduce 0%
16/01/21 14:20:54 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_0, Status : FAILED
Error: INSTANCE
16/01/21 14:20:55 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_1, Status : FAILED
Error: INSTANCE
16/01/21 14:20:56 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_1, Status : FAILED
Error: INSTANCE
16/01/21 14:21:00 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_1, Status : FAILED
Error: INSTANCE
16/01/21 14:21:01 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_1, Status : FAILED
Error: INSTANCE
16/01/21 14:21:02 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000002_2, Status : FAILED
Error: INSTANCE
16/01/21 14:21:07 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_0, Status : FAILED
Error: INSTANCE
16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000004_2, Status : FAILED
Error: INSTANCE
16/01/21 14:21:08 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000001_2, Status : FAILED
Error: INSTANCE
16/01/21 14:21:11 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000000_2, Status : FAILED
Error: INSTANCE
16/01/21 14:21:15 INFO mapreduce.Job: Task Id : attempt_1453403905213_0001_r_000003_1, Status : FAILED
Error: INSTANCE
16/01/21 14:21:16 INFO mapreduce.Job:  map 100% reduce 100%
16/01/21 14:21:16 INFO mapreduce.Job: Job job_1453403905213_0001 failed with state FAILED due to: Task failed task_1453403905213_0001_r_000004
Job failed as tasks failed. failedMaps:0 failedReduces:1

16/01/21 14:21:16 INFO mapreduce.Job: Counters: 39
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=5578886
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2277523
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=80
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Failed reduce tasks=15
        Killed reduce tasks=2
        Launched map tasks=20
        Launched reduce tasks=17
        Data-local map tasks=19
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=334664
        Total time spent by all reduces in occupied slots (ms)=548199
        Total time spent by all map tasks (ms)=167332
        Total time spent by all reduce tasks (ms)=182733
        Total vcore-seconds taken by all map tasks=167332
        Total vcore-seconds taken by all reduce tasks=182733
        Total megabyte-seconds taken by all map tasks=257021952
        Total megabyte-seconds taken by all reduce tasks=561355776
    Map-Reduce Framework
        Map input records=18083
        Map output records=18083
        Map output bytes=3140643
        Map output materialized bytes=3178436
        Input split bytes=2812
        Combine input records=0
        Spilled Records=18083
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1182
        CPU time spent (ms)=56070
        Physical memory (bytes) snapshot=6087245824
        Virtual memory (bytes) snapshot=34655649792
        Total committed heap usage (bytes)=5412749312
    File Input Format Counters
        Bytes Read=2274711
16/01/21 14:21:16 ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
