Hi,
A few months ago I started a crawl on a single machine (one process).
Now I'm trying to continue this crawl on the Hadoop file system on the same
machine, following the tutorial "How to Setup Nutch (V1.1) and Hadoop".
When I run a crawl (TopN=25000, depth=7) with the new configuration, mergesegs 
fails.
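For reference, the crawl is started with the standard Nutch crawl command, roughly like this (the urls/crawl paths below are placeholders, not my exact layout):

  bin/nutch crawl urls -dir crawl -topN 25000 -depth 7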

The failed job's details show:
-----------------
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
------------------

Any ideas?

Regards
Patricio


