I’m running Nutch 2.2.1 on a Hadoop cluster, crawling 5000 links from the 
DMOZ Open Directory Project. The reduce job stops at exactly 33% every time 
and throws this exception. From the Nutch mailing list, it seems that my job 
is stumbling on a repUrl value that’s null.
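
To illustrate the failure mode I suspect, here is a minimal sketch (not Nutch's 
actual code). The trace shows the NullPointerException coming out of the Utf8 
constructor called from GeneratorReducer.setup, which is consistent with a 
string lookup returning null and being passed straight into a wrapper. The 
class Wrapped, the methods unguarded and guarded, and the key 
example.required.key are all hypothetical stand-ins, and a plain Map stands in 
for the job configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class NullConfigSketch {

    // Hypothetical stand-in for a string wrapper whose constructor
    // dereferences its argument immediately.
    static final class Wrapped {
        final byte[] bytes;

        Wrapped(String s) {
            // Throws NullPointerException when s is null.
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Passes the lookup result straight through: a missing key surfaces
    // as a bare NullPointerException inside the constructor.
    static Wrapped unguarded(Map<String, String> conf, String key) {
        return new Wrapped(conf.get(key));
    }

    // Checks the lookup first, so a missing key fails with a message
    // that names the property instead of a bare NullPointerException.
    static Wrapped guarded(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing required job property: " + key);
        }
        return new Wrapped(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // key deliberately not set
        try {
            guarded(conf, "example.required.key");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If something like this is what is happening, the thing to find is which 
property the reducer reads in setup and why it never reaches the reduce tasks.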
-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

On 6 January 2014 at 7:14:41 pm, Devin Suiter RDX (dsui...@rdx.com) wrote:

Based on the Exception type, it looks like something in your job is looking for 
a valid value, and not finding it.

You will probably need to share the job code for people to help with this - to 
my eyes, this doesn't appear to be a Hadoop configuration issue, or any kind of 
problem with how the system is working.

Are you using Avro inputs and outputs? If your reduce is trying to parse an 
Avro record, it may be that the field type is not correct, or maybe there is a 
reference to an outside schema object that is not available...

If you provide more information about the context of the error (use case, 
program goal, code block, something like that) then it is easier to help you.



Devin Suiter
Jr. Data Solutions Software Engineer

100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan 
<manikan...@thesocialpeople.net> wrote:
I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs 
fine, but the next job fails with this error:

java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes, namely nutch1, nutch2 and nutch3. The first one is in 
the masters file, and all three are listed in the slaves file. The /etc/hosts 
file lists all machines along with their IP addresses. Can someone help me?

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople