I’m running Nutch 2.2.1 on a Hadoop cluster, crawling 5000 links from the DMOZ Open Directory Project. The reduce task stops at exactly 33% every time and throws this exception. From the Nutch mailing list, it seems that my job is stumbling upon a repUrl value that’s null.

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople
On 6 January 2014 at 7:14:41 pm, Devin Suiter RDX (dsui...@rdx.com) wrote:

Based on the exception type, it looks like something in your job is looking for a valid value and not finding it. You will probably need to share the job code for people to help with this. To my eyes, this doesn't appear to be a Hadoop configuration issue, or any kind of problem with how the system is working.

Are you using Avro inputs and outputs? If your reduce is trying to parse an Avro record, it may be that the field type is not correct, or that there is a reference to an outside schema object that is not available.

If you provide more information about the context of the error (use case, program goal, code block, something like that), it is easier to help you.

Devin Suiter Jr.
Data Solutions Software Engineer

On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan <manikan...@thesocialpeople.net> wrote:

I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs fine, but in the next job this error comes up:

java.lang.NullPointerException
    at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
    at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes, namely nutch1, nutch2 and nutch3. The first one is in the masters file, and all three are listed in the slaves file. The /etc/hosts file lists all machines along with their IP addresses.
Can someone help me?
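The top frame of the stack trace is the constructor of Avro's Utf8 class, called from GeneratorReducer.setup(). That pattern suggests the String handed to that constructor was null. One plausible way this happens is that setup() wraps a configuration value that was never set, because Hadoop's Configuration.get() returns null for an unset property.

The sketch below illustrates that failure mode using only the JDK. It does not use Avro, Hadoop or Nutch, and several parts are assumptions for illustration:

- the Map stands in for the Hadoop configuration,
- the property name "example.batch.id" is hypothetical,
- the helpers get, encodeUtf8 and requireAndEncode are hypothetical.

It shows that UTF-8 encoding a null String throws NullPointerException, and how a guard turns that into an error naming the missing key.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullConfValueDemo {

    // Hypothetical stand-in for Hadoop's Configuration.get(String),
    // which returns null when the property has not been set.
    static String get(Map<String, String> conf, String key) {
        return conf.get(key);
    }

    // Hypothetical helper: any UTF-8 wrapper (Avro's Utf8(String) included)
    // has to encode its argument, so a null String fails right here.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical guard: fail fast with a message naming the missing key
    // instead of a bare NullPointerException inside the encoder.
    static byte[] requireAndEncode(Map<String, String> conf, String key) {
        String value = get(conf, key);
        if (value == null) {
            throw new IllegalStateException("Required property '" + key + "' is not set");
        }
        return encodeUtf8(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String key = "example.batch.id"; // hypothetical property name

        // Unguarded: the null from get() reaches the encoder.
        try {
            encodeUtf8(get(conf, key));
            System.out.println("unexpected: no exception");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        // Guarded: the error message says which property is missing.
        try {
            requireAndEncode(conf, key);
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }

        // Once the property is set, encoding succeeds.
        conf.put(key, "1389012345-12345");
        System.out.println("encoded length: " + requireAndEncode(conf, key).length);
    }
}
```

In a real job, the equivalent check would read the property with Configuration.get() and test for null before constructing the Utf8. The point of the guard is diagnostic: it tells you which setting the reducer expected rather than leaving only a line number inside Avro.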