RE: Independent Map Reduce to parse Nutch content (Cont.)

2014-01-06 Thread Markus Jelsma
Hi, check the logs first. -----Original message----- From: Bin Wang Sent: Saturday 4th January 2014 21:47 To: dev@nutch.apache.org Subject: Re: Independent Map Reduce to parse Nutch content (Cont.) Hi Tejas, I started an AWS instance and ran Hadoop in single-node mode. When I do.. hadoop

Re: Independent Map Reduce to parse Nutch content (Cont.)

2014-01-04 Thread Tejas Patil
>> It will finish all the mappers without problem but still.. errored out after all the mappers >> Exception in thread "main" java.io.IOException: Job failed! As I mentioned in the earlier mail, did you check the logs to find the root cause of the exception? >> I can see Nutch constan

Re: Independent Map Reduce to parse Nutch content (Cont.)

2014-01-04 Thread Bin Wang
Hi Tejas, I started an AWS instance and ran Hadoop in single-node mode. When I do.. hadoop -jar example.jar hdfsinput/ hdfsoutput/ Everything works perfectly, as I expected: a bunch of stuff got printed to the screen and both the mappers and the reducers finished without problems. In the end, the expec

Re: Independent Map Reduce to parse Nutch content (Cont.)

2014-01-03 Thread Tejas Patil
Hi Bin Wang, I would suggest that you NOT use Eclipse and instead run your code from the command line. Use logger statements and check the logs for the full stack traces of the failure. In my personal experience, logs are a better way to debug Hadoop code than the Eclipse debugger. Thanks, Tejas On Fri, Jan 3, 2
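
To illustrate the advice about logger statements, here is a minimal plain-Java sketch. It is not taken from the thread: the class name MapLoggingSketch and the method processRecord are hypothetical stand-ins for per-record map logic, and java.util.logging is used only to keep the example free of dependencies (a real job would use whatever logger the project already has). The idea is to log the failing record key together with the full stack trace before rethrowing, so the root cause is visible in the task logs rather than only as "Job failed!" in the driver.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MapLoggingSketch {
    private static final Logger LOG = Logger.getLogger(MapLoggingSketch.class.getName());

    /** Renders a throwable's full stack trace as a string. */
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }

    /** Hypothetical stand-in for the per-record work done inside a map call. */
    static int processRecord(String key, byte[] value) {
        try {
            return value.length; // throws NullPointerException when value is null
        } catch (RuntimeException e) {
            // Log which record failed, with the full stack trace, then rethrow.
            LOG.log(Level.SEVERE, "Failed on record " + key, e);
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(processRecord("http://example.com/", new byte[] {1, 2, 3}));
        try {
            processRecord("http://example.com/broken", null);
        } catch (RuntimeException e) {
            System.err.println(stackTraceOf(e));
        }
    }
}
```

Logging the key next to the exception is a deliberate choice here: when a job fails only after all mappers have run, knowing which record triggered the error usually narrows the search considerably.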

Independent Map Reduce to parse Nutch content (Cont.)

2014-01-03 Thread Bin Wang
Hi, I tried to modify the code here to parse the Nutch content data... http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java?view=markup At the end of this email is a prototype that I have written to run a MapReduce job to calculate the HTML content length of ea
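
The prototype itself is cut off in this archive view, so the following is not the author's code. It is a plain-Java sketch of the map and reduce steps for a content-length calculation, under loud assumptions: the class ContentLengthSketch and its map and reduce methods are hypothetical, the grouping loop in main stands in for the shuffle that Hadoop would perform, and neither the Hadoop nor the Nutch API is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContentLengthSketch {
    /** Map step (hypothetical): emit the URL with the byte length of its content. */
    static Map.Entry<String, Integer> map(String url, byte[] content) {
        return new SimpleEntry<>(url, content.length);
    }

    /** Reduce step (hypothetical): sum all lengths emitted for one URL. */
    static int reduce(Iterable<Integer> lengths) {
        int total = 0;
        for (int length : lengths) {
            total += length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample input; a real job would read records from a segment.
        String[] urls = {"http://example.com/a", "http://example.com/b"};
        String[] pages = {"<html>a</html>", "<html><body>b</body></html>"};

        // Group map output by key, standing in for the shuffle phase.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < urls.length; i++) {
            Map.Entry<String, Integer> emitted =
                map(urls[i], pages[i].getBytes(StandardCharsets.UTF_8));
            grouped.computeIfAbsent(emitted.getKey(), k -> new ArrayList<>())
                   .add(emitted.getValue());
        }
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            System.out.println(group.getKey() + "\t" + reduce(group.getValue()));
        }
    }
}
```

Keeping the map and reduce logic in small static methods like this makes it possible to check them from the command line before wiring them into an actual job.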