Yes, that would be fantastic. How about a wiki page on getting up and running with the most recent Nutch, and on overcoming problems with it?
The Nutch wiki is here: http://wiki.apache.org/nutch/

Please sign up for an account and tell me your username. Then I'll grant you permissions to edit the wiki.

Thank you, Ankit!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Ankit Goel <ankitgoel2...@gmail.com>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Thursday, July 23, 2015 at 7:22 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: Re: Nutch on the cloud

>Hey,
>@Chris, I would love to help with the wiki (honored, in fact), but my
>input is not about the getting-started process. It is more about the
>frequent errors that come after it. For example, the redirect plugin
>doesn't work the way you would expect (not even the latest version).
>Or sometimes parsechecker gives results that a normal Nutch run won't,
>even though it uses the same regex filter; or where to check for that.
>Or which Solr version you need to start with, because 5.x has a
>different file structure. Things like that, on which you spend a long
>time.
>
>If there is a wiki for such a page, I will gladly step up to the plate.
>It isn't exactly an FAQ either. I was thinking I could blog about it,
>but I think your idea of a wiki would be better, so that later authors
>can update it as the problems are fixed. So should I create one on the
>Nutch site? Also, many of these problems come up multiple times on the
>mailing list, and a Google search just doesn't cut it.
>So maybe a repository of frequent problems? That sort of thing?
>Thanks for the heads-up on the other guide; it gave me a starting point.
>
>
>On Thu, Jul 23, 2015 at 6:24 AM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Thanks, Ankit, for the honest feedback. Would you be willing to update
>> our wiki and improve the instructions based on your experiences with
>> our gotchas?
>>
>> We have a guide we have been working on ourselves for getting Nutch
>> running and churning on Elastic MapReduce. That's where I'd recommend
>> starting.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: Ankit Goel <ankitgoel2...@gmail.com>
>> Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
>> Date: Wednesday, July 22, 2015 at 5:51 PM
>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>> Subject: Nutch on the cloud
>>
>> >Hi,
>> >After my runs on my laptop, I'm ready to port my work to the cloud.
>> >I'm planning to use Amazon. One thing I noticed when I started with
>> >Nutch is that a lot of things were left unsaid on the site/wiki and
>> >took me a long time to figure out. Pitfalls, if I may call them. I
>> >don't really have code or scripts, but I need Nutch to run all the
>> >time on the cloud.
>> >
>> >So before I port to the cloud, are there any things I should beware
>> >of or look out for? For example, is AWS fine with Nutch?
>> >Are there any configurations I should remember? Any advice on
>> >implementation to ease my transition and run Nutch 24 hours a day?
>> >I will be running from a seed file and crawling the web in general.
>> >Thanks
>> >
>> >--
>> >Regards,
>> >Ankit Goel
>> >http://about.me/ankitgoel
>>
>
>
>--
>Regards,
>Ankit Goel
>http://about.me/ankitgoel