Thanks, Ankit, for the honest feedback. Would you be willing to update our wiki and improve the instructions based on your experience with the gotchas you ran into?
We have been working on a guide ourselves for getting Nutch up and churning on Elastic MapReduce. That's where I'd recommend starting.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Ankit Goel <ankitgoel2...@gmail.com>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Wednesday, July 22, 2015 at 5:51 PM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: Nutch on the cloud

>Hi,
>After my runs on my laptop, I'm ready to port my work to the cloud. I'm
>planning to use Amazon. One thing I noticed when I started with Nutch is
>that a lot of things were left unsaid on the site/wiki, and they took me a
>lot of time to figure out. Pitfalls, if I may call them that. I don't really
>have code or scripts, but I need Nutch to run all the time on the cloud.
>
>So before I port to the cloud, are there any things I should beware of or
>look out for? For example, is AWS fine with Nutch? Are there any
>configurations I should remember? Any advice on implementation to ease my
>transition and run Nutch 24 hours a day? I will be starting from a seed
>file and crawling the net in general.
>Thanks
>
>--
>Regards,
>Ankit Goel
>http://about.me/ankitgoel