Thanks, Ankit, for the honest feedback. Would you be willing to update
our wiki and improve the instructions, based on your experience, to
cover these gotchas?

We have a guide that we have been working on ourselves for getting Nutch
running and churning on Elastic MapReduce. That’s where I’d recommend


Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

-----Original Message-----
From: Ankit Goel <>
Reply-To: "" <>
Date: Wednesday, July 22, 2015 at 5:51 PM
To: "" <>
Subject: Nutch on the cloud

>After my runs on my laptop, I'm ready to port my work to the cloud, using
>Amazon. One thing I noticed when I started with Nutch was that a lot of
>things were left unsaid on the site/wiki, and they took me a lot of time to
>figure out. Pitfalls, if I may call them that. I don't really have code or
>scripts, but I need Nutch to run all the time on the cloud.
>So before I port to the cloud, are there any things I should beware of or
>look out for? For example, is AWS fine with Nutch? Are there any
>configurations I should remember? Any advice on implementation to ease my
>transition and run Nutch 24 hours a day? I will be running a seed file and
>crawling the net in general.
>Ankit Goel
