Yes, that would be fantastic. How about a wiki page on getting up
and running and overcoming problems with the most recent Nutch?

The Nutch wiki is here:

http://wiki.apache.org/nutch/

Please sign up for an account and tell me your username. Then I’ll
grant you permissions to edit the wiki.

Thank you Ankit!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Ankit Goel <ankitgoel2...@gmail.com>
Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
Date: Thursday, July 23, 2015 at 7:22 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Subject: Re: Nutch on the cloud

>Hey,
>@Chris, I would love to help with the wiki (honored, in fact), but my
>inputs are not about the getting-started process. They are more along
>the lines of frequent errors after that. For example, the redirect plugin
>doesn't work how you expect it to (not even with the latest one). Or
>sometimes the parsechecker will give results that a normal Nutch run
>won't, even though it's the same regex filter, or where to check it. Or
>which Solr you need to start with, because 5.x has a different file
>structure. Things like that, on which you spend a long time.
>
>If there is a wiki for such a page I will gladly step up to the plate. It
>isn't exactly an FAQ either. I was thinking I could blog about it, but I
>think your idea of a wiki would be better, so that it can be updated by
>later authors as the problems are removed. So should I create one on the
>Nutch site? Also, many of the problems are asked about multiple times on
>the mailing list, and a Google search just doesn't cut it. So maybe a
>repository of frequent problems, that sort of thing?
>Thanks for the heads-up on the other guide. It gave me a starting point.
>
>
>On Thu, Jul 23, 2015 at 6:24 AM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Thanks Ankit for the honest feedback. Would you be willing to update
>> our wiki and improve the instructions based on your experiences for
>> our gotchas?
>>
>> We have a guide we have been working on ourselves for getting Nutch
>> running and churning on Elastic MapReduce. That’s where I’d recommend
>> starting.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: Ankit Goel <ankitgoel2...@gmail.com>
>> Reply-To: "user@nutch.apache.org" <user@nutch.apache.org>
>> Date: Wednesday, July 22, 2015 at 5:51 PM
>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>> Subject: Nutch on the cloud
>>
>> >Hi,
>> >After my runs on my laptop, I'm ready to port my work to the cloud,
>> >and I'm planning to use Amazon. One thing I noticed when I started with
>> >Nutch is that a lot of things were left unsaid on the site/wiki, and
>> >they took me a lot of time to figure out. Pitfalls, if I may call them
>> >that. I don't really have code or scripts, but I need Nutch to run all
>> >the time on the cloud.
>> >
>> >So before I port to the cloud, are there any things I should beware of
>> >or look out for? Like, is AWS fine with Nutch? Are there any
>> >configurations I should remember? Any advice on implementation to ease
>> >my transition and run Nutch 24 hours a day? I will be running a seed
>> >file and crawling the net in general.
>> >Thanks
>> >
>> >--
>> >Regards,
>> >Ankit Goel
>> >http://about.me/ankitgoel
>>
>>
>
>
>-- 
>Regards,
>Ankit Goel
>http://about.me/ankitgoel
