Thank you all for your replies. I will look into the suggestions you gave.
I have one more question, though. How can I trigger Nutch from a queue system
in a distributed environment? Is the REST API a realistic option in distributed
mode, or will I have to invoke Nutch from the command line?

Thanks and Regards,
Sachin Shaju

sachi...@mstack.com
+919539887554

On Thu, Sep 29, 2016 at 11:11 PM, Karanjeet Singh <karan...@usc.edu> wrote:

> Hi Sachin,
>
> Just a suggestion here - you can use Apache Kafka to generate and catch
> events which are mapped to incoming crawl requests, crawl status and much
> more.
>
> I have created a prototype for production queue [0] which runs on top of a
> supercomputer (TACC Wrangler) and integrated it with Kafka. Please have a
> look and let me know if you have any questions.
>
> [0]: https://github.com/karanjeets/PCF-Nutch-on-Wrangler
>
> P.S. - There can be many solutions to this. I am just giving one.  :)
>
> Regards,
> Karanjeet Singh
> http://irds.usc.edu
>
> On Thu, Sep 29, 2016 at 1:33 AM, Sachin Shaju <sachi...@mstack.com> wrote:
>
> > Hi,
> >    I have been experimenting with some crawl cycles in Nutch and would
> > like to set up a distributed crawl environment. But I wonder how I can
> > trigger Nutch for incoming crawl requests in a production system. I read
> > about the Nutch REST API. Is that the real option that I have? Or can I
> > run Nutch as a continuously running distributed server some other way?
> >
> >      My preferred nutch version is nutch 1.12.
> >
> > Regards,
> > Sachin Shaju
> >
> > sachi...@mstack.com
> > +919539887554
> >
> > --
> >
> >
> > The information contained in this electronic message and any attachments
> > to this message are intended for the exclusive use of the addressee(s)
> > and may contain proprietary, confidential or privileged information. If
> > you are not the intended recipient, you should not disseminate,
> > distribute or copy this e-mail. Please notify the sender immediately and
> > destroy all copies of this message and any attachments.
> >
> > WARNING: Computer viruses can be transmitted via email. The recipient
> > should check this email and any attachments for the presence of viruses.
> > The company accepts no liability for any damage caused by any virus
> > transmitted by this email.
> >
> > www.mStack.com
> >
>
