Hi Sachin,

Just a suggestion here - you can use Apache Kafka to produce and consume events mapped to incoming crawl requests, crawl status, and much more.
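To make the idea concrete, here is a minimal sketch of what such events might look like. The topic names, fields, and phase/state values below are hypothetical (the actual schema in my prototype may differ); a Kafka producer would publish these JSON payloads and a consumer on the other side would decode them to drive or monitor the crawl.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical topic names -- pick whatever fits your deployment.
CRAWL_REQUEST_TOPIC = "nutch-crawl-requests"
CRAWL_STATUS_TOPIC = "nutch-crawl-status"

@dataclass
class CrawlRequest:
    """An incoming crawl request, published to the request topic."""
    request_id: str
    seed_urls: list = field(default_factory=list)
    depth: int = 2

@dataclass
class CrawlStatus:
    """A status update emitted as a crawl job progresses."""
    request_id: str
    phase: str   # e.g. "INJECT", "GENERATE", "FETCH", "PARSE"
    state: str   # e.g. "RUNNING", "DONE", "FAILED"

def encode(event) -> bytes:
    """Serialize an event to the JSON bytes a Kafka producer would send."""
    return json.dumps(asdict(event)).encode("utf-8")

def decode(payload: bytes) -> dict:
    """Deserialize a consumed Kafka message payload back into a dict."""
    return json.loads(payload.decode("utf-8"))

# With a real broker you would call something like:
#   producer.send(CRAWL_REQUEST_TOPIC, encode(req))
req = CrawlRequest("req-001", ["http://example.com/"], depth=3)
status = CrawlStatus("req-001", phase="FETCH", state="RUNNING")
round_trip = decode(encode(req))
```

The point is simply that each crawl request and each status change becomes one small, self-describing message, which decouples the component accepting requests from the machines actually running Nutch.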
I have created a prototype of a production crawl queue [0] that runs on a supercomputer (TACC Wrangler) and integrated it with Kafka. Please have a look and let me know if you have any questions.

[0]: https://github.com/karanjeets/PCF-Nutch-on-Wrangler

P.S. - There can be many solutions to this; I am just offering one. :)

Regards,
Karanjeet Singh
http://irds.usc.edu

On Thu, Sep 29, 2016 at 1:33 AM, Sachin Shaju <sachi...@mstack.com> wrote:
> Hi,
>    I was experimenting with some crawl cycles in Nutch and would like to
> set up a distributed crawl environment. But I wonder how I can trigger
> Nutch for incoming crawl requests in a production system. I have read
> about the Nutch REST API. Is that the only real option I have? Or can I
> run Nutch as a continuously running distributed server by some other
> means?
>
> My preferred Nutch version is 1.12.
>
> Regards,
> Sachin Shaju
>
> sachi...@mstack.com
> +919539887554