Hi Ralf,
Do you mean the Open Graph Protocol [0] markup here?
If so, and if it is present within the page, then it is already parsed
out and stored within Parse [1] and can be accessed via Parse.getData().
Please use the ParserChecker to double-check this and, if necessary, post an
example here so that I can be corrected.
Can I have a link to this?
Regards,
Sachin Shaju
sachi...@mstack.com
+919539887554
On Thu, Sep 29, 2016 at 11:13 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Yep also check out the work that Sujen Shah just merged (also on my team
> at JPL and
> USC) where you can publish events to an ActiveMQ queue from Nutch crawling.
Thank you guys for your replies. I will look into the suggestions you gave.
But I have one more query. How can I trigger Nutch from a queue system in a
distributed environment? Can the REST API be a real option in distributed
mode? Or will I have to go for a command-line invocation of Nutch?
You are welcome.
> -Original Message-
> From: lewis john mcgibbney [mailto:lewi...@apache.org]
> Sent: Friday, 30 September 2016 2:22 AM
> To: user@nutch.apache.org
> Subject: Re: Arch 1.9.2 is available
>
> Cool... thanks for posting.
>
> On Wed, Sep 28, 2016 at 1:36 AM, wrote:
>
>
Yep also check out the work that Sujen Shah just merged (also on my team at JPL
and
USC) where you can publish events to an ActiveMQ queue from Nutch crawling. That
should allow all sorts of production dashboards and analytics.
++
Chris
Hi Sachin,
Just a suggestion here - you can use Apache Kafka to generate and catch
events that are mapped to incoming crawl requests, crawl status, and much
more.
I have created a prototype of a production queue [0] which runs on top of a
supercomputer (TACC Wrangler) and integrated it with Kafka.
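To make the Kafka suggestion concrete, here is a minimal sketch of the kind of crawl-event payload one might publish to a topic. The topic name, field names, and statuses are illustrative assumptions, not anything defined by Nutch or the prototype mentioned above:

```python
import json
import time

# Hypothetical crawl-event payload; field names ("crawlId", "status",
# "urlCount") are illustrative, not a Nutch-defined schema.
def make_crawl_event(crawl_id, status, url_count):
    return {
        "crawlId": crawl_id,
        "status": status,       # e.g. "REQUESTED", "FETCHING", "DONE"
        "urlCount": url_count,
        "timestamp": int(time.time()),
    }

# With kafka-python installed and a broker running, publishing could look like:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092",
#                          value_serializer=lambda v: json.dumps(v).encode())
# producer.send("nutch-crawl-events", make_crawl_event("crawl01", "REQUESTED", 0))

print(json.dumps(make_crawl_event("crawl01", "REQUESTED", 0)))
```

A consumer on the other side would deserialize the same JSON and dispatch crawl commands accordingly.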
Cool... thanks for posting.
On Wed, Sep 28, 2016 at 1:36 AM, wrote:
>
> user Digest 28 Sep 2016 08:36:56 - Issue 2648
>
> Topics (messages 32792 through 32792)
>
> Arch 1.9.2 is available
> 32792 by: Arkadi.Kosmynin.csiro.au
>
> Administrivia:
>
>
I was trying to give custom options in the *bin/crawl* script and encountered
an issue. I gave a custom config in Nutch to ignore external outlinks in my
crawl command, like:
*bin/crawl -i -D elastic.index=test -D db.ignore.external.links=true urls/
CrawlTest/ 3*
But this is not working. Then I set
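For what it's worth, if the *-D* flags are not being picked up by every step of the crawl cycle, a common workaround (assuming Nutch 1.x) is to set the property in conf/nutch-site.xml so that all the sub-commands invoked by *bin/crawl* see it:

```xml
<!-- conf/nutch-site.xml (fragment, goes inside <configuration>) -->
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading to external hosts are ignored.</description>
</property>
```

After changing nutch-site.xml, rebuild/redeploy the job file if you run on a Hadoop cluster, since the config is baked into the job jar.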
Hi,
I was experimenting with some crawl cycles with Nutch and would like to set up
a distributed crawl environment. But I wonder how I can trigger Nutch for
incoming crawl requests in a production system. I read about the Nutch REST
API. Is that the real option that I have? Or can I run Nutch as a
continuously running service?
Hi,
I have tested running Nutch in server mode by starting it *locally* using the
bin/nutch startserver command. Now I wonder whether I can start
Nutch in *server mode* on top of a Hadoop cluster (in a distributed
environment) and submit crawl requests to the server using the Nutch REST API?
Please help.
Regards,
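As a rough illustration of submitting work to the REST server, here is a sketch that builds a job-creation payload. The /job/create endpoint and the INJECT job type exist in the Nutch 1.x REST API, but the exact field names, the port (8081 is the startserver default), and the "url_dir" argument should be verified against your Nutch version:

```python
import json

# Hypothetical helper that builds the JSON body for Nutch's REST
# /job/create endpoint; field names mirror the Nutch 1.x API but
# should be checked against your version's documentation.
def build_job_request(job_type, conf_id="default", crawl_id="crawl01", args=None):
    return {
        "type": job_type,       # e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args or {},
    }

payload = build_job_request("INJECT", args={"url_dir": "urls/"})
print(json.dumps(payload))

# With the server running (bin/nutch startserver), the request could be
# posted with the `requests` library:
# import requests
# requests.post("http://localhost:8081/job/create", json=payload)
```

A queue consumer (Kafka, ActiveMQ, etc.) could call such a helper for each incoming crawl request, which is one way to bridge a queue system and the REST API.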