Hi Ralf,
Do you mean the Open Graph Protocol [0] markup here?
If so, and if it is present within the page, then it is already parsed
out and stored within Parse [1] and can be accessed via Parse.getData().
Please use the ParserChecker to double-check this and, if necessary, post an
example here so that I can be corrected.
Can I have a link to this?
Regards,
Sachin Shaju
sachi...@mstack.com
+919539887554
On Thu, Sep 29, 2016 at 11:13 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Yep also check out the work that Sujen Shah just merged (also on my team
> at JPL and
> USC) where you can publish events to an ActiveMQ queue from Nutch crawling.
Thank you guys for your replies. I will look into the suggestions you gave.
But I have one more query. How can I trigger Nutch from a queue system in a
distributed environment? Can the REST API be a real option in distributed
mode? Or will I have to go for a command-line invocation of Nutch?
You are welcome.
> -Original Message-
> From: lewis john mcgibbney [mailto:lewi...@apache.org]
> Sent: Friday, 30 September 2016 2:22 AM
> To: user@nutch.apache.org
> Subject: Re: Arch 1.9.2 is available
>
> Cool... thanks for posting.
>
> On Wed, Sep 28, 2016 at 1:36 AM, wrote:
>
>
Yep also check out the work that Sujen Shah just merged (also on my team at JPL
and
USC) where you can publish events to an ActiveMQ queue from Nutch crawling. That
should allow all sorts of production dashboards and analytics.
++
Chris
Hi Sachin,
Just a suggestion here - you can use Apache Kafka to generate and catch
events that are mapped to incoming crawl requests, crawl status, and much
more.
I have created a prototype of a production queue [0] which runs on top of a
supercomputer (TACC Wrangler) and integrated it with Kafka.
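To make the Kafka suggestion concrete, here is a minimal sketch of the kind of crawl-event payload one might publish to a topic. The topic name, field names, and statuses are illustrative assumptions, not anything defined by Nutch or the prototype mentioned above:

```python
import json
import time

# Hypothetical crawl-event payload; field names ("crawlId", "status",
# "urlCount") are illustrative, not a Nutch-defined schema.
def make_crawl_event(crawl_id, status, url_count):
    return {
        "crawlId": crawl_id,
        "status": status,       # e.g. "REQUESTED", "FETCHING", "DONE"
        "urlCount": url_count,
        "timestamp": int(time.time()),
    }

# With kafka-python installed and a broker running, publishing could look like:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092",
#                          value_serializer=lambda v: json.dumps(v).encode())
# producer.send("nutch-crawl-events", make_crawl_event("crawl01", "REQUESTED", 0))

print(json.dumps(make_crawl_event("crawl01", "REQUESTED", 0)))
```

A consumer on the other side would deserialize the same JSON and dispatch crawl commands accordingly.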
Cool... thanks for posting.
On Wed, Sep 28, 2016 at 1:36 AM, wrote:
>
> user Digest 28 Sep 2016 08:36:56 - Issue 2648
>
> Topics (messages 32792 through 32792)
>
> Arch 1.9.2 is available
> 32792 by: Arkadi.Kosmynin.csiro.au
>
> Administrivia:
>
>
I was trying to give custom options in the *bin/crawl* script and encountered
an issue. I gave a custom config in Nutch to ignore external outlinks in my
crawl command, like:
*bin/crawl -i -D elastic.index=test -D db.ignore.external.links=true urls/
CrawlTest/ 3*
But this is not working. Then I set
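For what it's worth, if the *-D* flags are not being picked up by every step of the crawl cycle, a common workaround (assuming Nutch 1.x) is to set the property in conf/nutch-site.xml so that all the sub-commands invoked by *bin/crawl* see it:

```xml
<!-- conf/nutch-site.xml (fragment, goes inside <configuration>) -->
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading to external hosts are ignored.</description>
</property>
```

After changing nutch-site.xml, rebuild/redeploy the job file if you run on a Hadoop cluster, since the config is baked into the job jar.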
Hi,
I was experimenting with some crawl cycles with Nutch and would like to set up
a distributed crawl environment. But I wonder how I can trigger Nutch for
incoming crawl requests in a production system. I read about the Nutch REST
API. Is that the real option that I have? Or can I run Nutch as a
continuously running service?
Hi,
I have tested running Nutch in server mode by starting it *locally* using the
bin/nutch startserver command. Now I wonder whether I can start
Nutch in *server mode* on top of a Hadoop cluster (in a distributed
environment) and submit crawl requests to the server using the Nutch REST API?
Please help.
Regards,
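As a rough illustration of submitting work to the REST server, here is a sketch that builds a job-creation payload. The /job/create endpoint and the INJECT job type exist in the Nutch 1.x REST API, but the exact field names, the port (8081 is the startserver default), and the "url_dir" argument should be verified against your Nutch version:

```python
import json

# Hypothetical helper that builds the JSON body for Nutch's REST
# /job/create endpoint; field names mirror the Nutch 1.x API but
# should be checked against your version's documentation.
def build_job_request(job_type, conf_id="default", crawl_id="crawl01", args=None):
    return {
        "type": job_type,       # e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args or {},
    }

payload = build_job_request("INJECT", args={"url_dir": "urls/"})
print(json.dumps(payload))

# With the server running (bin/nutch startserver), the request could be
# posted with the `requests` library:
# import requests
# requests.post("http://localhost:8081/job/create", json=payload)
```

A queue consumer (Kafka, ActiveMQ, etc.) could call such a helper for each incoming crawl request, which is one way to bridge a queue system and the REST API.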