If you don't want to index the fetched/parsed data from Nutch into HBase, you
could write a custom indexer that sends the data directly to Kafka. I've done
something similar (for testing purposes, and using RabbitMQ), but it was with
the 1.x branch. You'll need to write custom code to apply the "monitoring
rules". If all your rules are as simple as "term X is present in the crawled
content", this is not very complex. If this needs to scale, however, I
recommend using the Elasticsearch backend and its percolator feature [1], or,
if you prefer Solr, check out Luwak [2].
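To illustrate how simple "term X is present in the crawled content" rules can
be, here is a minimal Python sketch. The Rule class, tokenize and match_rules
are hypothetical names, and the content is assumed to arrive from the custom
indexer or a Kafka consumer (not shown). Only single-word terms are handled.
It checks every rule against every document, which is exactly the cost that
stops scaling and why the percolator or Luwak (which index the queries
themselves) is the better fit for large rule sets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical monitoring rule: alert `user` when `term` appears."""
    user: str
    term: str  # assumed to be a single word


def tokenize(text: str) -> set:
    # Lowercase and split on non-word characters so "Kafka," matches "kafka".
    return set(re.findall(r"\w+", text.lower()))


def match_rules(url: str, content: str, rules: list) -> list:
    """Return (user, term, url) alerts for every rule whose term is present.

    Brute force: every rule is checked against every document, so the cost
    grows with (number of rules x number of documents).
    """
    tokens = tokenize(content)
    return [(r.user, r.term, url) for r in rules if r.term.lower() in tokens]
```

For example, with rules for "kafka" (user alice) and "solr" (user bob), a
page at http://xyz.com/ containing "Nutch can feed Kafka, HBase and more."
would yield a single alert for alice; the actual notification step is left
out.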

Regards,

[1] http://www.elastic.co/guide/en/elasticsearch/reference/1.3/search-percolate.html
[2] https://github.com/flaxsearch/luwak

----- Original Message -----
From: "Chris A Mattmann (3980)" <[email protected]>
To: [email protected]
Sent: Friday, April 10, 2015 9:31:17 PM
Subject: Re: Nutch | Gora with Kafka

One thing you could consider is setting up a polling REST service, either
wrapping Gora to read the Nutch WebPage data, or interfacing directly with
HBase (or both).
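A rough sketch of the polling idea, in Python, assuming each stored page
carries a fetch timestamp that the service can filter on. Page, Poller and
fetch_since are hypothetical names; fetch_since stands in for whatever REST
call (wrapping Gora) or HBase scan you expose, and is assumed to return only
pages fetched after the given timestamp. The poller keeps a high-water mark
so each poll returns only new pages, which could then be pushed to Kafka or
checked against the alert rules.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Page:
    # Hypothetical record; a real service would map Nutch WebPage fields here.
    url: str
    fetch_time: int  # assumed: epoch millis of the fetch
    content: str


class Poller:
    """Polls a source for pages fetched after the last seen timestamp."""

    def __init__(self, fetch_since: Callable[[int], Iterable[Page]], start: int = 0):
        self._fetch_since = fetch_since  # hypothetical REST/HBase query
        self._high_water = start

    def poll_once(self) -> list:
        pages = sorted(self._fetch_since(self._high_water),
                       key=lambda p: p.fetch_time)
        if pages:
            # Advance the high-water mark so the next poll skips these pages.
            self._high_water = pages[-1].fetch_time
        return pages
```

In practice poll_once would run on a timer. Note that a pure timestamp
cursor can miss pages written later with the same timestamp as the last one
seen; tracking the URLs seen at the boundary timestamp avoids that.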

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Melih Sevsay <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, April 10, 2015 at 1:07 AM
To: "[email protected]" <[email protected]>
Subject: Nutch | Gora with Kafka

>Hi,
>I would like to send fetched data to Kafka as a topic to create an alert
>system on that content.
>For example, when the xyz.com website content is fetched and inserted into
>HBase, at that particular moment I would like that content to be sent to
>Kafka as a topic and searched for a specific keyword. If the keyword is
>found in the content, I will send a message to the user.
>To send fetched data to Kafka as topics, should I do this using Nutch
>plugins or Gora?
>Or how else could I do this?
>Do you have any idea?
>Thanks in advance...
>
