If you don't want to index the fetched/parsed data from Nutch into HBase, you could write a custom indexer that sends the data directly to Kafka. I've done something similar (for testing purposes, and with RabbitMQ rather than Kafka), but that was on the 1.x branch. You'll still need to write custom code to apply the "monitoring rules". If all your rules are as simple as "term X present in crawled content", this is not very complex, but if it needs to scale I recommend using the Elasticsearch backend and its percolator feature [1]. Or, if you want to use Solr, check out Luwak [2].
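To give an idea of how small the "term X present in crawled content" case is, here is a rough sketch in Python. It does not use any Nutch or Kafka API; `Rule`, `match_rules` and the fields of the alert dict are names I made up for illustration, and it assumes the rules are plain keywords matched as whole words, ignoring case.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical monitoring rule: notify `user` when `term` shows up."""
    user: str
    term: str


def match_rules(url, content, rules):
    """Return one alert dict per rule whose term occurs in `content`."""
    alerts = []
    for rule in rules:
        # Whole-word, case-insensitive match. This assumes the term starts
        # and ends with a word character; other terms need different handling.
        pattern = r"\b" + re.escape(rule.term) + r"\b"
        if re.search(pattern, content, flags=re.IGNORECASE):
            alerts.append({"user": rule.user, "term": rule.term, "url": url})
    return alerts


if __name__ == "__main__":
    rules = [Rule("alice", "nutch"), Rule("bob", "solr")]
    for alert in match_rules("http://xyz.com/", "Apache Nutch crawls the web", rules):
        print(alert)
```

Each alert could then be serialized and published from the custom indexer. This loop checks every rule against every document, which is exactly the part that stops scaling with many rules and is what the percolator or Luwak would take over.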
Regards,

[1] http://www.elastic.co/guide/en/elasticsearch/reference/1.3/search-percolate.html
[2] https://github.com/flaxsearch/luwak

----- Original Message -----
From: "Chris A Mattmann (3980)" <[email protected]>
To: [email protected]
Sent: Friday, April 10, 2015 9:31:17 PM
Subject: [MASSMAIL]Re: Nutch | Gora with Kafka

One thing you could consider doing is setting up a polling REST service either around Gora reading Nutch WebPage data, and/or directly interfacing with HBase.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Melih Sevsay <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, April 10, 2015 at 1:07 AM
To: "[email protected]" <[email protected]>
Subject: Nutch | Gora with Kafka

>Hi,
>i would like to send fetched data to Kafka as a topic to create alert
>system on that content,
>For example; xyz.com web site content fetched and inserted into hbase,
>at that particular moment i would like to make that content send to kafka
>as topic and search by specific keyword. if the keyword found in content,
>i will send message to user.
>To send fetched data to kafka as topics, should i do this using nutch
>plugins or gora.
>or how could i do this.
>Do you have any idea?
>Thanks in advance...?
>

