Re: Nutch Solr integration

Lewis John Mcgibbney Mon, 06 Jan 2014 04:45:11 -0800

Hi Manikandan,

On Sun, Jan 5, 2014 at 11:43 PM, <[email protected]> wrote:


>
> I’m running Nutch 2.2.1 on top of a Hadoop 1.7 cluster and I’m using Gora
> to store the crawled data on a remote Cassandra 2.0.4 cluster.
>

Nice setup. Glad to hear things are working out OK(ish) for you.


>
> I wish to setup a Solrcloud cluster and index the crawled data on it. Can
> I do that by integrating Nutch and Solr?


Yes, it can certainly be done; however, the implementations are just not in
place right now. Please see Markus' issue and patch for trunk (not
applicable to 2.2.1) here [0].

Now this may seem like a long shot, but I think this can be done with Gora
instead, now that we've got the gora-solr module committed in trunk [1]. As
always, there is one caveat: the patch which I created to use the different
SolrServer implementations (including CloudSolrServer) is not yet committed
to Gora trunk, as I didn't get time to test it thoroughly enough. The patch
I refer to can be found here [2]. This is most certainly an option for you.


> The tutorial in the website tells how to integrate when the crawl data is
> stored on the filesystem and when Solr is running locally. My situation is
> this:
>
> 1. Crawled data is on a remote Cassandra cluster.
> 2. The Solr cloud will again be remote.
>

This sounds like a nice use case for the Gora code; however, you would need
to write some glue code to tie it together.
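
As a rough illustration of what such glue code might look like, here is a
minimal Java sketch. Everything in it is an assumption made for illustration:
Page, DocumentSink, the field names (id, title, content) and the batch size
are hypothetical and are not Nutch, Gora, or SolrJ APIs. In a real setup the
iterator would presumably be backed by a Gora query against the Cassandra
store, and the sink by a SolrJ client pointed at the SolrCloud cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GlueSketch {

    /** Hypothetical stand-in for one crawled page read back from the data store. */
    static final class Page {
        final String url;
        final String title;
        final String text;

        Page(String url, String title, String text) {
            this.url = url;
            this.title = title;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for whatever client ships documents to the remote index. */
    interface DocumentSink {
        void add(List<Map<String, String>> batch);
        void commit();
    }

    /** Maps a page to a flat field map; the field names are assumptions. */
    static Map<String, String> toDocument(Page page) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", page.url);
        doc.put("title", page.title);
        doc.put("content", page.text);
        return doc;
    }

    /**
     * Streams pages into the sink in fixed-size batches and commits once at
     * the end. Returns the number of pages handed to the sink.
     */
    static int indexAll(Iterator<Page> pages, DocumentSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, String>> batch = new ArrayList<>(batchSize);
        int total = 0;
        while (pages.hasNext()) {
            batch.add(toDocument(pages.next()));
            total++;
            if (batch.size() == batchSize) {
                sink.add(new ArrayList<>(batch)); // hand over a copy, then reuse the buffer
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.add(new ArrayList<>(batch)); // flush the final partial batch
        }
        sink.commit();
        return total;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
                new Page("http://example.org/a", "A", "first page"),
                new Page("http://example.org/b", "B", "second page"),
                new Page("http://example.org/c", "C", "third page"));

        // A sink that only logs, standing in for a real remote client.
        DocumentSink loggingSink = new DocumentSink() {
            @Override
            public void add(List<Map<String, String>> batch) {
                System.out.println("sending batch of " + batch.size());
            }

            @Override
            public void commit() {
                System.out.println("commit");
            }
        };

        int indexed = indexAll(pages.iterator(), loggingSink, 2);
        System.out.println("indexed " + indexed + " pages");
    }
}
```

Keeping the reader and the sink behind small interfaces means the batching
logic can be tried locally before either remote cluster is involved.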

hth
Lewis

[0] https://issues.apache.org/jira/browse/NUTCH-1377
[1] https://svn.apache.org/repos/asf/gora/trunk/gora-solr/
[2] https://issues.apache.org/jira/browse/GORA-260
