Hi Chris,

You should be able to do this quick-and-dirty with a relatively simple 
modification to Nutch’s integrated Elasticsearch indexer plugin (called 
indexer-elastic). In the 
org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.write() method, find 
the line

    IndexRequestBuilder request = client.prepareIndex(defaultIndex, type, id);

and try replacing defaultIndex with the domain name of the document that 
you’re indexing. Keep in mind that Elasticsearch index names must be 
lowercase.
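
As a rough sketch of how the index name could be derived (untested against 
Nutch itself): the helper below takes a page URL and returns its lowercased 
host, minus a leading "www.", falling back to a default index name when the 
URL can't be parsed. The class name IndexNames and the method indexNameFor 
are hypothetical and not part of Nutch, and I'm assuming that the id field 
of the document holds the page URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch or the indexer-elastic plugin.
public class IndexNames {

    // Returns the lowercased host of the URL (without a leading "www."),
    // or the fallback index name if no host can be extracted.
    static String indexNameFor(String url, String fallback) {
        try {
            String host = new URI(url).getHost();
            if (host == null || host.isEmpty()) {
                return fallback;
            }
            // Elasticsearch index names must be lowercase.
            host = host.toLowerCase(Locale.ROOT);
            if (host.startsWith("www.")) {
                host = host.substring(4);
            }
            return host;
        } catch (URISyntaxException e) {
            // Unparseable URL: keep using the default index.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexNameFor("http://www.abc.com/page.html", "nutch"));
        System.out.println(indexNameFor("not a url", "nutch"));
    }
}
```

Inside write() you would then pass something like 
indexNameFor(id, defaultIndex) as the first argument to 
client.prepareIndex(...) instead of defaultIndex. Note that this gives you 
"abc.com" rather than "abc"; trimming the TLD as well is left out here 
because it gets messy for hosts like example.co.uk.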

And to answer Markus’s question, I think that the ElasticIndexWriter opens a 
single ES client connection, so you shouldn’t have to worry about a separate 
connection for each host. But maybe somebody with more know-how can give you a 
more definitive answer.

Cheers

Jake

On Jun 18, 2014, at 2:54 PM, Chris Mielke <[email protected]> wrote:

> Hey all,
> 
> Pretty new to Nutch and getting it integrated with Elasticsearch. I've
> managed to finally get it working. Ideally, I'd like to have a separate
> Elasticsearch index for each site that is crawled, or a separate
> Elasticsearch index type for each site.
> 
> For example:
> Site abc.com ends up in the index "abc" in Elasticsearch
> Site xyz.com ends up in the index "xyz" in Elasticsearch
> 
> Is there a way to do this?
> 
> Thanks!
> 
> ..Chris
> 
> Chris Mielke
> Web Developer
