I wanted to open a Jira issue about exactly what Otis is asking here.
Unfortunately, I haven't had time for it because of my exams.

However, I'd like to add a question to Otis's:
If you distribute the indexing process this way, are you able to replicate
the different documents correctly?
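To make the question concrete, here is a toy sketch of the flow Otis describes: split the content into N chunks, index each chunk separately, then merge. It is plain Python with no Lucene or Hadoop involved. The names `partition`, `build_partial_index` and `merge_indexes` are hypothetical, a dict of term-to-document-ID sets stands in for a Lucene index, and `merge_indexes` only plays the role that Lucene's `IndexWriter.addIndexes` plays. Partitioning on a hash of each document's unique key is an assumption, not necessarily what Stu does, but it is one way to guarantee that every document ends up in exactly one partial index.

```python
import hashlib
from collections import defaultdict

def partition(docs, n):
    """Split (doc_id, text) pairs into n chunks; each doc goes to exactly one chunk."""
    chunks = [[] for _ in range(n)]
    for doc_id, text in docs:
        # Stable hash of the unique key (Python's built-in hash() is salted per process).
        h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
        chunks[h % n].append((doc_id, text))
    return chunks

def build_partial_index(chunk):
    """'Map' side: build a toy inverted index (term -> set of doc IDs) for one chunk."""
    index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge_indexes(partials):
    """Merge step: union the partial indexes (a stand-in for IndexWriter.addIndexes)."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return dict(merged)

docs = [("doc1", "solr with hadoop"),
        ("doc2", "hadoop map reduce"),
        ("doc3", "lucene index merge")]
chunks = partition(docs, 2)
merged = merge_indexes(build_partial_index(c) for c in chunks)
print(sorted(merged["hadoop"]))  # ['doc1', 'doc2']
```

Because each document ID hashes to exactly one chunk, no document is indexed twice or dropped, and the merged result is the same as indexing everything in one place.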

Thank you.
- Mitch

Otis Gospodnetic wrote:
> 
> Stu,
> 
> Interesting!  Can you provide more details about your setup?  By "load
> balance the indexing stage" you mean "distribute the indexing process",
> right?  Do you simply take your content to be indexed, split it into N
> chunks where N matches the number of TaskNodes in your Hadoop cluster and
> provide a map function that does the indexing?  What does the reduce
> function do?  Does that call IndexWriter.addAllIndexes or do you do that
> outside Hadoop?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> From: Stu Hood <stuh...@webmail.us>
> To: solr-user@lucene.apache.org
> Sent: Monday, January 7, 2008 7:14:20 PM
> Subject: Re: solr with hadoop
> 
> As Mike suggested, we use Hadoop to organize our data en route to Solr.
>  Hadoop allows us to load balance the indexing stage, and then we use
>  the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
>  hosted on Solr instances.
> 
> Thanks,
> Stu
> 
> -----Original Message-----
> From: Mike Klaas <mike.kl...@gmail.com>
> Sent: Friday, January 4, 2008 3:04pm
> To: solr-user@lucene.apache.org
> Subject: Re: solr with hadoop
> 
> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
> 
>> I have a huge index (about 110 million documents, 100 fields
>> each), but the size of the index is reasonable: about 70 GB.
>> All I need is to increase performance, since some queries that match
>> a large number of documents run slowly.
>> So I was wondering: is there any benefit to using Hadoop for this? And if
>> so, what direction should I go? Has anybody done anything to
>> integrate Solr with Hadoop? Does it give any performance boost?
>>
> Hadoop might be useful for organizing your data en route to Solr, but
> I don't see how it could be used to boost performance over a huge
> Solr index.  To accomplish that, you need to split it up over two
> machines (for which you might find Hadoop useful).
> 
> -Mike
> 
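Mike's point is that query speed comes from splitting the index into pieces on separate machines and searching them in parallel. Below is a toy Python sketch of that fan-out-and-merge pattern, under loud assumptions: shards are plain dicts, the score is raw term frequency rather than Lucene scoring, `search_shard` and `distributed_search` are hypothetical names, and the shards are searched one after another here, whereas a real deployment would query separate machines concurrently.

```python
import heapq

def search_shard(shard, term):
    """Score every doc in one shard; the score is raw term frequency (a toy stand-in)."""
    hits = []
    for doc_id, text in shard.items():
        tf = text.lower().split().count(term)
        if tf:
            hits.append((tf, doc_id))
    return hits

def distributed_search(shards, term, k):
    """Fan the query out to every shard, then merge into a global top-k."""
    # Each shard only needs to return its own top-k: any doc in the global
    # top-k must also be in the top-k of the shard that holds it.
    per_shard = [heapq.nlargest(k, search_shard(shard, term)) for shard in shards]
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))

shards = [{"a": "solr solr hadoop", "b": "lucene"},
          {"c": "solr", "d": "solr solr solr"}]
print(distributed_search(shards, "solr", 2))  # [(3, 'd'), (2, 'a')]
```

Asking each shard only for its own top-k keeps the data returned per query bounded by k hits per shard, however many documents match, which is the case Evgeniy reports as slow.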
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
Sent from the Solr - User mailing list archive at Nabble.com.
