I wanted to file a Jira issue about exactly what Otis is asking here. Unfortunately, I haven't had time for it because of my exams.
However, I'd like to add a question to Otis's: if you distribute the indexing process this way, are you able to replicate the different documents correctly?

Thank you.
- Mitch


Otis Gospodnetic-2 wrote:
>
> Stu,
>
> Interesting! Can you provide more details about your setup? By "load
> balance the indexing stage" you mean "distribute the indexing process",
> right? Do you simply take your content to be indexed, split it into N
> chunks where N matches the number of TaskNodes in your Hadoop cluster and
> provide a map function that does the indexing? What does the reduce
> function do? Does that call IndexWriter.addAllIndexes or do you do that
> outside Hadoop?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Stu Hood <stuh...@webmail.us>
> To: solr-user@lucene.apache.org
> Sent: Monday, January 7, 2008 7:14:20 PM
> Subject: Re: solr with hadoop
>
> As Mike suggested, we use Hadoop to organize our data en route to Solr.
> Hadoop allows us to load balance the indexing stage, and then we use
> the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
> hosted on Solr instances.
>
> Thanks,
> Stu
>
>
> -----Original Message-----
> From: Mike Klaas <mike.kl...@gmail.com>
> Sent: Friday, January 4, 2008 3:04pm
> To: solr-user@lucene.apache.org
> Subject: Re: solr with hadoop
>
> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
>
>> I have huge index base (about 110 millions documents, 100 fields
>> each). But size of the index base is reasonable, it's about 70 Gb.
>> All I need is increase performance, since some queries, which match
>> big number of documents, are running slow.
>> So I was thinking is any benefits to use hadoop for this? And if
>> so, what direction should I go? Is anybody did something for
>> integration Solr with Hadoop? Does it give any performance boost?
>>
> Hadoop might be useful for organizing your data enroute to Solr, but
> I don't see how it could be used to boost performance over a huge
> Solr index. To accomplish that, you need to split it up over two
> machines (for which you might find hadoop useful).
>
> -Mike
>

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
Sent from the Solr - User mailing list archive at Nabble.com.
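
For anyone following the thread, here is a minimal sketch of the split / index / merge pattern Otis asks about. It is only an illustration under stated assumptions: a toy in-memory inverted index stands in for Lucene, plain function calls stand in for Hadoop map tasks, and the names split_into_chunks, build_partial_index and merge_indexes are hypothetical. It is not Stu's actual setup and uses no Hadoop or Lucene API.

```python
from collections import defaultdict


def split_into_chunks(docs, n):
    """Distribute (doc_id, text) pairs round-robin over n chunks."""
    chunks = [[] for _ in range(n)]
    for i, doc in enumerate(docs):
        chunks[i % n].append(doc)
    return chunks


def build_partial_index(chunk):
    """Stand-in for the per-node indexing step: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Stand-in for the merge step: union the postings of every partial index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


# Hypothetical sample documents, indexed in two chunks and then merged.
docs = [
    (1, "solr with hadoop"),
    (2, "hadoop cluster"),
    (3, "solr index"),
    (4, "lucene index merge"),
]
partials = [build_partial_index(chunk) for chunk in split_into_chunks(docs, 2)]
merged = merge_indexes(partials)
```

In this toy version each document lands in exactly one chunk, so the merged index matches what a single indexing pass over all documents would produce. Whether the same holds in a real cluster depends on how the content is split, which is what my question above is about.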