I don't think it's ever been discussed - your Q below is #1 hit currently:
http://search-lucene.com/?q=%2B%28dih+OR+dataimporthandler%29+hdfs

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
----- Original Message -----
> From: Jon Baer <jonb...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, June 22, 2010 12:47:14 PM
> Subject: Re: solr with hadoop
>
> I was playing around w/ Sqoop the other day; it's a simple Cloudera tool for
> imports (mysql -> hdfs):
> http://www.cloudera.com/developers/downloads/sqoop/
>
> It seems to me it would be pretty efficient to dump to HDFS and have
> something like Data Import Handler be able to read from hdfs:// directly ...
> Has this route been discussed / developed before (ie DIH w/ an hdfs://
> handler)?
>
> - Jon
>
> On Jun 22, 2010, at 12:29 PM, MitchK wrote:
>
> I wanted to add a Jira issue about exactly what Otis is asking here.
> Unfortunately, I haven't had time for it because of my exams.
>
> However, I'd like to add a question to Otis's:
> If you distribute the indexing process this way, are you able to replicate
> the different documents correctly?
>
> Thank you.
> - Mitch
>
> Otis Gospodnetic-2 wrote:
>>
>> Stu,
>>
>> Interesting! Can you provide more details about your setup? By "load
>> balance the indexing stage" you mean "distribute the indexing process",
>> right? Do you simply take your content to be indexed, split it into N
>> chunks where N matches the number of TaskNodes in your Hadoop cluster, and
>> provide a map function that does the indexing? What does the reduce
>> function do? Does that call IndexWriter.addAllIndexes, or do you do that
>> outside Hadoop?
>>
>> Thanks,
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message -----
>> From: Stu Hood <stuh...@webmail.us>
>> To: solr-user@lucene.apache.org
>> Sent: Monday, January 7, 2008 7:14:20 PM
>> Subject: Re: solr with hadoop
>>
>> As Mike suggested, we use Hadoop to organize our data en route to Solr.
>> Hadoop allows us to load balance the indexing stage, and then we use
>> the raw Lucene IndexWriter.addAllIndexes method to merge the data to be
>> hosted on Solr instances.
>>
>> Thanks,
>> Stu
>>
>>
>> -----Original Message-----
>> From: Mike Klaas <mike.kl...@gmail.com>
>> Sent: Friday, January 4, 2008 3:04pm
>> To: solr-user@lucene.apache.org
>> Subject: Re: solr with hadoop
>>
>> On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:
>>
>>> I have a huge index base (about 110 million documents, 100 fields
>>> each), but the size of the index base is reasonable, about 70 GB.
>>> All I need is to increase performance, since some queries, which match
>>> a big number of documents, are running slow.
>>> So I was wondering: are there any benefits to using Hadoop for this?
>>> And if so, what direction should I go? Has anybody done something to
>>> integrate Solr with Hadoop? Does it give any performance boost?
>>>
>> Hadoop might be useful for organizing your data en route to Solr, but
>> I don't see how it could be used to boost performance over a huge
>> Solr index. To accomplish that, you need to split it up over two
>> machines (for which you might find Hadoop useful).
>>
>> -Mike
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html
> Sent from the Solr - User mailing list archive at Nabble.com.
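[Editor's note] The pipeline Stu and Otis discuss above — split the corpus into N chunks matching the task nodes, have each map task index its chunk, then merge the partial indexes in the reduce step — can be sketched concretely. Below is a minimal, hypothetical illustration in plain Python, not the thread's actual code: dicts stand in for Lucene index segments, and `merge_indexes` plays the role the thread assigns to `IndexWriter.addAllIndexes` (in Lucene's API the merge method is spelled `IndexWriter.addIndexes`). All function names are illustrative.

```python
from collections import defaultdict

def partition(docs, n_nodes):
    """Split the corpus into N chunks, one per task node (Otis's question)."""
    chunks = [[] for _ in range(n_nodes)]
    for doc_id, text in docs:
        chunks[hash(doc_id) % n_nodes].append((doc_id, text))
    return chunks

def index_chunk(chunk):
    """Map phase: each task builds a partial inverted index for its chunk."""
    index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge_indexes(partials):
    """Reduce phase: merge partial indexes into one (cf. IndexWriter.addIndexes)."""
    merged = defaultdict(set)
    for partial in partials:
        for term, postings in partial.items():
            merged[term] |= postings
    return merged

docs = [("d1", "solr with hadoop"),
        ("d2", "hadoop indexing"),
        ("d3", "solr replication")]

# Two "task nodes": partition, index each chunk independently, then merge.
merged = merge_indexes(index_chunk(c) for c in partition(docs, 2))
print(sorted(merged["hadoop"]))  # every doc containing "hadoop", whichever chunk it landed in
```

This also shows why Mitch's replication question matters: each partial index only knows about its own chunk, so correctness depends entirely on the merge step seeing every chunk exactly once.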