Build Solr index using Hadoop MapReduce
http://issues.apache.org/jira/browse/SOLR-1045


Ning Li-3 wrote:
> 
> SOLR-1045 it is. More details will be available in that issue.
> 
> Marc, you can check out Hadoop contrib/index which builds a Lucene
> index using Hadoop MapReduce. However, it does not handle duplicate
> detection.
> 
> Cheers,
> Ning
> 
> 
> On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese <marc.sturl...@gmail.com>
> wrote:
>>
>> I am doing some research on building a Lucene/Solr index using Hadoop,
>> but there is not much information around. It would be great to see some
>> code! (I am having problems especially with duplicate detection.)
>> Thanks
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li <ning.li...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wonder if there is interest in a contrib module that builds a Solr
>>>> index using Hadoop MapReduce?
>>>>
>>>
>>> Absolutely!
>>>
>>>
>>>> It is different from the Solr support in Nutch. The Solr support in
>>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>>> to build/update the Solr index within the map/reduce tasks themselves.
>>>> This also achieves better parallelism when the number of map tasks is
>>>> greater than the number of reduce tasks, which is usually the case.
>>>>
>>>> I worked out a very simple initial version. But I want to check if
>>>> there is any interest before proceeding. If so, I'll open a Jira
>>>> issue.
>>>>
>>>
>>> +1
>>>
>>> Please do. It'd be great to see this in Solr.
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 
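On the duplicate detection problem raised above, here is a rough sketch of one common way to handle it in a MapReduce-style job: key every document by its unique id in the map step, so that all copies of a document meet in the same reduce group, and keep only one of them there. This is not taken from SOLR-1045 or from Hadoop contrib/index. It is a plain Python simulation, and the field names "id" and "version", the crc32 partitioner, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict
import zlib


def map_phase(records):
    """Emit (unique_key, document) pairs. Using 'id' as the key is an assumption."""
    for doc in records:
        yield doc["id"], doc


def shuffle(pairs, num_shards):
    """Route each key to one shard and group its documents, as a partitioner would."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for key, doc in pairs:
        shard = zlib.crc32(key.encode("utf-8")) % num_shards
        shards[shard][key].append(doc)
    return shards


def reduce_phase(grouped):
    """Keep one document per key: the one with the highest 'version' (hypothetical field)."""
    return {key: max(docs, key=lambda d: d["version"]) for key, docs in grouped.items()}


def build_shards(records, num_shards=2):
    """Return one deduplicated {id: document} mapping per shard."""
    return [reduce_phase(group) for group in shuffle(map_phase(records), num_shards)]


# Example input: document "a" appears twice with different versions.
records = [
    {"id": "a", "version": 1, "text": "old"},
    {"id": "b", "version": 1, "text": "only"},
    {"id": "a", "version": 2, "text": "new"},
]
shards = build_shards(records)
```

Because the same id always hashes to the same shard, each shard can be written out as its own index without any cross-shard check. In a real job the reduce step would add the surviving document to a Lucene/Solr index instead of returning a dict.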

-- 
View this message in context: 
http://old.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p26684154.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
