Build Solr index using Hadoop MapReduce

http://issues.apache.org/jira/browse/SOLR-1045
Ning Li-3 wrote:
>
> SOLR-1045 it is. More details will be available in that issue.
>
> Marc, you can check out Hadoop contrib/index which builds a Lucene
> index using Hadoop MapReduce. However, it does not handle duplicate
> detection.
>
> Cheers,
> Ning
>
> On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese <marc.sturl...@gmail.com>
> wrote:
>>
>> I am doing some research about creating a Lucene/Solr index using Hadoop,
>> but there's not much info around. It would be great to see some code!
>> (I am having problems especially with duplicate detection.)
>> Thanks
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li <ning.li...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wonder if there is interest in a contrib module that builds Solr
>>>> index using Hadoop MapReduce?
>>>>
>>>
>>> Absolutely!
>>>
>>>> It is different from the Solr support in Nutch. The Solr support in
>>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>>> at building/updating Solr index within map/reduce tasks. Also, it
>>>> achieves better parallelism when the number of map tasks is greater
>>>> than the number of reduce tasks, which is usually the case.
>>>>
>>>> I worked out a very simple initial version. But I want to check if
>>>> there is any interest before proceeding. If so, I'll open a Jira
>>>> issue.
>>>>
>>>
>>> +1
>>>
>>> Please do. It'd be great to see this in Solr.
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
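
For readers following the thread, here is a rough sketch of the idea Ning describes: each task builds its own piece of the index locally instead of sending documents to a Solr server. This is not the SOLR-1045 code and does not use Hadoop, Lucene, or Solr at all. It is a plain-Python stand-in where an "index" is just a term-to-document-ids dictionary, and the names build_shard, merge_shards, and split are made up for illustration. The final merge step is also an assumption; the thread does not say how the per-task indexes are combined.

```python
from collections import defaultdict


def build_shard(docs):
    """Stand-in for one task: index its own input split locally."""
    shard = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            shard[term].add(doc_id)
    return dict(shard)


def merge_shards(shards):
    """Hypothetical merge step: union the postings of all shards."""
    merged = defaultdict(set)
    for shard in shards:
        for term, doc_ids in shard.items():
            merged[term] |= doc_ids
    return dict(merged)


def split(docs, n):
    """Deal documents round-robin into n input splits."""
    return [docs[i::n] for i in range(n)]


docs = [
    (1, "Solr index"),
    (2, "Hadoop MapReduce"),
    (3, "Solr on Hadoop"),
    (4, "Lucene index"),
]

# Each split is indexed independently, so the splits could run in parallel.
shards = [build_shard(s) for s in split(docs, 2)]
index = merge_shards(shards)
```

Because every split is indexed on its own, adding more tasks adds more indexing parallelism, which is the point made in the thread. Like the contrib/index module mentioned above, this sketch makes no attempt at duplicate detection: the same document placed in two splits under different ids would simply be indexed twice.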