Yes, I've read directly from NFS. Consider the case where your mapper takes as input a list of the file paths to operate on. The mapper loads each file one by one using standard java.io.* calls, builds a SolrInputDocument out of each one, and submits it to a SolrServer implementation stored as a member field in the mapper during the setup call. Something like this: https://gist.github.com/mdellabitta/5910253 (I literally wrote that in the git editor just now, so I don't even know if it compiles, but you can get the idea.)
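In the same spirit, here's a minimal sketch of that pattern, with the same caveat that it hasn't been run. The class name, the Solr URL, and the id/text field mapping are made up for illustration; it assumes SolrJ 4.x's HttpSolrServer and Hadoop's newer mapreduce API:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class NfsSolrMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        private SolrServer solr;

        @Override
        protected void setup(Context context) {
            // One SolrServer per task JVM, created once in setup().
            solr = new HttpSolrServer("http://solrbox:8983/solr/collection1");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is a path on the NFS mount, which has to be
            // live on every task node.
            File file = new File(value.toString().trim());

            // Plain java.io read of the file contents.
            StringBuilder body = new StringBuilder();
            BufferedReader reader = new BufferedReader(new FileReader(file));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            } finally {
                reader.close();
            }

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", file.getAbsolutePath());
            doc.addField("text", body.toString());

            try {
                solr.add(doc);
            } catch (SolrServerException e) {
                throw new IOException(e);
            }
        }

        @Override
        protected void cleanup(Context context) {
            solr.shutdown();
        }
    }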
Note that the NFS mount has to be live on all of the task nodes. Also, if the number of lines in the input file is small, Hadoop might not split it into enough map tasks for you, so you should use NLineInputFormat (there's a driver sketch at the bottom of this message). And you should definitely tune the number of concurrently running tasks to make sure you don't crush your Solr box with update traffic.

I've used the patch that Anatoli mentions as well, and that does work.

Michael Della Bitta
appinions inc.

On Tue, Jul 2, 2013 at 3:17 AM, engy.morsy <engy.mo...@bibalex.org> wrote:

> Michael,
>
> I understand from your post that I can use the current storage without
> moving it into Hadoop. I already have the storage mounted via NFS.
> Does your map function read from the mounted storage directly? If possible,
> can you please illustrate more on that.
>
> Thanks
> Engy
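For reference, here is a hedged sketch of the driver side mentioned above. Again, the class name, the lines-per-split value, and the input path handling are assumptions, not a tested recipe; the point is just that NLineInputFormat splits the path list every N lines, and raising N gives you fewer, fatter map tasks so Solr sees less concurrent traffic:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class NfsSolrIndexJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "nfs-solr-index");
            job.setJarByClass(NfsSolrIndexJob.class);
            job.setMapperClass(NfsSolrMapper.class);
            job.setNumReduceTasks(0);

            // NLineInputFormat: one split per N lines of the path list,
            // so even a small input file fans out across mappers.
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 100);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // No reducers and nothing to write out; Solr is the sink.
            job.setOutputFormatClass(NullOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

From there you'd tune the lines-per-split value (and your cluster's map slot count) against whatever indexing load your Solr box can actually absorb.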