Yes, I've read directly from NFS. Consider the case where your mapper takes as input a list of the file paths to operate on. The mapper loads each file one by one using standard java.io.* calls, builds a SolrInputDocument out of each one, and submits it to a SolrServer implementation stored as a member field in the mapper during the setup call. Something like this: https://gist.github.com/mdellabitta/5910253 (I literally wrote that in the git editor just now, so I don't even know if it compiles, but you can get the idea.)
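In the same spirit, here's a minimal sketch of that pattern, with the same caveat that it hasn't been run. The class name, the Solr URL, and the id/text field mapping are made up for illustration; it assumes SolrJ 4.x's HttpSolrServer and Hadoop's newer mapreduce API:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class NfsSolrMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        private SolrServer solr;

        @Override
        protected void setup(Context context) {
            // One SolrServer per task JVM, created once in setup().
            solr = new HttpSolrServer("http://solrbox:8983/solr/collection1");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is a path on the NFS mount, which has to be
            // live on every task node.
            File file = new File(value.toString().trim());

            // Plain java.io read of the file contents.
            StringBuilder body = new StringBuilder();
            BufferedReader reader = new BufferedReader(new FileReader(file));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            } finally {
                reader.close();
            }

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", file.getAbsolutePath());
            doc.addField("text", body.toString());

            try {
                solr.add(doc);
            } catch (SolrServerException e) {
                throw new IOException(e);
            }
        }

        @Override
        protected void cleanup(Context context) {
            solr.shutdown();
        }
    }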
Note that the NFS mount has to be live on all of the task nodes. Also, if the number of lines in the input file is small, Hadoop might not split it into enough map tasks for you, so you should use NLineInputFormat (there's a driver sketch at the bottom of this message). And you should definitely tune the number of concurrently running tasks to make sure you don't crush your Solr box with update traffic.

I've used the patch that Anatoli mentions as well, and that does work.

Michael Della Bitta
appinions inc.

On Tue, Jul 2, 2013 at 3:17 AM, engy.morsy <engy.mo...@bibalex.org> wrote:

> Michael,
>
> I understand from your post that I can use the current storage without
> moving it into Hadoop. I already have the storage mounted via NFS.
> Does your map function read from the mounted storage directly? If possible,
> can you please illustrate more on that.
>
> Thanks
> Engy
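For reference, here is a hedged sketch of the driver side mentioned above. Again, the class name, the lines-per-split value, and the input path handling are assumptions, not a tested recipe; the point is just that NLineInputFormat splits the path list every N lines, and raising N gives you fewer, fatter map tasks so Solr sees less concurrent traffic:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class NfsSolrIndexJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "nfs-solr-index");
            job.setJarByClass(NfsSolrIndexJob.class);
            job.setMapperClass(NfsSolrMapper.class);
            job.setNumReduceTasks(0);

            // NLineInputFormat: one split per N lines of the path list,
            // so even a small input file fans out across mappers.
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 100);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // No reducers and nothing to write out; Solr is the sink.
            job.setOutputFormatClass(NullOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

From there you'd tune the lines-per-split value (and your cluster's map slot count) against whatever indexing load your Solr box can actually absorb.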