Of course you can do it, but the question is whether it will give you the performance you expect. I've seen this discussed in other forums, so you may be able to find some prior work.
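
One rough way to frame that performance question is a back-of-envelope comparison between the query rate your tasks would generate and what your Solr setup can sustain. The sketch below is only an illustration: the function names and every number in it are hypothetical, and you would need to measure the real per-task request rate and Solr capacity on your own cluster.

```python
# Back-of-envelope check: would concurrent map/reduce tasks overload Solr?
# All names and figures here are hypothetical placeholders, not measurements.


def aggregate_request_rate(concurrent_tasks: int,
                           requests_per_task_per_sec: float) -> float:
    """Total requests per second hitting Solr if every task queries it directly."""
    return concurrent_tasks * requests_per_task_per_sec


def endpoint_is_overloaded(concurrent_tasks: int,
                           requests_per_task_per_sec: float,
                           endpoint_capacity_qps: float) -> bool:
    """True when the combined task load exceeds the sustainable query rate."""
    load = aggregate_request_rate(concurrent_tasks, requests_per_task_per_sec)
    return load > endpoint_capacity_qps


def max_safe_tasks(requests_per_task_per_sec: float,
                   endpoint_capacity_qps: float) -> int:
    """Largest number of concurrent tasks that stays within capacity."""
    if requests_per_task_per_sec <= 0:
        raise ValueError("requests_per_task_per_sec must be positive")
    return int(endpoint_capacity_qps // requests_per_task_per_sec)


if __name__ == "__main__":
    # Assumed figures: 2000 tasks, 5 requests/s each, Solr sustaining 1500 qps.
    print(aggregate_request_rate(2000, 5.0))
    print(endpoint_is_overloaded(2000, 5.0, 1500.0))
    print(max_safe_tasks(5.0, 1500.0))
```

If the estimated load is far above what Solr can sustain, that is a sign you would need to cap task concurrency or batch the requests before relying on this approach.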
Solr and HDFS serve somewhat different purposes. The key issue is whether your map and reduce code would overload the Solr endpoint. Even with SolrCloud, I believe all requests have to go through a single URL (to be routed), so if you have thousands of map/reduce jobs all running simultaneously, the question is whether your Solr deployment is architected to handle that much throughput.

On Thu, 2012-07-26 at 14:55 -0700, Trung Pham wrote:
> Is it possible to run map reduce jobs directly on Solr4?
>
> I'm asking this because I want to use Solr4 as the primary storage engine.
> And I want to be able to run near real time analytics against it as well.
> Rather than export solr4 data out to a hadoop cluster.