Search in Hadoop
----------------

Key: SOLR-1614
URL: https://issues.apache.org/jira/browse/SOLR-1614
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 1.5

What's the use case? Sometimes queries are expensive (such as regex queries), or the indexes are located in HDFS and need to be searched. By leveraging Hadoop, these non-time-sensitive queries can be executed without dynamically deploying the indexes to new Solr servers.

We'll download each index shard out of HDFS (assuming the shards are zipped), run the queries against it in a batch, then merge the results either with a Solr query results priority queue or simply with Hadoop's built-in merge sort.

The query file will be encoded in JSON, with the fields (ID, query, numresults, fields). The shards file will simply contain newline-delimited paths (HDFS or otherwise). The output can be one Solr-encoded results file per query.

I'm hoping to add an actual Hadoop unit test.
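To make the proposed file formats and the priority-queue merge concrete, here is a rough Python sketch. It is not the patch itself, and several details are assumptions: the query file is taken to hold one JSON object per line, the key names ("id", "query", "numresults", "fields") are guessed from the tuple above, each hit is modeled as a (score, doc_id) pair, and the function names and example paths are made up for illustration.

```python
import heapq
import json


def parse_queries(text):
    """Parse the query file, assuming one JSON object per line."""
    queries = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            queries.append(json.loads(line))
    return queries


def parse_shards(text):
    """Parse the shards file: newline-delimited paths, HDFS or otherwise."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_shard_results(per_shard_hits, num_results):
    """Merge per-shard hit lists into the overall top num_results by score.

    Each hit is an assumed (score, doc_id) tuple. heapq.nlargest keeps a
    bounded heap internally, which plays the role of the priority queue.
    """
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nlargest(num_results, all_hits, key=lambda hit: hit[0])


# Hypothetical inputs; the paths and the query are illustrative only.
QUERY_FILE = (
    '{"id": "q1", "query": "title:had.*", "numresults": 2, '
    '"fields": ["id", "title"]}\n'
)
SHARDS_FILE = "hdfs://namenode/indexes/shard1.zip\n/local/indexes/shard2.zip\n"

queries = parse_queries(QUERY_FILE)
shards = parse_shards(SHARDS_FILE)

# Pretend each shard already returned its own scored hits for query q1.
shard_hits = [[(0.9, "a"), (0.2, "b")], [(0.7, "c")]]
top_hits = merge_shard_results(shard_hits, queries[0]["numresults"])
```

In a real job the per-shard hits would come from running each query against the unzipped index, and the merge could equally be left to Hadoop's sort phase by emitting the query ID and score as the key.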