[ https://issues.apache.org/jira/browse/SOLR-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784212#action_12784212 ]
Andrzej Bialecki commented on SOLR-1614:
-----------------------------------------
If query performance is not a concern, then why not execute the queries directly
on HDFS (e.g., using Nutch's FsDirectory to read the indexes from HDFS)?
> Search in Hadoop
> ----------------
>
> Key: SOLR-1614
> URL: https://issues.apache.org/jira/browse/SOLR-1614
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 1.5
>
>
> What's the use case? Sometimes queries are expensive (such as
> regex), or one has indexes located in HDFS that need to be
> searched. By leveraging Hadoop, these non-time-sensitive
> queries can be executed without dynamically deploying the
> indexes to new Solr servers.
> We'll download the indexes out of HDFS (assuming they're zipped),
> perform the queries in a batch on each index shard, then merge
> the results using either a Solr query results priority queue or
> simply Hadoop's built-in merge sorting.
> The query file will be encoded in JSON format (ID, query,
> numresults, fields). The shards file will simply contain
> newline-delimited paths (HDFS or otherwise). The output can be a
> Solr-encoded results file per query.
> I'm hoping to add an actual Hadoop unit test.
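The issue proposes downloading zipped index shards out of HDFS before querying them, but does not say how the archives are unpacked. The following is a minimal sketch, assuming each shard arrives as a plain zip stream (for example, the stream returned by Hadoop's FileSystem.open); the class name ShardUnzipper is hypothetical. It uses only java.util.zip and rejects entries whose paths would escape the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: unpacks one zipped index shard into a local directory.
final class ShardUnzipper {
    private ShardUnzipper() {}

    static void unzip(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target.
                if (!out.startsWith(root)) {
                    throw new IOException("Zip entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // ZipInputStream reports end-of-stream at the end of the current
                    // entry, so this copies exactly one file's bytes.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Once unpacked, the shard directory can be opened with whatever local index reader the batch job uses.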
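For the merge step, the issue mentions either a Solr query results priority queue or Hadoop's built-in merge sort. As a rough stand-in for the first option (not Solr's own queue classes), this sketch keeps a global top n across shards with a bounded min-heap built on java.util.PriorityQueue; ShardHit and ShardMerger are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: one scored document from one index shard.
record ShardHit(String docId, double score, int shard) {}

// Hypothetical helper that merges per-shard hits into a global top-n list.
final class ShardMerger {
    private ShardMerger() {}

    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        if (n <= 0) {
            return List.of();
        }
        // Min-heap by score: the head is the weakest hit currently kept.
        PriorityQueue<ShardHit> heap =
                new PriorityQueue<>(Comparator.comparingDouble(ShardHit::score));
        for (List<ShardHit> hits : perShard) {
            for (ShardHit hit : hits) {
                if (heap.size() < n) {
                    heap.offer(hit);
                } else if (hit.score() > heap.peek().score()) {
                    heap.poll(); // evict the weakest hit to make room
                    heap.offer(hit);
                }
            }
        }
        // Draining a min-heap yields ascending scores; reverse for best-first order.
        List<ShardHit> merged = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            merged.add(heap.poll());
        }
        Collections.reverse(merged);
        return merged;
    }
}
```

Because the heap never holds more than n hits, each candidate costs O(log n), and a new hit only has to beat the heap's head to qualify. Ties between equal scores come out in no particular order.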
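The issue describes the query file only as JSON carrying an ID, query, numresults and fields, and the shards file as newline-delimited paths. A sketch of both, assuming one JSON object per query with lowercase keys; the exact key names, the QuerySpec record and the ShardsFile helper are assumptions, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of one entry in the proposed JSON query file.
record QuerySpec(String id, String query, int numResults, List<String> fields) {

    // Renders this spec as a single JSON object (key names are assumed).
    String toJson() {
        String fieldList = fields.stream()
                .map(QuerySpec::quote)
                .collect(Collectors.joining(","));
        return "{\"id\":" + quote(id)
                + ",\"query\":" + quote(query)
                + ",\"numresults\":" + numResults
                + ",\"fields\":[" + fieldList + "]}";
    }

    // Minimal JSON string quoting: escapes quotes, backslashes and control chars.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}

// Hypothetical reader for the shards file: one path (HDFS or local) per line.
final class ShardsFile {
    private ShardsFile() {}

    static List<String> parse(String contents) {
        List<String> paths = new ArrayList<>();
        for (String line : contents.split("\\R")) {
            String p = line.strip();
            if (!p.isEmpty()) {
                paths.add(p); // skip blank lines, trim surrounding whitespace
            }
        }
        return paths;
    }
}
```

Writing one JSON object per line keeps the query file splittable by Hadoop's line-oriented input formats, which is one reason to prefer it over a single JSON array.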