Hi,

Thanks for the suggestions! It looks like the MemoryIndex is worth having a detailed look at, so that's what I'll start on.
Thanks again, bye,

Jaco.

2009/4/17 Steven A Rowe <sar...@syr.edu>

> Hi Jaco,
>
> On 4/9/2009 at 2:58 PM, Jaco wrote:
> > I'm struggling with some ideas, maybe somebody can help me with past
> > experiences or tips. I have loaded a dictionary into a Solr index,
> > using stemming and some stopwords in the analysis part of the schema.
> > Each record holds a term from the dictionary, which can consist of
> > multiple words. For some data analysis work, I want to send pieces
> > of text (sentences actually) to Solr to retrieve all possible
> > dictionary terms that could occur. Ideally, I want to construct a
> > query that only returns those Solr records for which all individual
> > words in that record are matched.
> >
> > For instance, my dictionary holds the following terms:
> > 1 - a b c d
> > 2 - c d e
> > 3 - a b
> > 4 - a e f g h
> >
> > If I put the sentence [a b c d f g h] in as a query, I want to receive
> > dictionary items 1 (matching all words a b c d) and 3 (matching words a
> > b) as matches.
> >
> > I have been puzzling about how to do this. The only way I found so far
> > was to construct an OR query with all words of the sentence in it. In
> > this case, that would result in all dictionary items being returned.
> > This would then require some code to go over the search results and
> > analyse each of them (i.e. by using the highlight function) to kick
> > out 'false' matches, but I am looking for a more efficient way.
> >
> > Is there a way to do this with Solr functionality, or do I need to
> > start looking into the Lucene API ..?
>
> Your problem could be modeled as a set of standing queries, where your
> dictionary entries are the *queries* (with all words required, maybe using a
> PhraseQuery or a SpanNearQuery), and the sentence is the document.
>
> Solr may not be usable in this context (extremely high volume queries),
> depending on your throughput requirements, but Lucene's MemoryIndex was
> designed for this kind of thing:
>
> <http://lucene.apache.org/java/2_4_1/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>
>
> Steve
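
For reference, the "standing queries" idea from Steve's reply can be sketched in a few lines of plain Python, without Lucene or Solr. This is only an illustration under simplifying assumptions: the names `tokenize`, `matching_entries` and `DICTIONARY` are made up for this example, whitespace splitting stands in for the real analysis chain (no stemming or stopwords), and word order and adjacency are ignored, so it behaves like an all-words-required query rather than a PhraseQuery or SpanNearQuery.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def matching_entries(dictionary, sentence):
    """Return the ids of entries whose words all occur in the sentence.

    Each dictionary entry plays the role of a query with every word
    required; the sentence plays the role of the single document.
    """
    sentence_words = set(tokenize(sentence))
    return [
        entry_id
        for entry_id, term in sorted(dictionary.items())
        if set(tokenize(term)) <= sentence_words  # subset test
    ]


# The example dictionary from the original question.
DICTIONARY = {
    1: "a b c d",
    2: "c d e",
    3: "a b",
    4: "a e f g h",
}

print(matching_entries(DICTIONARY, "a b c d f g h"))  # [1, 3]
```

Entries 2 and 4 are rejected because the sentence does not contain "e". A loop like this tests every entry against every sentence, so for a large dictionary or a high sentence volume an index-based approach such as the MemoryIndex mentioned above may be the better fit.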