Re: Dictionary lookup possibilities

Jaco Sat, 18 Apr 2009 07:27:19 -0700

Hi,

Thanks for the suggestions! It looks like the MemoryIndex is worth having a
detailed look at, so that's what I'll start on.


Thanks again, bye,

Jaco.


2009/4/17 Steven A Rowe <sar...@syr.edu>

> Hi Jaco,
>
> On 4/9/2009 at 2:58 PM, Jaco wrote:
> > I'm struggling with some ideas, maybe somebody can help me with past
> > experiences or tips. I have loaded a dictionary into a Solr index,
> > using stemming and some stopwords in the analysis part of the schema.
> > Each record holds a term from the dictionary, which can consist of
> > multiple words. For some data analysis work, I want to send pieces
> > of text (sentences actually) to Solr to retrieve all possible
> > dictionary terms that could occur. Ideally, I want to construct a
> > query that only returns those Solr records for which all individual
> > words in that record are matched.
> >
> > For instance, my dictionary holds the following terms:
> > 1 - a b c d
> > 2 - c d e
> > 3 - a b
> > 4 - a e f g h
> >
> > If I put the sentence [a b c d f g h] in as a query, I want to receive
> > dictionary items 1 (matching all words a b c d) and 3 (matching words a
> > b) as matches.
> >
> > I have been puzzling about how to do this. The only way I found so far
> > was to construct an OR query with all words of the sentence in it. In
> > this case, that would result in all dictionary items being returned.
> > This would then require some code to go over the search results and
> > analyse each of them (e.g. by using the highlight function) to kick
> > out 'false' matches, but I am looking for a more efficient way.
> >
> > Is there a way to do this with Solr functionality, or do I need to
> > start looking into the Lucene API ..?
>
> Your problem could be modeled as a set of standing queries, where your
> dictionary entries are the *queries* (with all words required, maybe using a
> PhraseQuery or a SpanNearQuery), and the sentence is the document.
>
> Solr may not be usable in this context (an extremely high volume of queries),
> depending on your throughput requirements, but Lucene's MemoryIndex was
> designed for this kind of thing:
>
> <http://lucene.apache.org/java/2_4_1/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>
>
> Steve
>
>
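
For illustration, the "dictionary entries as standing queries" idea can be
sketched without Solr or Lucene. The snippet below is not MemoryIndex and
makes several assumptions: the helper names (tokenize, build_index,
matching_terms) are hypothetical, the whitespace tokenizer only stands in for
the stemming and stopword analysis in the schema, and word order and
adjacency are ignored (a PhraseQuery or SpanNearQuery would enforce those).
It indexes the dictionary once, then counts for each entry how many of its
distinct words occur in the sentence, and keeps the entries that are fully
covered.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for the real analysis chain."""
    return text.lower().split()


def build_index(dictionary):
    """Map each word to the ids of the entries containing it.

    Also records how many distinct words each entry has, so that a full
    match can be detected later by counting.
    """
    postings = defaultdict(set)
    sizes = {}
    for term_id, term in dictionary.items():
        words = set(tokenize(term))
        sizes[term_id] = len(words)
        for word in words:
            postings[word].add(term_id)
    return postings, sizes


def matching_terms(sentence, postings, sizes):
    """Return the ids of entries whose words all occur in the sentence."""
    hits = defaultdict(int)
    for word in set(tokenize(sentence)):
        for term_id in postings.get(word, ()):
            hits[term_id] += 1
    # An entry matches only if every one of its distinct words was seen.
    return sorted(t for t, n in hits.items() if n == sizes[t])


# The example dictionary and sentence from the original question.
dictionary = {1: "a b c d", 2: "c d e", 3: "a b", 4: "a e f g h"}
postings, sizes = build_index(dictionary)
print(matching_terms("a b c d f g h", postings, sizes))
```

With the example data this selects items 1 and 3: items 2 and 4 both need
the word "e", which the sentence does not contain. Only entries sharing at
least one word with the sentence are touched, so the cost per sentence does
not grow with the full dictionary size.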
