Re: Indexing all permutations of words from the input

Erick Erickson Sat, 22 Jan 2011 13:27:00 -0800

OK, here's an idea from left field, off the top of my head, so don't take
it as gospel...


Create a second index to which you send your data, where each phrase is
really a "document", and query *that* index for your autosuggest. Perhaps
this could be a secondary core. It could even be a set of *special*
documents in your existing index whose fields are orthogonal to the
normal ones.

The idea is that you'd have a "document" consisting of one stored and
indexed field that contained "abc xyz foo". Searching for "+abc +foo" (no
quotes) would return it, as would searching for "+abc +xyz" or +abc or
+foo... You could even do some interesting things with dismax if you
required some rule like "at least two terms must match if there are
three" I think...

You'd have to do something about duplicates here...
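
To make the matching concrete, here is a toy sketch in plain Python. It is
not Solr: the function names and the sample phrases are made up for
illustration, and it only mimics the semantics of required ("+") terms, a
"minimum number of terms must match" rule, and one possible way of handling
duplicates (dropping repeated phrases before they are indexed).

```python
def matches_required(doc_text, required_terms):
    """True if every required term occurs as a whole word in the document."""
    doc_terms = set(doc_text.split())
    return all(term in doc_terms for term in required_terms)


def matches_minimum(doc_text, query_terms, minimum):
    """True if at least `minimum` distinct query terms occur in the document."""
    doc_terms = set(doc_text.split())
    matched = sum(1 for term in set(query_terms) if term in doc_terms)
    return matched >= minimum


# Hypothetical incoming phrases; the repeated one stands in for the
# duplicate problem mentioned above.
phrases = ["abc xyz foo", "abc xyz foo", "abc bar"]

# Assumption: duplicates are removed before indexing, keeping first-seen order.
suggest_docs = list(dict.fromkeys(phrases))

# Equivalent in spirit to the query "+abc +foo".
required_hits = [doc for doc in suggest_docs if matches_required(doc, ["abc", "foo"])]

# "At least two of three terms must match."
minimum_hits = [doc for doc in suggest_docs
                if matches_minimum(doc, ["abc", "foo", "bar"], 2)]

print(required_hits)
print(minimum_hits)
```

With these sample phrases the "+abc +foo" style query returns only
"abc xyz foo", while the two-of-three rule also lets "abc bar" through.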

Best
Erick

On Thu, Jan 20, 2011 at 4:58 PM, Steven A Rowe <sar...@syr.edu> wrote:

> Hi Martin,
>
> The co-occurrence filter I'm working on at
> https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want
> (among other things).  Still vaporware at this point, as I've only put a
> couple of hours into it, so don't hold your breath :)
>
> Steve
>
> > -----Original Message-----
> > From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> > Sent: Thursday, January 20, 2011 4:46 PM
> > To: Martin Jansen
> > Cc: solr-user@lucene.apache.org
> > Subject: Re: Indexing all permutations of words from the input
> >
> > Aha. I have no idea whether there is actually a better way of achieving
> > that; auto-completion with Solr is always tricky, and I personally have
> > not been happy with any of the designs I've seen suggested for it. I'm
> > also not entirely sure your design will actually work, but neither am I
> > sure it won't!
> >
> > I am thinking maybe for that auto-complete use, you will actually need
> > your field to be NOT tokenized, so you won't want to use the WhiteSpace
> > tokenizer after all (I think!) -- unless maybe there's another filter
> > you can put at the end of the chain that will take all the tokens and
> > join them back together, separated by a single space, as a single
> > token.  But I do think you'll need the whole multi-word string to be a
> > single token in order to use terms.prefix how you want.
> >
> > If you can't make ShingleFilter do it though, I don't think there are any
> > built-in analyzers that will do the transformation you want. You could
> > write your own in Java, perhaps based on ShingleFilter -- or it might be
> > easier to have your own software make the transformations you want and
> > then simply send the pre-transformed strings to Solr when indexing. Then
> > you could simply send them to a 'string' type field that won't tokenize.
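
A minimal sketch of the client-side pre-transformation suggested above,
assuming the goal is every non-empty combination of the input words with
their original order preserved (which is what the list for "abc xyz foo"
further down the thread amounts to). The function name is hypothetical and
nothing here talks to Solr; the resulting strings would be sent to a
non-tokenized 'string' field at index time.

```python
from itertools import combinations


def ordered_word_combinations(phrase):
    """Return each non-empty, order-preserving combination of the words
    in `phrase`, joined by single spaces, without repeats."""
    words = phrase.split()
    seen = set()
    result = []
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            candidate = " ".join(combo)
            if candidate not in seen:  # input words may repeat
                seen.add(candidate)
                result.append(candidate)
    return result


suggestions = ordered_word_combinations("abc xyz foo")
print(suggestions)
# ['abc', 'xyz', 'foo', 'abc xyz', 'abc foo', 'xyz foo', 'abc xyz foo']
```

Note that an input of n distinct words produces 2^n - 1 strings, so this
only stays practical for short phrases.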
> >
> > On 1/20/2011 4:40 PM, Martin Jansen wrote:
> > > On 20.01.11 22:19, Jonathan Rochkind wrote:
> > >> On 1/20/2011 4:03 PM, Martin Jansen wrote:
> > > >>> I'm looking for an <analyzer> configuration for Solr 1.4 that
> > >>> accomplishes the following:
> > >>>
> > > >>> Given the input "abc xyz foo" I would like to add at least the
> > > >>> following token combinations to the index:
> > >>>
> > >>>      abc
> > >>>      abc xyz
> > >>>      abc xyz foo
> > >>>      abc foo
> > >>>      xyz
> > >>>      xyz foo
> > >>>      foo
> > >>>
> > > >> Why do you want to do this? What is it meant to accomplish? There
> > > >> might be a better way to accomplish what you are trying to do; I
> > > >> can't think of any goal (which doesn't mean one doesn't exist) that
> > > >> would actually require what you're trying to do. What sorts of
> > > >> queries do you intend to serve with this setup?
> > > > I'm in the process of setting up an index for term suggestion. In my
> > > > use case people should get the suggestion "abc foo" for the search
> > > > query "abc fo" and under the assumption that "abc xyz foo" has been
> > > > submitted to the index.
> > >
> > > My current plan is to use TermsComponent with the terms.prefix=
> > > parameter for this, because it seems to be pretty efficient and I get
> > > things like correct sorting for free.
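
For reference, a rough sketch of what such a TermsComponent request could
look like when built from Python. The host, port, handler path and field
name ("suggest") are assumptions for illustration only; the sketch also
assumes a /terms request handler is configured and that the field holds
whole untokenized phrases, so that a multi-word prefix can match.

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL asking for indexed terms in
    `field` that start with `prefix`."""
    params = {
        "terms": "true",         # enable the terms component
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only return terms starting with this
        "terms.limit": limit,    # maximum number of terms to return
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and field name.
url = terms_prefix_url("http://localhost:8983/solr/terms", "suggest", "abc fo")
print(url)
```

The space in the prefix is URL-encoded as "+", so the prefix reaches Solr as
the single string "abc fo".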
> > >
> > > I assume there is a better way for achieving this then?
> > >
> > > - Martin
>
