RE: Indexing all permutations of words from the input

Steven A Rowe Thu, 20 Jan 2011 14:00:09 -0800

Hi Martin,

The co-occurrence filter I'm working on at
https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want (among 
other things).  Still vaporware at this point, as I've only put a couple of 
hours into it, so don't hold your breath :)


Steve

> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Thursday, January 20, 2011 4:46 PM
> To: Martin Jansen
> Cc: solr-user@lucene.apache.org
> Subject: Re: Indexing all permutations of words from the input
> 
> Aha. I have no idea whether there actually is a better way of achieving that;
> auto-completion with Solr is always tricky, and I personally have not
> been happy with any of the designs I've seen suggested for it. I'm
> also not entirely sure your design will work, but neither am I
> sure it won't!
> 
> I am thinking that for that auto-complete use you may actually need
> your field to be NOT tokenized, so you won't want to use the whitespace
> tokenizer after all (I think!), unless maybe there's another filter
> you can put at the end of the chain that will take all the tokens and
> join them back together, separated by a single space, as a single
> token. But I do think you'll need the whole multi-word string to be a
> single token in order to use terms.prefix the way you want.
> 
> If you can't make ShingleFilter do it, though, I don't think there are any
> built-in analyzers that will do the transformation you want. You could
> write your own in Java, perhaps based on ShingleFilter, or it might be
> easier to have your own software make the transformations you want and
> then send the pre-transformed strings to Solr when indexing. Then
> you could send them to a 'string' type field that won't tokenize.
> 
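Jonathan's last suggestion (compute the strings in your own code and index them untokenized) could look roughly like the following. It is a sketch under assumptions, not a tested recipe: the schema is assumed to declare a hypothetical multi-valued field named "suggest" of type string plus an "id" unique key, and the phrases are copied from Martin's example rather than computed. The class and method names are hypothetical; the code only builds the body of a Solr XML update message and leaves sending it to whatever HTTP client you already use.

```java
import java.util.Arrays;
import java.util.List;

public class SuggestUpdateXml {

    // Escapes the characters that are significant in XML text content.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds an <add> message for one document. Each pre-computed phrase
     * becomes its own value of the (assumed) multi-valued string field
     * "suggest", so Solr indexes every phrase as a single untokenized term.
     */
    public static String buildAddXml(String id, List<String> phrases) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        xml.append("<field name=\"id\">").append(escapeXml(id)).append("</field>");
        for (String phrase : phrases) {
            xml.append("<field name=\"suggest\">")
               .append(escapeXml(phrase))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Phrases taken from Martin's example input "abc xyz foo".
        List<String> phrases = Arrays.asList(
                "abc", "abc xyz", "abc xyz foo", "abc foo", "xyz", "xyz foo", "foo");
        System.out.println(buildAddXml("doc1", phrases));
    }
}
```

The resulting XML would be POSTed to the update handler (/update in the stock example configuration) and followed by a <commit/>. Because a string field keeps each value as one term, "abc xyz foo" stays a single term, which is what terms.prefix needs.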
> On 1/20/2011 4:40 PM, Martin Jansen wrote:
> > On 20.01.11 22:19, Jonathan Rochkind wrote:
> >> On 1/20/2011 4:03 PM, Martin Jansen wrote:
> >>> I'm looking for an <analyzer> configuration for Solr 1.4 that
> >>> accomplishes the following:
> >>>
> >>> Given the input "abc xyz foo" I would like to add at least the
> >>> following token combinations to the index:
> >>>
> >>>      abc
> >>>      abc xyz
> >>>      abc xyz foo
> >>>      abc foo
> >>>      xyz
> >>>      xyz foo
> >>>      foo
> >>>
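The combinations listed above are every non-empty, order-preserving selection of the three input words (2^3 - 1 = 7 entries), so "permutations" in the subject is slightly loose: reordered forms such as "xyz abc" are not wanted. The following is a minimal plain-Java illustration, not Solr or Lucene code, and its class and method names are hypothetical. It generates those combinations and, for contrast, the contiguous word n-grams that a shingle-style filter builds from adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class WordCombinations {

    // Splits on runs of whitespace; returns an empty array for blank input.
    private static String[] words(String input) {
        String trimmed = input.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /**
     * Every non-empty, order-preserving combination of the words, joined
     * by a single space. For n words there are 2^n - 1 of them.
     */
    public static List<String> combinations(String input) {
        String[] w = words(input);
        int n = w.length;
        if (n > 30) {
            // Keeps 1 << n positive for the int bit mask below.
            throw new IllegalArgumentException("too many words: " + n);
        }
        List<String> result = new ArrayList<>();
        // Each mask selects a subset of word positions; scanning the bits
        // from low to high keeps the words in their original order.
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(w[i]);
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    /**
     * Contiguous word n-grams of every length (adjacent words only), roughly
     * what a shingle filter emits when unigrams are kept. Not Lucene code.
     */
    public static List<String> contiguousNgrams(String input) {
        String[] w = words(input);
        List<String> result = new ArrayList<>();
        for (int start = 0; start < w.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int end = start; end < w.length; end++) {
                if (end > start) sb.append(' ');
                sb.append(w[end]);
                result.add(sb.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [abc, xyz, abc xyz, foo, abc foo, xyz foo, abc xyz foo]
        System.out.println(combinations("abc xyz foo"));
        // prints [abc, abc xyz, abc xyz foo, xyz, xyz foo, foo]
        System.out.println(contiguousNgrams("abc xyz foo"));
    }
}
```

The second list lacks "abc foo", which is exactly the suggestion wanted for "abc fo", so shingles of adjacent words alone would not cover this use case. The first list grows as 2^n, so long inputs would produce a very large number of entries per document.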
> >> Why do you want to do this; what is it meant to accomplish? There
> >> might be a better way to accomplish what you are trying to do. I
> >> can't think of anything you'd be trying to do that would actually
> >> require this (which doesn't mean it doesn't exist). What sorts
> >> of queries do you intend to serve with this setup?
> > I'm in the process of setting up an index for term suggestion. In my use
> > case, people should get the suggestion "abc foo" for the search query
> > "abc fo", given that "abc xyz foo" has been submitted
> > to the index.
> >
> > My current plan is to use TermsComponent with the terms.prefix=
> > parameter for this, because it seems to be pretty efficient and I get
> > things like correct sorting for free.
> >
> > I assume there is a better way of achieving this, then?
> >
> > - Martin
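To see why a prefix lookup over such terms would yield the suggestion Martin wants, the following plain-Java sketch mimics it with a sorted set standing in for the term dictionary. It is an illustration only, not how Solr implements TermsComponent. It also builds a request URL: terms.fl, terms.prefix and terms.limit are TermsComponent parameters, while the /terms handler path, the localhost base URL and the "suggest" field name are assumptions based on the stock example setup. Class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixSuggest {

    /**
     * Returns every term starting with prefix, in lexicographic order.
     * Terms sharing a prefix are contiguous in sorted order, so we can start
     * at the first term >= prefix and stop at the first one that doesn't match.
     */
    public static List<String> prefixMatches(SortedSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a TermsComponent request URL (handler path assumed to be /terms). */
    public static String termsUrl(String baseUrl, String field, String prefix, int limit) {
        return baseUrl + "/terms?terms=true"
                + "&terms.fl=" + enc(field)
                + "&terms.prefix=" + enc(prefix)
                + "&terms.limit=" + limit;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "abc", "abc xyz", "abc xyz foo", "abc foo", "xyz", "xyz foo", "foo"));
        System.out.println(prefixMatches(terms, "abc fo")); // prints [abc foo]
        System.out.println(termsUrl("http://localhost:8983/solr", "suggest", "abc fo", 10));
    }
}
```

With the seven combinations indexed, the prefix "abc fo" matches only "abc foo". String fields are matched exactly, so the prefix must use the same case and spacing as the indexed values.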
