Hi Martin,

The co-occurrence filter I'm working on at https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want (among other things). Still vaporware at this point, as I've only put a couple of hours into it, so don't hold your breath :)

Steve

> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Thursday, January 20, 2011 4:46 PM
> To: Martin Jansen
> Cc: solr-user@lucene.apache.org
> Subject: Re: Indexing all permutations of words from the input
>
> Aha, I have no idea if there actually is a better way of achieving that,
> auto-completion with Solr is always tricky and I personally have not
> been happy with any of the designs I've seen suggested for it. But I'm
> also not entirely sure your design will actually work, but neither am I
> sure it won't!
>
> I am thinking maybe for that auto-complete use, you will actually need
> your field to be NOT tokenized, so you won't want to use the WhiteSpace
> tokenizer after all (I think!) -- unless maybe there's another filter
> you can put at the end of the chain that will take all the tokens and
> join them back together, separated by a single space, as a single
> token. But I do think you'll need the whole multi-word string to be a
> single token in order to use terms.prefix how you want.
>
> If you can't make ShingleFilter do it though, I don't think there are any
> built-in analyzers that will do the transformation you want. You could
> write your own in Java, perhaps based on ShingleFilter -- or it might be
> easier to have your own software make the transformations you want and
> then simply send the pre-transformed strings to Solr when indexing. Then
> you could simply send them to a 'string' type field that won't tokenize.
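
To make that last suggestion concrete, here is a minimal sketch in Python of computing the strings client-side before indexing. The function name `ordered_word_combinations` is made up for illustration, and it assumes the goal is every order-preserving subset of the whitespace-separated words, which is what the seven strings listed for "abc xyz foo" in the original question amount to. It is not what LUCENE-2749 or ShingleFilter does.

```python
from itertools import combinations


def ordered_word_combinations(text):
    """Return every non-empty, order-preserving combination of the words
    in `text`, each joined by a single space.

    Hypothetical helper for illustration only; not part of Solr or Lucene.
    """
    words = text.split()
    result = []
    # Pick 1 word, then 2 words, ... up to all words. combinations()
    # keeps the words in their original relative order.
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            result.append(" ".join(combo))
    return result
```

Each returned string would then be sent to an untokenized 'string' field as its own value. Note that an input of n words yields 2^n - 1 strings, so it is probably worth capping the number of words per input.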
>
> On 1/20/2011 4:40 PM, Martin Jansen wrote:
> > On 20.01.11 22:19, Jonathan Rochkind wrote:
> >> On 1/20/2011 4:03 PM, Martin Jansen wrote:
> >>> I'm looking for an <analyzer> configuration for Solr 1.4 that
> >>> accomplishes the following:
> >>>
> >>> Given the input "abc xyz foo" I would like to add at least the following
> >>> token combinations to the index:
> >>>
> >>> abc
> >>> abc xyz
> >>> abc xyz foo
> >>> abc foo
> >>> xyz
> >>> xyz foo
> >>> foo
> >>>
> >> Why do you want to do this, what is it meant to accomplish? There
> >> might be a better way to accomplish what it is you are trying to do; I
> >> can't think of anything (which doesn't mean it doesn't exist) that what
> >> you're actually trying to do would be required in order to do. What sorts
> >> of queries do you intend to serve with this setup?
> > I'm in the process of setting up an index for term suggestion. In my use
> > case people should get the suggestion "abc foo" for the search query
> > "abc fo" and under the assumption that "abc xyz foo" has been submitted
> > to the index.
> >
> > My current plan is to use TermsComponent with the terms.prefix=
> > parameter for this, because it seems to be pretty efficient and I get
> > things like correct sorting for free.
> >
> > I assume there is a better way for achieving this then?
> >
> > - Martin
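
For reference, the TermsComponent lookup described above might be issued roughly like this. This is only a sketch: the base URL, the `/terms` handler path, and the field name `suggest` are assumptions about the deployment and are not stated anywhere in this thread. `terms.fl`, `terms.prefix`, and `terms.limit` are TermsComponent request parameters.

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for a prefix lookup.

    `base_url` and `field` are placeholders (assumptions); adjust them
    to wherever the terms handler is actually mounted.
    """
    params = {
        "terms": "true",
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only return terms starting with this
        "terms.limit": limit,    # maximum number of terms to return
    }
    # urlencode() escapes the space in a multi-word prefix for us.
    return base_url + "?" + urlencode(params)


# Hypothetical host, handler path, and field name:
url = terms_prefix_url("http://localhost:8983/solr/terms", "suggest", "abc fo")
```

A multi-word prefix such as "abc fo" can only match if the field holds whole strings like "abc foo" as single terms, which is why the field would need to be untokenized.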