OK, idea from left field, off the top of my head, so don't take it as gospel...

Create a second index and send your data to it, with each phrase treated as its own "document", and query *that* index for your autosuggest. Perhaps this could be a secondary core. It could even be a set of *special* documents in your existing index with fields orthogonal to the normal ones.

The idea is that you'd have a "document" consisting of one stored and indexed field that contained "abc xyz foo". Searching for "+abc +foo" (no quotes) would return it, as would searching for "+abc +xyz" or +abc or +foo...

You could even do some interesting things with dismax if you required some rule like "at least two terms must match if there are three", I think...

You'd have to do something about duplicates here...

Best
Erick

On Thu, Jan 20, 2011 at 4:58 PM, Steven A Rowe <sar...@syr.edu> wrote:

> Hi Martin,
>
> The co-occurrence filter I'm working on at
> https://issues.apache.org/jira/browse/LUCENE-2749 would do what you want
> (among other things). Still vaporware at this point, as I've only put a
> couple of hours into it, so don't hold your breath :)
>
> Steve
>
> > -----Original Message-----
> > From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> > Sent: Thursday, January 20, 2011 4:46 PM
> > To: Martin Jansen
> > Cc: solr-user@lucene.apache.org
> > Subject: Re: Indexing all permutations of words from the input
> >
> > Aha, I have no idea if there actually is a better way of achieving that;
> > auto-completion with Solr is always tricky and I personally have not
> > been happy with any of the designs I've seen suggested for it. I'm
> > also not entirely sure your design will actually work, but neither am I
> > sure it won't!
> >
> > I am thinking maybe for that auto-complete use, you will actually need
> > your field to be NOT tokenized, so you won't want to use the WhiteSpace
> > tokenizer after all (I think!)
> > -- unless maybe there's another filter
> > you can put at the end of the chain that will take all the tokens and
> > join them back together, separated by a single space, as a single
> > token. But I do think you'll need the whole multi-word string to be a
> > single token in order to use terms.prefix how you want.
> >
> > If you can't make ShingleFilter do it though, I don't think there are any
> > built-in analyzers that will do the transformation you want. You could
> > write your own in Java, perhaps based on ShingleFilter -- or it might be
> > easier to have your own software make the transformations you want and
> > then simply send the pre-transformed strings to Solr when indexing. Then
> > you could simply send them to a 'string' type field that won't tokenize.
> >
> > On 1/20/2011 4:40 PM, Martin Jansen wrote:
> > > On 20.01.11 22:19, Jonathan Rochkind wrote:
> > >> On 1/20/2011 4:03 PM, Martin Jansen wrote:
> > >>> I'm looking for an <analyzer> configuration for Solr 1.4 that
> > >>> accomplishes the following:
> > >>>
> > >>> Given the input "abc xyz foo" I would like to add at least the
> > >>> following token combinations to the index:
> > >>>
> > >>>     abc
> > >>>     abc xyz
> > >>>     abc xyz foo
> > >>>     abc foo
> > >>>     xyz
> > >>>     xyz foo
> > >>>     foo
> > >>>
> > >> Why do you want to do this, what is it meant to accomplish? There
> > >> might be a better way to accomplish what it is you are trying to do; I
> > >> can't think of anything (which doesn't mean it doesn't exist) that
> > >> would actually require what you're trying to do. What sorts
> > >> of queries do you intend to serve with this setup?
> > > I'm in the process of setting up an index for term suggestion. In my
> > > use case people should get the suggestion "abc foo" for the search
> > > query "abc fo", under the assumption that "abc xyz foo" has been
> > > submitted to the index.
> > >
> > > My current plan is to use TermsComponent with the terms.prefix=
> > > parameter for this, because it seems to be pretty efficient and I get
> > > things like correct sorting for free.
> > >
> > > I assume there is a better way for achieving this then?
> > >
> > > - Martin
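
For reference, the seven combinations Martin lists are the non-empty, order-preserving selections of the input tokens, so the "have your own software make the transformations" route Jonathan mentions could be sketched roughly as below. This is plain Python and not a Solr or Lucene API; the function name `phrase_combinations` is made up for illustration, and the assumption is that each resulting string would then be sent to an untokenized 'string' field.

```python
from itertools import combinations


def phrase_combinations(text):
    """Return every non-empty, order-preserving combination of the
    whitespace-separated tokens in `text`, each joined by single spaces.

    Hypothetical helper for illustration only; not part of Solr.
    """
    tokens = text.split()
    seen = set()
    result = []
    # Choose 1..n tokens; combinations() keeps them in input order.
    for size in range(1, len(tokens) + 1):
        for picked in combinations(tokens, size):
            phrase = " ".join(picked)
            # Repeated words in the input would otherwise produce duplicates.
            if phrase not in seen:
                seen.add(phrase)
                result.append(phrase)
    return result


if __name__ == "__main__":
    # Strings one might index for the input "abc xyz foo".
    for phrase in phrase_combinations("abc xyz foo"):
        print(phrase)
```

Note that an input of n distinct words yields 2^n - 1 strings, so this only seems practical for short phrases.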