Aha. I have no idea whether there actually is a better way of achieving
that; auto-completion with Solr is always tricky, and I personally have not
been happy with any of the designs I've seen suggested for it. I'm
not entirely sure your design will actually work, but neither am I
sure it won't!
I am thinking that for this auto-complete use you will actually need
your field to be NOT tokenized, so you won't want to use the whitespace
tokenizer after all (I think!) -- unless maybe there's another filter
you can put at the end of the chain that will take all the tokens and
join them back together, separated by a single space, as a single
token. But I do think you'll need the whole multi-word string to be a
single token in order to use terms.prefix how you want.
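To illustrate what I mean, here is a rough Python sketch (not Solr code) that mimics a prefix lookup over the terms in a field. The function name matching_terms and the sample values are made up for illustration, and I'm assuming terms.prefix behaves like a plain "starts with" test on each indexed term.

```python
def matching_terms(terms, prefix):
    """Return the distinct terms that start with prefix, sorted.

    A rough stand-in for a terms.prefix lookup; not how Solr implements it.
    """
    return sorted(t for t in set(terms) if t.startswith(prefix))


# Hypothetical values sent to the suggestion field.
values = ["abc foo", "abc xyz foo"]

# Whitespace-tokenized field: every word becomes its own term.
tokenized_terms = [word for value in values for word in value.split()]

# Untokenized field: each whole value is stored as a single term.
untokenized_terms = list(values)

print(matching_terms(tokenized_terms, "abc fo"))    # []
print(matching_terms(untokenized_terms, "abc fo"))  # ['abc foo']
```

With the tokenized field no single term contains a space, so a prefix like "abc fo" can never match; with the untokenized field it can.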
If you can't make ShingleFilter do it though, I don't think there are any
built-in analyzers that will do the transformation you want. You could
write your own in Java, perhaps based on ShingleFilter -- or it might be
easier to have your own software make the transformations you want and
then simply send the pre-transformed strings to Solr when indexing. Then
you could send them to a 'string' type field that won't tokenize.
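If you go the pre-transformation route, something like the following Python sketch might do it. It is only one way to read your list of combinations: I'm assuming you want every order-preserving combination of the words, and the function name token_combinations is my own invention. As far as I know ShingleFilter only emits runs of adjacent tokens, so it would not give you "abc foo" from "abc xyz foo" by itself.

```python
from itertools import combinations


def token_combinations(text):
    """Return every order-preserving combination of the whitespace-separated
    words in text, each joined by a single space, without duplicates."""
    words = text.split()
    joined = (
        " ".join(combo)
        for size in range(1, len(words) + 1)
        for combo in combinations(words, size)
    )
    # dict.fromkeys drops duplicates (possible with repeated words)
    # while keeping first-seen order.
    return list(dict.fromkeys(joined))


for value in sorted(token_combinations("abc xyz foo")):
    print(value)
# abc
# abc foo
# abc xyz
# abc xyz foo
# foo
# xyz
# xyz foo
```

Each resulting string would then be sent to the untokenized field as its own value. Be aware that an input of n distinct words yields 2^n - 1 strings, so you would probably want to cap the number of words per input.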
On 1/20/2011 4:40 PM, Martin Jansen wrote:
On 20.01.11 22:19, Jonathan Rochkind wrote:
On 1/20/2011 4:03 PM, Martin Jansen wrote:
I'm looking for an <analyzer> configuration for Solr 1.4 that
accomplishes the following:
Given the input "abc xyz foo" I would like to add at least the following
token combinations to the index:
abc
abc xyz
abc xyz foo
abc foo
xyz
xyz foo
foo
Why do you want to do this, what is it meant to accomplish? There might be a
better way to accomplish what it is you are trying to do; I can't think of
any use case (which doesn't mean one doesn't exist) that would actually
require the indexing you describe. What sorts of queries do you intend to
serve with this setup?
I'm in the process of setting up an index for term suggestion. In my use
case people should get the suggestion "abc foo" for the search query
"abc fo" and under the assumption that "abc xyz foo" has been submitted
to the index.
My current plan is to use TermsComponent with the terms.prefix=
parameter for this, because it seems to be pretty efficient and I get
things like correct sorting for free.
I assume there is a better way of achieving this, then?
- Martin