Why do you want to do this? What is it meant to accomplish? There might
be a better way to get there; I can't think of anything (which doesn't
mean it doesn't exist) that would actually require an index built this
way. What sorts of queries do you intend to serve with this setup?

I don't believe any analyzer included with Solr out of the box will do
exactly what you've specified. You could definitely write your own
analyzer in Java to do it. But I still suspect you don't actually need
to construct your index like that to accomplish whatever you are trying
to accomplish.

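If you do go the custom route, the core of it is just enumerating every
order-preserving combination of the input words. Here is a rough sketch
of that step in plain Java. It is not a Lucene TokenFilter and does not
use any Solr or Lucene API; the class and method names are made up for
illustration, and you would still have to wrap the logic in a real
analyzer yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class TokenCombinations {

    // Returns every non-empty combination of the whitespace-separated
    // words in the input, keeping the original word order, with the
    // chosen words joined by single spaces.
    static List<String> combinations(String input) {
        List<String> result = new ArrayList<String>();
        String trimmed = input.trim();
        if (trimmed.length() == 0) {
            return result; // nothing to index
        }
        String[] words = trimmed.split("\\s+");
        // Each bit of mask says whether the word at that position is kept.
        // This sketch assumes fewer than 31 words so the mask fits in an int.
        for (int mask = 1; mask < (1 << words.length); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < words.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(words[i]);
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the seven combinations for the example input, one per line.
        for (String token : combinations("abc xyz foo")) {
            System.out.println(token);
        }
    }
}
```

Note that n words produce 2^n - 1 tokens, so this gets out of hand
quickly for anything longer than a short field.
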
The only reason I can think of to care which words are next to which
other words is phrase and proximity searching. However, with what you've
specified, phrase and proximity searches wouldn't be useful anyway:
EVERY word would be next to every other word, so any phrase or proximity
search made up of words present in the field would match. You might as
well not do a phrase or proximity search at all, in which case it should
not matter what order the words are in or how close together they are in
the index. Why not just use an ordinary WhitespaceTokenizer, and run
ordinary dismax or lucene queries without phrase or proximity?

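To make that concrete, here is a toy model in plain Java of what a
whitespace-tokenized field gives you when every query word is required
and no phrase or proximity is used. It is only an illustration, not how
Solr implements matching, and the class and method names are invented.
It also assumes you don't care about word order in the query, which your
list of combinations does seem to preserve, so tell us if that matters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of "all query words must be present" matching.
class AllTermsMatch {

    // Index side: split on whitespace only, like a whitespace tokenizer.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<String>();
        for (String word : text.trim().split("\\s+")) {
            if (word.length() > 0) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Query side: true if every query word occurs in the document,
    // regardless of order or distance between the words.
    static boolean matches(String document, String query) {
        return tokenize(document).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        String doc = "abc xyz foo";
        System.out.println(matches(doc, "abc foo"));
        System.out.println(matches(doc, "abc bar"));
    }
}
```

In Solr terms this roughly corresponds to requiring all terms in the
query, which I believe you can get with the dismax mm parameter or an
AND default operator, but check that against your own setup.
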
On 1/20/2011 4:03 PM, Martin Jansen wrote:
Hey there,
I'm looking for an <analyzer> configuration for Solr 1.4 that
accomplishes the following:
Given the input "abc xyz foo" I would like to add at least the following
token combinations to the index:
abc
abc xyz
abc xyz foo
abc foo
xyz
xyz foo
foo
A WhitespaceTokenizer combined with a ShingleFilter will take me there
to some extent, but won't e.g. add "abc foo" to the index. Is there a
way to do this?
- Martin