Re: Indexing all permutations of words from the input

Jonathan Rochkind Thu, 20 Jan 2011 13:20:10 -0800

Why do you want to do this? What is it meant to accomplish? There might be a better way to do what you are trying to do; I can't think of anything (which doesn't mean it doesn't exist) that would actually require an index built this way. What sorts of queries do you intend to serve with this setup?

I don't believe any analyzer included with Solr out of the box will do exactly what you've specified. You could definitely write your own analyzer in Java to do it. But I still suspect you may not actually need to construct your index like that to accomplish whatever you are trying to accomplish.
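
For what it's worth, here is a rough sketch of the combination logic such a custom analyzer would need. It is plain Java and deliberately not wired into the Lucene/Solr analysis API; the class and method names (TokenCombinations, combinations) are made up for illustration, and I'm assuming the goal is every in-order combination of the input words:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class TokenCombinations {

    // Returns every non-empty, order-preserving combination of the
    // whitespace-separated words in the input, joined by single spaces.
    static List<String> combinations(String input) {
        List<String> result = new ArrayList<>();
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        int n = words.length;
        // n words produce 2^n - 1 combinations, so refuse long inputs.
        if (n > 20) {
            throw new IllegalArgumentException("too many words: " + n);
        }
        // Each bit of the mask decides whether the word at that position is kept.
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(words[i]);
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String combination : combinations("abc xyz foo")) {
            System.out.println(combination);
        }
    }
}
```

Note that the number of combinations doubles with every extra word, which is one more reason to check whether the index really needs to look like this.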

The only reason I can think of to care which words are next to which other words is phrase and proximity searching. However, with what you've specified, phrase and proximity searches wouldn't be at all useful anyway: EVERY word would be next to every other word, so any phrase or proximity search made up of words present in the field would match. You might as well not do a phrase or proximity search at all, in which case it should not matter what order the words are in or how close together they are in the index. Why not just use an ordinary WhitespaceTokenizer, and do ordinary dismax or lucene queries without phrase or proximity?
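
To illustrate the point, here is a toy sketch in plain Java (again not Solr or Lucene code; TermMatch and its methods are names I made up). It imitates a query where all terms are required against a field split on whitespace, and it matches every combination from your list without indexing any of them separately:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for "whitespace tokens + all query terms required".
final class TermMatch {

    // Splits text on whitespace and returns the distinct tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return tokens;
    }

    // True when every query term occurs somewhere in the document,
    // regardless of order or distance.
    static boolean matchesAllTerms(String document, String query) {
        return tokenize(document).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        String document = "abc xyz foo";
        String[] queries = {"abc", "abc xyz", "abc xyz foo", "abc foo", "xyz", "xyz foo", "foo"};
        for (String query : queries) {
            System.out.println(query + " -> " + matchesAllTerms(document, query));
        }
    }
}
```

This also matches "foo abc", since order is ignored. If that is not acceptable for your use case, that would be useful to know.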


On 1/20/2011 4:03 PM, Martin Jansen wrote:

Hey there,

I'm looking for an <analyzer> configuration for Solr 1.4 that
accomplishes the following:

Given the input "abc xyz foo" I would like to add at least the following
token combinations to the index:

        abc
        abc xyz
        abc xyz foo
        abc foo
        xyz
        xyz foo
        foo

A WhitespaceTokenizer combined with a ShingleFilter will take me there
to some extent, but won't e.g. add "abc foo" to the index.  Is there a
way to do this?

- Martin
