RE: Index an entire Phrase and not it's constituent parts?

MitchK Sun, 14 Mar 2010 01:46:02 -0800

Hmm, I don't understand the problem.

Look: If your analyzer looks like:
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory"/>
        <filter class="solr.StopFilterFactory"/>

And your document looks like:
"There is a big performance issue. Solving the problem would be great. As
long as we try to give our best, ..."

After the LowerCaseFilterFactory every word is lowercased. It is then
passed to the SynonymFilterFactory, which transforms it into something
like this (I am ignoring the changes made by the other filters):
"there is a big performance issue. solving the problem would be great.
specialphrase1 we try to give our best,..."

Afterwards the StopFilterFactory may change it this way:
"there big performance issue. solving problem great. specialphrase1 try give
our best,..."
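
For illustration, the phrase replacement in the walk-through above could be
configured roughly as follows. This is only a sketch under assumptions: the
file names synonyms.txt and stopwords.txt and the token "specialphrase1" are
placeholders, and the rule is written in lowercase because the
LowerCaseFilterFactory runs before the synonym step.

```xml
<!-- Sketch only. synonyms.txt (assumed name) would contain the rule:
       as long as => specialphrase1
     With an explicit "=>" rule the phrase on the left is replaced by the
     single token on the right. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  <!-- stopwords.txt (assumed name) lists one stop word per line. -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
```

Keeping the StopFilterFactory after the SynonymFilterFactory matters here:
if "as" were removed as a stop word first, the synonym rule would never see
the complete phrase.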

My idea of doing this work in another field comes from considering the case
where a user searches for "amount of blabla" instead of "in amount of
blabla". After passing through a StopFilter, this phrase would look like
"amount blabla". So you have a chance of finding the right document in the
index even when a user does not type the full pre-defined phrase.
You don't need to do it this way. You don't even need to score the normal
field and the phrase field differently. It was only a suggestion. :)
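
A two-field setup along these lines might look like the sketch below. All
type and field names ("text_plain", "text_phrase", "body", "body_phrase")
and the file names are hypothetical, and the fieldType, field and copyField
entries would go into their respective parts of schema.xml.

```xml
<!-- Normal field type: stop words are removed, no phrase replacement. -->
<fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- Phrase field type: pre-defined phrases are collapsed into one token
     by the rules in synonyms.txt (assumed file name). -->
<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- The same source text is indexed twice, once per analysis chain. -->
<field name="body" type="text_plain" indexed="true" stored="true"/>
<field name="body_phrase" type="text_phrase" indexed="true" stored="false"/>
<copyField source="body" dest="body_phrase"/>
```

A query could then search both fields, so a partial phrase can still match
in "body" while the full pre-defined phrase matches in "body_phrase".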

However, if I misunderstood your post and you don't want to replace the
phrases with something like "specialphrase1", try the KeepWordFilter
(solr.KeepWordFilterFactory) - it sounds like it may do what you want. Have
a look at the analysis.jsp page to see what its results are.
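
As a rough sketch of the KeepWordFilter suggestion, assuming a word list
file named keepwords.txt (the name is a placeholder):

```xml
<!-- Sketch only. keepwords.txt (assumed name) lists one token per line;
     every token that is not in the list is dropped from the stream. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
</analyzer>
```

Note that this filter compares single tokens against the list, so a
multi-word phrase only survives as a unit if an earlier step has already
emitted it as one token.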

BTW: If you really need to code your own tokenizer, have a look at a filter
that combines several words into one term.

Something I really hate are phrases in the manual like "it behaves like the
inversion of xy-filter" - almost nobody can really imagine what such a
filter is actually for... So, if this filter is what you are looking for,
please contribute a better description to the javadocs :).

Kind regards
- Mitch
