Re: Index an entire Phrase and not it's constituent parts?

Lance Norskog Sat, 13 Mar 2010 20:51:21 -0800

CommonGrams is a tool for this. It indexes "is a" as a single token, while
"is" and "a" on their own can still be removed as stopwords by a later
StopFilter.


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
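
As a rough, untested sketch, an index-time analyzer using it might look
something like the one below. The file name stopwords.txt is only a
placeholder for whatever list of common words you want paired up, and the
attribute values are assumptions to adapt. The filter joins adjacent words
with an underscore whenever one of them is in the list, so "is a" comes out
as is_a next to the single words "is" and "a", and the StopFilter after it
then drops the single words.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Emits bigrams such as is_a for words listed in stopwords.txt
       (placeholder file name), in addition to the single words. -->
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <!-- Removes the single stopwords; the bigram tokens are kept. -->
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
</analyzer>
```

On the query side there is a matching solr.CommonGramsQueryFilterFactory, so
the query analyzer would probably need the same word list.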

On 3/13/10, Christopher Ball <christopher.b...@metaheuristica.com> wrote:
> Thank you for the idea, Mitch, but it just doesn't seem right that I should
> have to resort to scoring when what I really need seems so fundamental.
>
> Logically, what I want is a "phrase filter factory" that would match on
> phrases listed in a file, like stopwords. On a match it would index the
> phrase as a whole and then discard the words of the phrase from the stream
> before passing it on to the next filter, since the phrases are embedded in
> paragraphs that contain other valid index material.
>
> So an analyzer would look something like:
>
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PhraseFilterFactory"/>
>         <filter class="solr.StopFilterFactory"/>
>       </analyzer>
>
> Of course, one riddle this leaves is how to match phrases against a
> tokenized stream . . . so maybe I also need to write my own tokenizer. It
> just seems like this would have been a previously desired and solved problem.
>
> Or maybe I should try solr.KeepWordFilterFactory, if it can deal with
> phrases . . . ?
>
> I'm stumped =(
>
> -----Original Message-----
> From: MitchK [mailto:mitc...@web.de]
> Sent: Saturday, March 13, 2010 8:12 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Index an entire Phrase and not it's constituent parts?
>
>
> Christopher,
>
> maybe the SynonymFilter can help you to solve your problem.
>
> Let me try to explain:
> If you create an extra field in the index for your use case, you can boost
> matches on it in a special way.
>
> The next step is creating an extra synonym file:
> as much as => SpecialPhrase1
> in amount of => SpecialPhrase2
> ... and so on...
>
> If a user wants to query for something like "as much as I love you", you can
> do some boosting on matches from the SpecialPhrase field, and you are able to
> return results from both: the normal StopWordFiltered data and the
> SpecialPhrase data.
>
> If this fits your needs, please let me know.
>
> Kind regards
> - Mitch
> --
> View this message in context:
> http://old.nabble.com/Index-an-entire-Phrase-and-not-it%27s-constituent-parts--tp27785521p27887564.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


-- 
Lance Norskog
goks...@gmail.com
