Re: edismax phrase matching with a non-word char in between

Erick Erickson Wed, 14 Dec 2011 06:01:29 -0800

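What I think is happening here is that WordDelimiterFilterFactory is
throwing away your non-alphanumeric characters, so "(Work" is indexed
as "work" in the position right after "street". After analysis the two
terms really are adjacent, which is why qs=0 doesn't stop the phrase
from matching. You can see this in admin/analysis, which I've found
*extremely* helpful when faced with this kind of question.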
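To make the position effect concrete, here is a rough Python sketch that mimics the chain above. It is a simplification, not Solr's code: the helpers `analyze` and `exact_phrase_matches` are hypothetical, synonyms, stop words, stemming and splitOnCaseChange are ignored, and the delimiter split is approximated with a regex. A phrase with zero slop (what qs=0 asks for) needs its terms at consecutive positions, and that is exactly what "(Work" ends up at once the parenthesis is dropped.

```python
import re


def analyze(text, split_on_delimiters=True):
    """Rough stand-in for the field's analysis chain (hypothetical helper).

    Whitespace tokenizing, then (optionally) a WordDelimiterFilter-like
    split that keeps only alphanumeric runs, then lowercasing. Synonyms,
    stop words, stemming and case-change splitting are NOT modelled.
    Returns a list of (term, position) pairs.
    """
    tokens = []
    position = 0
    for raw in text.split():  # like WhitespaceTokenizer: "(Work" is one token
        if split_on_delimiters:
            # Approximation of WDF: punctuation such as "(" is discarded.
            parts = re.findall(r"[A-Za-z0-9]+", raw)
        else:
            parts = [raw]
        for part in parts:
            tokens.append((part.lower(), position))
            position += 1
    return tokens


def exact_phrase_matches(tokens, phrase_terms):
    """True if phrase_terms occur at consecutive positions (slop 0)."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    # Try every position of the first term as a phrase start.
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(phrase_terms))
        for start in positions.get(phrase_terms[0], set())
    )


doc = "...Oxford Street (Work Experience)..."
query = [term for term, _ in analyze("street work")]

print(analyze(doc))
# [('oxford', 0), ('street', 1), ('work', 2), ('experience', 3)]
print(exact_phrase_matches(analyze(doc), query))  # True
print(exact_phrase_matches(analyze(doc, split_on_delimiters=False), query))  # False
```

Passing split_on_delimiters=False leaves "(work" as its own term, and the zero-slop check then fails; with the bracket stripped, "street" and "work" sit at positions 1 and 2 and match.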


Best
Erick

On Tue, Dec 13, 2011 at 10:37 AM, Robert Brown <r...@intelcompute.com> wrote:
> I have a field which is indexed and queried as follows:
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <filter class="solr.SynonymFilterFactory" synonyms="text-synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>
> When searching for "street work" (with quotes), I'm getting matches and
> highlighting on things like...
>
> "...Oxford <em>Street</em> (<em>Work</em> Experience)..."
>
> Why is this happening, and what can I do to stop it?
>
> I've set <int name="qs">0</int> in my config to try and avert this sort of
> behaviour, am I correct in thinking that this is used to ensure there are no
> words in-between the phrase words?
>
