Re: Behaviour of punctuation marks in phrase queries

Erick Erickson Fri, 17 May 2019 07:27:55 -0700

Three things:

1> WordDelimiterGraphFilterFactory must be followed by 
FlattenGraphFilterFactory in the index-time analyzer, and your index config 
does not have it.

2> It is usually unnecessary to use the exact same WDGFF parameters at both 
query and index time. If you have split the parts up at index time and also 
mashed them all back together (catenated), you usually only need to split them 
up at query time.

3> Try adding &debug=query to the query and look at what the results show for 
the parsed query. That usually gives you a clue about what is really happening 
vs. what you think is happening.
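
As a rough illustration of points 1 and 2, here is a minimal sketch of how the 
two analyzers might look. It is an assumption about what could work for your 
data, not a tested configuration: the file names and the other filters are 
copied from your schema, the only additions are FlattenGraphFilterFactory on 
the index side and the catenate* options switched off on the query side. 
Check the result in the Analysis screen and reindex before relying on it.

```xml
<!-- Sketch only: index analyzer flattens the WDGF token graph; query analyzer splits without catenating -->
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="1000" sortMissingLast="true"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <filter class="solr.CJKBigramFilterFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            protected="protectedword.txt"
            preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            generateWordParts="1" generateNumberParts="1"
            stemEnglishPossessive="1"
            types="wdfftypes.txt"/>
    <!-- Added: needed at index time because the index cannot store a token graph -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-FoldToASCII.txt"/>
    <filter class="solr.CJKBigramFilterFactory"/>
    <!-- Assumption: split only at query time, no catenation -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            protected="protectedword.txt"
            preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            generateWordParts="1" generateNumberParts="1"
            stemEnglishPossessive="1"
            types="wdfftypes.txt"/>
    <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this in place, the parsed query from &debug=query should 
make it easier to see whether "a.d." is turned into the same token positions 
that were indexed.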

Best,
Erick

> On May 17, 2019, at 12:59 AM, Doris Peter <doris.pe...@bsb-muenchen.de> wrote:
> 
> Hello, 
> 
> We use Solr 7.6.0 to build our index, and I have a question about
> phrase queries:
> 
> We use the following configuration in schema.xml: 
> 
>    <!-- Text Standard -->
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="1000" sortMissingLast="true"
> autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-FoldToASCII.txt"/>
>        <filter class="solr.CJKBigramFilterFactory"/>
>        <filter class="solr.WordDelimiterGraphFilterFactory"
> protected="protectedword.txt"
>             preserveOriginal="0" splitOnNumerics="1"
> splitOnCaseChange="0"
>             catenateWords="1" catenateNumbers="1" catenateAll="1"
>             generateWordParts="1" generateNumberParts="1"
> stemEnglishPossessive="1"
>             types="wdfftypes.txt" />
>        <filter class="solr.LengthFilterFactory" min="1"
> max="2147483647"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-FoldToASCII.txt"/>
>        <filter class="solr.CJKBigramFilterFactory"/>
>        <filter class="solr.WordDelimiterGraphFilterFactory"
> protected="protectedword.txt"
>             preserveOriginal="0" splitOnNumerics="1"
> splitOnCaseChange="0"
>             catenateWords="1" catenateNumbers="1" catenateAll="1"
>             generateWordParts="1" generateNumberParts="1"
> stemEnglishPossessive="1"
>             types="wdfftypes.txt" />
>        <filter class="solr.LengthFilterFactory" min="1"
> max="2147483647"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> 
>    If we search for a phrase like "Moosburg a.d. Isar" we don't get a
> match, though it's definitely in our index.
>    If we search for "Moosburg a. d. Isar" with a space between "a."
> and "d." we get a match.
> 
>    This also happens for other non-word characters, like ' or , for
> example.
> 
>    The strange thing is that the Solr Analysis tool reports
> a match for the first version, but when we send a Solr query, we get no
> result documents.
> 
>    Does anyone have an idea what could be causing this?
> 
>    Thank you very much in advance,
> 
>    Doris Peter
