Re: Antw: Re: Behaviour of punctuation marks in phrase queries

Erick Erickson Fri, 17 May 2019 08:47:00 -0700
I’ll leave that explanation to someone who understands query parsers ;)

> On May 17, 2019, at 7:57 AM, Doris Peter <doris.pe...@bsb-muenchen.de> wrote:
> 
> Thanks a lot! I tried the debug parameter, which shows interesting 
> differences:
> 
> "debug": {
>     "rawquerystring": "all_places_txt:\"Neuburg a. d. Donau\"",
>     "querystring": "all_places_txt:\"Neuburg a. d. Donau\"",
>     "parsedquery": "PhraseQuery(all_places_txt:\"neuburg a d donau\")",
>     "parsedquery_toString": "all_places_txt:\"neuburg a d donau\"",
>     "QParser": "LuceneQParser"
> }
> 
> "debug": {
>     "rawquerystring": "all_places_txt:\"Neuburg a.d. Donau\"",
>     "querystring": "all_places_txt:\"Neuburg a.d. Donau\"",
>     "parsedquery": "SpanNearQuery(spanNear([all_places_txt:neuburg, spanOr([all_places_txt:ad, spanNear([all_places_txt:a, all_places_txt:d], 0, true)]), all_places_txt:donau], 0, true))",
>     "parsedquery_toString": "spanNear([all_places_txt:neuburg, spanOr([all_places_txt:ad, spanNear([all_places_txt:a, all_places_txt:d], 0, true)]), all_places_txt:donau], 0, true)",
>     "QParser": "LuceneQParser"
> }
> 
> 
> Something seems to go wrong here, as the parsedquery is a SpanNearQuery
> instead of a PhraseQuery.
> 
>>>> Erick Erickson <erickerick...@gmail.com> 5/17/2019 4:27 PM >>> 
> Three things:
> 
> 1> WordDelimiterGraphFilterFactory requires FlattenGraphFilterFactory after
> it in the index-time analyzer.
> 
> 2> It is usually unnecessary to use exactly the same WDGFF parameters at
> both query and index time. If you split the parts up at index time and also
> catenate them back together, you usually only need to split them up at
> query time.
> 
> 3> Try adding &debug=query to the request and look at the parsed query in
> the response. That usually gives you a clue about what is really happening
> vs. what you think is happening.
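> 
> A minimal sketch of how points 1 and 2 might look when applied to the
> fieldType posted below. It is untested against this index: placing
> solr.FlattenGraphFilterFactory directly after WDGFF in the index analyzer
> follows point 1, while turning the catenate* options off at query time is
> only an assumed reading of point 2, not a confirmed fix. Changes to the
> index-time analyzer only take effect after reindexing.
> 
> ```xml
> <!-- Sketch only: the posted "text" fieldType with points 1 and 2 applied -->
> <fieldType name="text" class="solr.TextField" positionIncrementGap="1000"
>            sortMissingLast="true" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>     <filter class="solr.CJKBigramFilterFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
>             preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
>             catenateWords="1" catenateNumbers="1" catenateAll="1"
>             generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
>             types="wdfftypes.txt"/>
>     <!-- Point 1: flatten the token graph emitted by WDGFF so it can be indexed -->
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>     <filter class="solr.CJKBigramFilterFactory"/>
>     <!-- Point 2 (assumed reading): split only, no catenation at query time.
>          No FlattenGraphFilterFactory here; it belongs in the index analyzer. -->
>     <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
>             preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
>             types="wdfftypes.txt"/>
>     <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> ```
> 
> With catenation off at query time, "a.d." would be expected to produce only
> the word parts "a" and "d", so &debug=query should show whether the parsed
> query falls back to a plain PhraseQuery.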
> 
> Best,
> Erick
> 
>> On May 17, 2019, at 12:59 AM, Doris Peter <doris.pe...@bsb-muenchen.de> wrote:
>> 
>> Hello, 
>> 
>> We use Solr 7.6.0 to build our index, and I have a question about
>> phrase queries:
>> 
>> We use the following configuration in schema.xml: 
>> 
>>   <!-- Text Standard -->
>>   <fieldType name="text" class="solr.TextField" positionIncrementGap="1000"
>>              sortMissingLast="true" autoGeneratePhraseQueries="true">
>>     <analyzer type="index">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>>       <filter class="solr.CJKBigramFilterFactory"/>
>>       <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
>>               preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
>>               catenateWords="1" catenateNumbers="1" catenateAll="1"
>>               generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
>>               types="wdfftypes.txt"/>
>>       <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>>       <filter class="solr.CJKBigramFilterFactory"/>
>>       <filter class="solr.WordDelimiterGraphFilterFactory" protected="protectedword.txt"
>>               preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="0"
>>               catenateWords="1" catenateNumbers="1" catenateAll="1"
>>               generateWordParts="1" generateNumberParts="1" stemEnglishPossessive="1"
>>               types="wdfftypes.txt"/>
>>       <filter class="solr.LengthFilterFactory" min="1" max="2147483647"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>> 
>> 
>>   If we search for a phrase like "Moosburg a.d. Isar", we don't get a
>> match, though it's definitely in our index.
>>   If we search for "Moosburg a. d. Isar", with a blank between "a."
>> and "d.", we get a match.
>> 
>>   This also happens with other non-word characters, such as the
>> apostrophe (') or the comma (,).
>> 
>>   The strange thing is that the Solr Analysis tool reports a match for
>> the first version, but when we send a Solr query, we get no result
>> documents.
>> 
>>   Does anyone have an idea what this could be?
>> 
>>   Thank you very much in advance,
>> 
>>   Doris Peter
> 
>