Re: autoGeneratePhraseQueries not working

Erick Erickson Tue, 16 Apr 2019 07:47:59 -0700

The issue isn’t SoW. What’s happening here is that the query _parser_ passes 
my25word through as a single token, then WordDelimiterGraphFilterFactory splits 
it up on number/letter changes after SoW is out of the picture. The 
admin/analysis page will show you how this works.


By fiddling with the settings in WordDelimiterGraphFilterFactory, catenateAll 
in particular, you can get close to auto phrase queries. But it's never quite 
the same thing as phrases.
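
As an untested sketch of the catenateAll setting mentioned above, here is the 
query-time filter from the configuration quoted below with catenateAll 
switched from "0" to "1". Every other attribute is copied unchanged from the 
original schema. With this setting the filter should emit the catenated token 
(my25word) alongside the split parts; the index-time filter would presumably 
need the same setting so that a matching catenated token exists in the index. 
The admin/analysis page is the place to confirm what actually comes out.

```xml
<!-- Sketch only: query-analyzer filter from the quoted schema,
     with catenateAll changed from "0" to "1". -->
<filter class="solr.WordDelimiterGraphFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0"
        catenateAll="1" splitOnCaseChange="1"/>
```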

Best,
Erick

> On Apr 16, 2019, at 4:31 AM, Leonardo Francalanci 
> <leoonar...@yahoo.it.INVALID> wrote:
> 
> Thank you for the reply.
> I'm using eDisMax; does it use the same parser as the Standard Query Parser, 
> then?
> I think this behavior should be documented somewhere. It's very confusing, and 
> to be honest I don't even remember how I got to the sow parameter, and I'm 
> not sure what it means for all the other queries I have.
> 
>    On Tuesday, 16 April 2019, 13:09:26 CEST, Alexandre Rafalovitch 
> <arafa...@gmail.com> wrote:
> 
> The issue is that the Standard Query Parser does pre-processing of the
> query and splits it on whitespace beforehand (to deal with all the
> special syntax). So, if you don't use quoted phrases then by the time
> the field specific query analyzer chain kicks in, the text is already
> pre-split and the analyzer only sees one (pre space-separated) token
> at a time. So, the autoGeneratePhraseQueries does not work then. If
> you use different parsers that send whole text in (e.g. FieldQParser),
> then - I think - it will work.
> 
> Or, like you discovered, sow=true tells the Standard Query Parser to
> send it all together as well.
> 
> It is a bit of a messy part of Solr, because the Admin Analysis page
> sends the text to the query analyzer without splitting (it does not
> use any Query Parser). So, that adds to the confusion.
> 
> Regards,
>   Alex.
> 
> On Tue, 16 Apr 2019 at 10:53, Leonardo Francalanci
> <leoonar...@yahoo.it.invalid> wrote:
>> 
>>   To add some information: with "sow=true" it seems to work. But I don't 
>> understand why it wouldn't work with "sow=false" (I can't find anything in 
>> the docs about how sow interacts with autoGeneratePhraseQueries), or what 
>> the implications of setting sow=true are.
>> I've found this: [SOLR-9185] Solr's edismax and "Lucene"/standard query 
>> parsers should optionally not split on whitespace before sending terms to 
>> analysis - ASF JIRA
>> 
>> But it's very low-level, and I can't find any more "user-friendly" 
>> documentation.
>> 
>>     On Tuesday, 16 April 2019, 09:00:08 CEST, Leonardo Francalanci 
>> <leoonar...@yahoo.it.INVALID> wrote:
>> 
>>   Hi,
>> 
>> I'm using Solr 8.0.0 and I can't get autoGeneratePhraseQueries to work (I 
>> also tried 7.7.1, with the same result):
>> 
>> "debug":{
>>     "rawquerystring":"TROUBLESHOOT:my25word",
>>     "querystring":"TROUBLESHOOT:my25word",
>>     "parsedquery":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",
>>     "parsedquery_toString":"TROUBLESHOOT:my TROUBLESHOOT:25 
>> TROUBLESHOOT:word",
>> 
>> I expected something like
>> 
>> "parsedquery":"TROUBLESHOOT:\"my 25 word\""
>> 
>> Why isn't autoGeneratePhraseQueries generating a quoted phrase when I 
>> query?
>> 
>> 
>> This is my configuration:
>> 
>>       <dynamicField name="*_txt_en_split" type="text_en_splitting"  
>> indexed="true"  stored="true"/>
>>     <fieldType name="text_en_splitting" class="solr.TextField" 
>> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>>       <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <!-- in this example, we will only use synonyms at query time
>>         <filter class="solr.SynonymGraphFilterFactory" 
>> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>         -->
>>         <!-- Case insensitive stop word removal.
>>         -->
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="lang/stopwords_en.txt"
>>         />
>>         <filter class="solr.WordDelimiterGraphFilterFactory" 
>> generateWordParts="1" generateNumberParts="1" catenateWords="1" 
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.KeywordMarkerFilterFactory" 
>> protected="protwords.txt"/>
>>         <filter class="solr.PorterStemFilterFactory"/>
>>         <filter class="solr.FlattenGraphFilterFactory" />
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymGraphFilterFactory" 
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="lang/stopwords_en.txt"
>>         />
>>         <filter class="solr.WordDelimiterGraphFilterFactory" 
>> generateWordParts="1" generateNumberParts="1" catenateWords="0" 
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.KeywordMarkerFilterFactory" 
>> protected="protwords.txt"/>
>>         <filter class="solr.PorterStemFilterFactory"/>
>>       </analyzer>
>>     </fieldType>
>> <field name="TROUBLESHOOT" type="text_en_splitting"  indexed="true" 
>> stored="true" multiValued="true" omitNorms="true"/>
>>
