RE: Edismax ignoring queries containing booleans

Claire Pollard Fri, 10 Jan 2020 02:31:21 -0800

Hi Edward,

Thank you so much for your reply. Your explanations have really helped me 
understand the impact of mm on our queries 😊


I'm going to try what you suggest but I agree, it seems like 2 or 3 is the best 
option for us. We still would like the behaviour of mm on certain queries, so 
removing it from the solrconfig isn't possible.

I'll let you know how I get on, might be a little while until I get some 
results, but thank you again!

Cheers,
Claire.

-----Original Message-----
From: Edward Ribeiro <edward.ribe...@gmail.com> 
Sent: 10 January 2020 05:16
To: solr-user@lucene.apache.org
Subject: Re: Edismax ignoring queries containing booleans

The fq is not affected by the mm parameter because it uses Solr's default query 
parser (the standard "lucene" parser), which doesn't support mm. But you can change the 
parser used by fq this way: fq={!edismax}recordID:(10 20) or 
fq={!edismax mm=1}recordID:(10 20), for example (even though that isn't necessary here).
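
As a rough illustration of options 2 and 3 from the quoted message below, plus the local-params variant just mentioned, here is a small Python sketch that only builds the request parameters. The helper name record_id_params is made up for this example; the field name recordID and the IDs come from this thread. Nothing here talks to a Solr server.

```python
from urllib.parse import urlencode


def record_id_params(ids, approach="fq"):
    """Build Solr request parameters for a recordID lookup (hypothetical helper)."""
    clause = "recordID:(" + " OR ".join(str(i) for i in ids) + ")"
    if approach == "fq":
        # Option 3: match everything in q and restrict with a filter query,
        # which is parsed by the default parser and so is not subject to mm.
        return {"q": "*:*", "fq": clause}
    if approach == "mm":
        # Option 2: keep the clause in q and override the configured mm
        # for this request only.
        return {"q": clause, "mm": "1"}
    if approach == "fq-edismax":
        # Variant: switch the fq parser with local params and set mm there.
        return {"q": "*:*", "fq": "{!edismax mm=1}" + clause}
    raise ValueError("unknown approach: " + approach)


if __name__ == "__main__":
    for name in ("fq", "mm", "fq-edismax"):
        # urlencode percent-encodes the values for use in a query string.
        print(name, "->", urlencode(record_id_params([18, 19, 20], name)))
```

The encoded string can be appended to whatever /select URL is in use; the host and core name are deliberately left out because they are not given in this thread.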

Please let me know if any of the suggestions, or any other you come up with, 
solves the issue, and don't forget to test those approaches so that you can avoid 
any performance degradation.

Best,
Edward

On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro <edward.ribe...@gmail.com>
wrote:

> Hi Claire,
>
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))
>
> The mm (minimum should match) parameter alters the behaviour of the OR clauses.
> See here:
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> For example, if there is a query 
> like `text:(toys OR children OR sales)`, but your mm=3, then at least 
> three terms are required to match. The query is now equivalent to 
> `text:(toys AND children AND sales)`.
>
> In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 
> 20]))~2" query the "))~2" part means that at least two matches are 
> required of the three optional terms: 18, 19, and 20. But a document has 
> a single recordID, so it can match at most one of them. Therefore, the 
> query returns no documents because it can never satisfy the condition 
> set up by mm (match at least two of 18, 19, and 20). If mm=1 the query 
> would work as intended in this example.
>
> The mm parameter you use is: 0<1 2<-1 5<-2 6<90% that can roughly be 
> translated as:
>
> * 0<1 : If there is more than zero terms then at least 1 must match. As 
> far as I can tell from the docs, this applies until the next threshold, 
> so queries with one or two terms need only one match.
>
> * 2<-1 5<-2 6<90% : Between 3 and 5 (inclusive) terms match all but one 
> (in your example there are 3 numbers so it will require to match at 
> least 2, that’s the reason for the ~2). If there are 6 terms then match 
> 4 (6 - 2), and above 6 terms match 90% of the terms (e.g., if there are 
> 10 clauses then it is required to match at least 9).
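A rough sketch of how a conditional mm spec such as the one above could be evaluated, assuming the rule described in the Solr guide: each n<value pair applies only when the number of optional clauses is greater than n, and the last applicable value wins. This is an illustrative Python re-implementation, not Solr's code, and the function name min_should_match is invented here.

```python
def min_should_match(clause_count, spec):
    """Return how many optional clauses must match for a given mm spec.

    Illustrative only: follows the documented mm rules as understood here.
    """
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional form, e.g. "0<1 2<-1 5<-2 6<90%".
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                # This threshold is not exceeded, so keep the last result.
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage form; negative values mean "all but that share".
        calc = clause_count * int(spec[:-1]) / 100
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        # Plain integer form; negative values mean "all but that many".
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    # Never require more clauses than the query contains.
    return min(result, clause_count)


if __name__ == "__main__":
    spec = "0<1 2<-1 5<-2 6<90%"
    for n in (1, 2, 3, 5, 6, 10):
        print(n, "clauses ->", min_should_match(n, spec))
```

With the spec from this thread, three clauses give 2, which is the ~2 seen in the parsed query. Under this reading, one- and two-clause queries need only a single match.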
>
> > There shouldn't be a problem using mm with edismax right? Or does 
> > the
> problem lie with the structure of my qf/pf and then adding mm?
>
> Nope. There’s no problem using mm with edismax, nor does the problem lie in 
> qf/pf. As you dig
>
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
>
> I see a couple of approaches to solve this issue:
>
> 1) Removing the mm parameter from solrconfig. But it was probably 
> set up for a reason, so you should check beforehand. In this case, you 
> could issue
> mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
>
> 2) Adding mm=1 as a query parameter whenever you search for recordID.
> Issuing the parameter in the query will override the mm parameter 
> that was set up in solrconfig for that particular query.
>
> 3) Doing a match-all query (q=*:*) and moving the recordID query to a 
> filter query: fq=recordID:(18 OR 19 OR 20). The fq is not affected by 
> the mm parameter, or so it seems. There is no need to change mm in 
> solrconfig or to add mm as a query parameter.
>
> Personally, I would go with either 2) or 3).
>
> Best,
> Edward
>
> On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard 
> <claire.poll...@imagen.io>
> wrote:
> >
> > Also, I've found this bug from previous which highlights the issue 
> > with
> ))~2
> >
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > mm is set at config, but not explicitly in the query...
> >
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
> >
> > -----Original Message-----
> > From: Claire Pollard <claire.poll...@imagen.io>
> > Sent: 09 January 2020 10:23
> > To: solr-user@lucene.apache.org
> > Subject: RE: Edismax ignoring queries containing booleans
> >
> > Hey Edward,
> >
> > Thanks for the tips.
> >
> > I've cleaned up my solrconfig, removed the duplicate df, tabs and
> newlines, and tried commenting out the bits you've suggested and 
> adding them back in bit by bit, and it seems mm was the thing which is 
> breaking the query for me.
> >
> > Without it, the query returns 2 documents as expected.
> >
> > "debug":{
> >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> >     "querystring":"recordID:(18 OR 19 OR 20)",
> >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 |
> (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 
> |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | 
> (Test_AR:\"19 20\"~100)^1.1))",
> >     "parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 
> > 19]
> recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | 
> (Test_AR:\"19 20\"~100)^1.1)",
> >     "explain":{
> >       "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 
> > =
> sum of:\n    1.0 = recordID:[19 TO 19]\n",
> >       "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 
> > =
> sum of:\n    1.0 = recordID:[20 TO 20]\n"},
> >
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> >
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2 New Query start: +((recordID:[18 TO 18])
> (recordID:[19 TO 19]) (recordID:[20 TO 20]))
> >
> > There shouldn't be a problem using mm with edismax right? Or does 
> > the
> problem lie with the structure of my qf/pf and then adding mm?
> >
> > Cheers,
> > Claire.
> >
> > -----Original Message-----
> > From: Edward Ribeiro <edward.ribe...@gmail.com>
> > Sent: 09 January 2020 02:28
> > To: solr-user@lucene.apache.org
> > Subject: Re: Edismax ignoring queries containing booleans
> >
> > Hi Claire,
> >
> > Unfortunately I didn't see anything in the debug explain that could 
> > potentially be the source of the problem. Like Saurabh, I tested on a 
> > core and it worked for me.
> >
> > I suggest that you simplify the solrconfig (commenting out qf, mm, 
> > spellchecker config and pf, for example) and reload the core. If the 
> > query works, then reinsert the settings one by one, reloading the 
> > core each time to see if the query still works.
> >
> > A few remarks based on a snippet of the solrconfig you posted in a 
> > previous e-mail:
> >
> > * Your solrconfig.xml defines df two times (the debug shows 
> > "df":["text", "text"]);
> >
> > * There are a couple of character codes like &#x09; &#x0D; and &#x0A; It 
> > would be nice to remove them;
> >
> > Please let us know if you find out why. :)
> >
> > Best,
> > Edward
> >
> >
> > On Wed, 8 Jan 2020 at 13:00, Claire Pollard 
> > <claire.poll...@imagen.io>
> > wrote:
> >
> > > It would be lovely to be able to use range to complete my 
> > > searches, but sadly documents aren't necessarily sequential so I 
> > > might want say 18, 24 or
> > > 30 in future.
> > >
> > > I've re-run the query with debug on. Is there anything here that 
> > > looks unusual? Thanks.
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":75,
> > >     "params":{
> > >       "mm":"\r\n       0<1 2<-1 5<-2 6<90%\r\n      ",
> > >       "spellcheck.collateExtendedResults":"true",
> > >       "df":["text",
> > >         "text"],
> > >       "q.alt":"*:*",
> > >       "ps":"100",
> > >       "spellcheck.dictionary":["default",
> > >         "wordbreak"],
> > >       "bf":"",
> > >       "echoParams":"all",
> > >       "fl":"*,score",
> > >       "spellcheck.maxCollations":"5",
> > >       "rows":"10",
> > >       "spellcheck.alternativeTermCount":"5",
> > >       "spellcheck.extendedResults":"true",
> > >       "q":"recordID:(18 OR 19 OR 20)",
> > >       "defType":"edismax",
> > >       "spellcheck.maxResultsForSuggest":"5",
> > >       "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0
> > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > > title^2.0
> > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 
> > > french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
> > >       "spellcheck":"on",
> > >       "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0
> > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > > title^2.1
> > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1 
> > > french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
> > >       "spellcheck.count":"10",
> > >       "debugQuery":"on",
> > >       "_":"1578499092576",
> > >       "spellcheck.collate":"true"}},
> > >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> > >   },
> > >   "spellcheck":{
> > >     "suggestions":[],
> > >     "correctlySpelled":false,
> > >     "collations":[]},
> > >   "debug":{
> > >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> > >     "querystring":"recordID:(18 OR 19 OR 20)",
> > >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > > (recordID:[20 TO 20]))~2 DisjunctionMaxQuery(((text:\"19 
> > > 20\"~100)^0.2
> > > |
> > > (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 
> > > 20\"~100)^2.0
> > > |
> > > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > > (Test_AR:\"19 20\"~100)^1.1))",
> > >     "parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 
> > > 19]
> > > recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | 
> > > (annotations:\"19
> > > 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> > > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > > (Test_AR:\"19 20\"~100)^1.1)",
> > >     "explain":{},
> > >     "QParser":"ExtendedDismaxQParser",
> > >     "altquerystring":null,
> > >     "boost_queries":null,
> > >     "parsed_boost_queries":[],
> > >     "boostfuncs":[""],
> > >     "timing":{
> > >       "time":75.0,
> > >       "prepare":{
> > >         "time":35.0,
> > >         "query":{
> > >           "time":35.0},
> > >         "facet":{
> > >           "time":0.0},
> > >         "facet_module":{
> > >           "time":0.0},
> > >         "mlt":{
> > >           "time":0.0},
> > >         "highlight":{
> > >           "time":0.0},
> > >         "stats":{
> > >           "time":0.0},
> > >         "expand":{
> > >           "time":0.0},
> > >         "terms":{
> > >           "time":0.0},
> > >         "spellcheck":{
> > >           "time":0.0},
> > >         "debug":{
> > >           "time":0.0}},
> > >       "process":{
> > >         "time":38.0,
> > >         "query":{
> > >           "time":29.0},
> > >         "facet":{
> > >           "time":0.0},
> > >         "facet_module":{
> > >           "time":0.0},
> > >         "mlt":{
> > >           "time":0.0},
> > >         "highlight":{
> > >           "time":0.0},
> > >         "stats":{
> > >           "time":0.0},
> > >         "expand":{
> > >           "time":0.0},
> > >         "terms":{
> > >           "time":0.0},
> > >         "spellcheck":{
> > >           "time":6.0},
> > >         "debug":{
> > >           "time":1.0}}}}}
> > >
> > > -----Original Message-----
> > > From: Edward Ribeiro <edward.ribe...@gmail.com>
> > > Sent: 07 January 2020 01:05
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Edismax ignoring queries containing booleans
> > >
> > > Hi Claire,
> > >
> > > You can add the following parameter `&debug=all` on the URL to 
> > > bring back debugging info and share with us (if you are using the 
> > > Solr admin UI you should check the `debugQuery` checkbox).
> > >
> > > Also, if you are searching a sequence of values you could perform 
> > > a range
> > > query: recordID:[18 TO 20]
> > >
> > > Best,
> > > Edward
> > >
> > > On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard 
> > > <claire.poll...@imagen.io>
> > > wrote:
> > > >
> > > > Ok... It doesn't work for me. I'm fairly new to Solr so any help 
> > > > would be
> > > appreciated!
> > > >
> > > > My managed-schema field and field type look like this:
> > > >
> > > > <field name="recordID" type="long" indexed="true" stored="true"
> > > required="true" multiValued="false" />
> > > > <fieldType name="long" class="solr.LongPointField"
> sortMissingLast="true"
> > > omitNorms="true" />
> > > >
> > > > And my solrconfig.xml select/query handlers look like this:
> > > >
> > > >         <requestHandler name="/select" class="solr.SearchHandler">
> > > >                 <lst name="defaults">
> > > >                         <str name="echoParams">all</str>
> > > >                         <!-- Query settings -->
> > > >                         <str name="defType">edismax</str>
> > > >                         <str name="qf">
> > > >                                 &#x09;text^0.4 recordID^10.0
> > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > > title^2.0
> > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 
> > > french2^1.0&#x0D;&#x0A;
> > > >                         </str>
> > > >                         <str name="df">text</str>
> > > >                         <str name="q.alt">*:*</str>
> > > >                         <str name="rows">10</str>
> > > >                         <str name="fl">*,score</str>
> > > >                         <str name="pf">
> > > >                                 &#x09;text^0.2 recordID^10.0
> > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > > title^2.1
> > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1 
> > > french2^1.1&#x0D;&#x0A;</str>
> > > >                         <str name="bf" />
> > > >                         <str name="mm">&#x0D;&#x0A;       0&lt;1
> 2&lt;-1
> > > 5&lt;-2 6&lt;90%&#x0D;&#x0A;      </str>
> > > >                         <int name="ps">100</int>
> > > >                         <!--SpellChecking -->
> > > >                         <str name="df">text</str>
> > > >                         <!-- Solr will use suggestions from both 
> > > > the
> > > 'default' spellchecker
> > > >      and from the 'wordbreak' spellchecker and combine them.
> > > >      collations (re-written queries) can include a combination of
> > > >      corrections from both spellcheckers -->
> > > >                         <str
> name="spellcheck.dictionary">default</str>
> > > >                         <str
> name="spellcheck.dictionary">wordbreak</str>
> > > >                         <str name="spellcheck">on</str>
> > > >                         <str
> name="spellcheck.extendedResults">true</str>
> > > >                         <str name="spellcheck.count">10</str>
> > > >                         <str
> > > name="spellcheck.alternativeTermCount">5</str>
> > > >                         <str
> > > name="spellcheck.maxResultsForSuggest">5</str>
> > > >                         <str name="spellcheck.collate">true</str>
> > > >                         <str
> > > name="spellcheck.collateExtendedResults">true</str>
> > > >                         <str name="spellcheck.maxCollations">5</str>
> > > >                 </lst>
> > > >                 <arr name="last-components">
> > > >                         <str>spellcheck</str>
> > > >                 </arr>
> > > >                 <!-- In addition to defaults, "appends" params 
> > > > can be
> > > specified
> > > >          to identify values which should be appended to the list of
> > > >          multi-val params from the query (or the existing
> "defaults").
> > > >       -->
> > > >         </requestHandler>
> > > >
> > > >         <requestHandler name="/query" class="solr.SearchHandler">
> > > >                 <lst name="defaults">
> > > >                         <str name="echoParams">explicit</str>
> > > >                         <str name="wt">json</str>
> > > >                         <str name="indent">true</str>
> > > >                         <str name="df">text</str>
> > > >                 </lst>
> > > >         </requestHandler>
> > > >
> > > > Is there anything else that might be useful in helping diagnose 
> > > > what's
> > > going wrong for me?
> > > >
> > > > Cheers,
> > > > Claire.
> > > >
> > > > -----Original Message-----
> > > > From: Saurabh Sharma <saurabh.infoe...@gmail.com>
> > > > Sent: 06 January 2020 11:20
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Edismax ignoring queries containing booleans
> > > >
> > > > It should work well. I have just tested the same with 8.3.0.
> > > >
> > > > Thanks
> > > > Saurabh Sharma
> > > >
> > > > On Mon, Jan 6, 2020, 4:31 PM Claire Pollard 
> > > > <claire.poll...@imagen.io>
> > > > wrote:
> > > >
> > > > > I'm using:
> > > > >
> > > > > recordID:(18 OR 19 OR 20)
> > > > >
> > > > > Which should return 2 records (as 18 doesn't exist), but it 
> > > > > returns
> > > none.
> > > > > recordID is a LongPointField (sorry I said Int in my previous
> message).
> > > > >
> > > > > -----Original Message-----
> > > > > From: Saurabh Sharma <saurabh.infoe...@gmail.com>
> > > > > Sent: 06 January 2020 10:35
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > >
> > > > > Please share the query which you are creating.
> > > > >
> > > > > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard 
> > > > > <claire.poll...@imagen.io>
> > > > > wrote:
> > > > >
> > > > > > In Solr 8.3.0 I've got an edismax query parser in my search 
> > > > > > handler, and it seems to be ignoring Boolean operators such 
> > > > > > as AND and OR when searching using an IntPointField.
> > > > > >
> > > > > > I was hoping to use a query to this field to return a batch 
> > > > > > of documents with non-sequential IDs, so a range would be
> inappropriate.
> > > > > >
> > > > > > We had a previous 4.10.2 instance of Solr which uses the now 
> > > > > > deprecated Trie fields, and these seem to search without 
> > > > > > issue using
> > > > > boolean operators.
> > > > > >
> > > > > > > Is there something extra I need to do with my setup for 
> > > > > > > PointFields to use booleans, or should they work by default?
> > > > > >
> > > > > > Cheers,
> > > > > > Claire.
> > > > > >
> > > > >
> > >
> > >
>
