Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove 
hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement 
a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at 
Netflix.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 30, 2020, at 8:56 AM, Permakoff, Vadim <vadim.permak...@verisk.com> 
> wrote:
> 
> Hi Erik,
> That's what I did in the past, but this is an enterprise search and I have a 
> requirement to remove the stopwords.
> To have both features I can add synonyms in the front-end application. I know 
> that will work, but it is additional effort, so I need a justification for 
> doing it in the application.
> I thought there was a bug filed for this case that I could refer to, because 
> according to the documentation it should work, right?
> Anyway, there is more to it. If I add the same synonym processing to the 
> indexing part, the configuration will be like this:
> 
>    <fieldType name="text_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> The analysis shows that the index-time and query-time token streams now 
> match, but the exact-match query still cannot find the document! This is weird.
> Any thoughts?
> 
> Best Regards,
> Vadim Permakoff
> 
> 
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com> 
> Sent: Monday, June 29, 2020 10:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query in quotes cannot find results
> 
> Looks like you’re removing stopwords. Stopwords cause issues like this with 
> the positions being off.
> 
> It’s becoming more and more common to _NOT_ remove stopwords, is that an 
> option?
> 
> 
> 
> Best,
> Erick
> 
>> On Jun 29, 2020, at 7:32 PM, Permakoff, Vadim <vadim.permak...@verisk.com> 
>> wrote:
>> 
>> Hi Shawn,
>> Many thanks for the response, I checked the field and it is correct. Let's 
>> call it _text_ to make it easier.
>> I believe the parsing is also correct, please see below:
>> - Query without quotes (works):
>>   "querystring":"expand the methods",
>>   "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",
>> 
>> - Query with quotes (does not work):
>>   "querystring":"\"expand the methods\"",
>>   "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",
>> 
>> The document has text:
>> "to expand the methods for mailing cancellation"
>> 
>> The analysis on this field shows that all words are present in the index and 
>> the query, the order is also correct, but the word "methods" is moved one 
>> position, I guess that's why the result is not found.
>> 
>> Best Regards,
>> Vadim Permakoff
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: Monday, June 29, 2020 6:28 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Query in quotes cannot find results
>> 
>> On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
>>> The basic query q=expand the methods   <<< finds the document,
>>> the query (in quotes) q="expand the methods"   <<< cannot find the document
>>> 
>>> Am I doing something wrong, or is it a known bug (I saw similar issues 
>>> discussed in the past, but not for exact match query) and if yes - what is 
>>> the Jira for it?
>> 
>> The most helpful information will come from running both queries with debug 
>> enabled, so you can see how the query is parsed.  If you add a parameter 
>> "debugQuery=true" to the URL, then the response should include the parsed 
>> query.  Compare those, and see if you can tell what the differences are.
>> 
>> One of the most common problems for queries like this is that you're not 
>> searching the field that you THINK you're searching.  I don't know whether 
>> this is the problem, I just mention it because it is a common error.
>> 
>> Thanks,
>> Shawn
>> 
> 
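
The position shift described in the quoted messages can be illustrated with a toy model. This is a rough sketch in plain Python, not Lucene or Solr code. It assumes that stopwords.txt contains "to", "the" and "for", that the stop filter leaves a position gap where a stopword was removed, and that the span query in the debug output requires its clauses to be strictly adjacent (slop 0) without accounting for that gap. The function names are made up for the illustration.

```python
# Toy model of token positions; this is NOT Lucene code.
STOPWORDS = {"to", "the", "for"}  # assumed contents of stopwords.txt


def analyze(text):
    """Return (term, position) pairs, dropping stopwords but keeping
    the position gap each removed stopword leaves behind."""
    return [
        (term, pos)
        for pos, term in enumerate(text.lower().split())
        if term not in STOPWORDS
    ]


def positions(tokens, term):
    """All positions at which `term` occurs in the analyzed tokens."""
    return [pos for t, pos in tokens if t == term]


def phrase_match(doc_tokens, query_tokens):
    """Position-aware phrase check: the relative offsets between query
    terms, gaps included, must be reproduced in the document."""
    first_term, first_pos = query_tokens[0]
    for start in positions(doc_tokens, first_term):
        if all(
            start + (pos - first_pos) in positions(doc_tokens, term)
            for term, pos in query_tokens
        ):
            return True
    return False


def span_near_slop0(doc_tokens, query_terms):
    """Ordered slop-0 check that ignores query-side gaps: each term
    must sit exactly one position after the previous one."""
    for start in positions(doc_tokens, query_terms[0]):
        if all(
            start + i in positions(doc_tokens, term)
            for i, term in enumerate(query_terms)
        ):
            return True
    return False


doc = analyze("to expand the methods for mailing cancellation")
query = analyze("expand the methods")

print(doc)
print("gap-aware phrase:", phrase_match(doc, query))
print("slop-0 span:", span_near_slop0(doc, [t for t, _ in query]))
```

In this model "expand" and "methods" end up two positions apart in the document, so the gap-aware phrase check succeeds while the strict slop-0 span check fails. That is consistent with the behavior reported in the thread, though the model leaves out the synonym graph entirely.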
