Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

David Hastings Fri, 08 Nov 2019 08:43:58 -0800

Is your default operator OR?
Change it to AND.


On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

> Hi Walter and Paras,
>
> I reindexed after removing all the references to StopWordFilter, and the
> results went from 121 to nearly 20K, because the search term
> q="Lymphoid and a non-Lymphoid cell" now matches entities such as "IFT A"
> or "Lamin A". So I don't think removing it completely is the way to go in
> our scenario, but I appreciate the suggestion.
>
> Yes, the response is using fl=*.
> I am trying some combinations at the moment, but no success yet.
>
> defType=edismax
> q.alt=Lymphoid and a non-Lymphoid cell
> Number of results=1599
> Quite a considerable increase, although the results are reasonably meaningful.
>
> I am sorry, but I didn't understand what exactly you want me to do with
> the lst (??) and qf and bf.
>
> Thanks, everyone, for your inputs.
>
>
> > On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >
> > Hi Guilherme,
> >
> > > By accident, I ended up querying using the default handler (/select)
> > > and it worked.
> >
> > You've just found the culprit. Thanks for providing the material I
> > requested. Your analysis chain is working as expected. I don't see any
> > issue in either StopWordFilter or your boosts. I also use a boost of 50
> > when boosting contextual suggestions (boosting "gold iphone" on a page of
> > iphone), but I take Walter's point and will try to optimize my weights. I
> > agree that we did not research this value of 50 much either (we never
> > faced performance or relevance issues).
> >
> > Note the major difference between the two handlers: edismax. I'm pretty
> > sure that your problem lies in the parsing of the query (you can confirm
> > that from the parsedquery key in the debug section of both JSON
> > responses). I hope you have provided the response with fl=*. Replace q
> > with q.alt in your /search handler query and I think you should start
> > getting results. That's because q.alt uses the standard parser. If you
> > want to keep using edismax, I suggest you test the responses while
> > removing some combination of the lst parameters (qf, bf) to find what's
> > keeping the documents from coming up. I'm out of office today; otherwise
> > I would certainly have analyzed the field values of the document in the
> > /select response and compared them with qf/bq of the /search handler in
> > solrconfig.xml. Do this for me and you'll certainly find something.
> >
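Paras's two suggestions (send the text as q.alt instead of q, and drop the field/boost parameters one at a time until the missing document reappears) amount to a small ablation test. The Python sketch below only builds the parameter sets; `ablation_variants` is a hypothetical helper. It assumes all edismax parameters are passed explicitly on the request (for example against /select), because values configured as defaults for the /search handler in solrconfig.xml still apply when a parameter is merely left out of the URL. The field names and boosts are placeholders, not the thread's real configuration.

```python
from urllib.parse import urlencode


def ablation_variants(params):
    """Return (label, params) pairs for narrowing down which edismax parameter
    keeps a document out of the results. Nothing is sent to Solr here."""
    variants = [("baseline", dict(params))]

    # Move the user text from q to q.alt. Solr only consults q.alt when q is
    # absent or blank, so q must be dropped, not just duplicated.
    if "q" in params:
        alt = {k: v for k, v in params.items() if k != "q"}
        alt["q.alt"] = params["q"]
        variants.append(("q -> q.alt", alt))

    # Drop one field/boost parameter at a time.
    for key in ("qf", "pf", "bf", "bq"):
        if key in params:
            without = {k: v for k, v in params.items() if k != key}
            variants.append((f"without {key}", without))
    return variants


if __name__ == "__main__":
    # Hypothetical request: field names and boosts are placeholders.
    base = {
        "defType": "edismax",
        "q": "lymphoid and a non-lymphoid cell",
        "qf": "name^8 summary^2",
        "bq": "type:Pathway^2",
        "debug": "query",
    }
    for label, p in ablation_variants(base):
        print(f"{label:12} /select?{urlencode(p)}")
```

Comparing the parsedquery of each variant against the baseline narrows down which parameter changes the parse.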
> > On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:
> > I normally use a weight of 8 for the most important field, like title.
> > Other fields might get a 4 or 2.
> >
> > I add a “pf” field with the weights doubled, so that phrase matches have
> > a higher weight.
> >
> > The weight of 8 comes from experience at Infoseek and Inktomi, two early
> > web search engines. With different relevance algorithms and totally
> > different evaluation and tuning systems, they settled on weights of 8 and
> > 7.5 for HTML titles. With the two radically different systems arriving at
> > the same number, I decided that was a property of the documents, not of
> > the search engines.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
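Walter's rule of thumb (8 for the most important field, 4 or 2 for the others, and a pf list with the same fields at double weight) can be written down mechanically. A minimal Python sketch; `build_qf_pf` and the field names are hypothetical and not taken from the thread's schema:

```python
def build_qf_pf(weights):
    """Return (qf, pf) strings for edismax from a {field: weight} mapping.

    pf uses the same fields with every weight doubled, so phrase matches
    outscore matches on scattered terms.
    """
    qf = " ".join(f"{field}^{weight:g}" for field, weight in weights.items())
    pf = " ".join(f"{field}^{2 * weight:g}" for field, weight in weights.items())
    return qf, pf


if __name__ == "__main__":
    # Hypothetical fields: name plays the role of "title", the rest are secondary.
    qf, pf = build_qf_pf({"name": 8, "summary": 4, "synonyms": 2})
    print("qf =", qf)  # qf = name^8 summary^4 synonyms^2
    print("pf =", pf)  # pf = name^16 summary^8 synonyms^4
```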
> >> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>
> >> Hi Wunder,
> >>
> >> My indexer takes quite a few hours to run. I am shortening it so it runs
> >> faster, but I also need to make sure it gives what we expect. This
> >> implementation has been in place for more than 4 years and is heavily used.
> >>
> >>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> >>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> >>> of configuring Solr.
> >> I've inherited that implementation and I am really keen to adapt it.
> >> What would you recommend?
> >>
> >> Cheers
> >> Guilherme
> >>
> >>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>
> >>> Thanks for posting the files. Looking at schema.xml, I see that you
> >>> are still using StopFilterFactory. The first advice we gave you was to
> >>> remove that.
> >>>
> >>> Remove StopFilterFactory everywhere and reindex.
> >>>
> >>> You will continue to have problems matching stopwords until you do that.
> >>>
> >>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> >>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> >>> of configuring Solr.
> >>>
> >>>
> >>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>
> >>>> Hi Paras, everyone,
> >>>>
> >>>> Thank you again for your inputs and suggestions. I am sorry to hear you
> >>>> had trouble with the attachments; I will host them somewhere and share
> >>>> the links.
> >>>> I don't tweak my index: I get the data from the graph database, create
> >>>> the documents as they are and save them to Solr.
> >>>>
> >>>> So, I am sending the new analysis screen, querying the way you
> >>>> suggested, along with the results with params and the Solr query URL.
> >>>>
> >>>> While running the queries you asked for, I found something really weird
> >>>> (at least to me). By accident, I ended up querying using the default
> >>>> handler (/select) and it worked. If I use the handler I am supposed to
> >>>> use, it sadly doesn't work. I am posting both results, and I will also
> >>>> post the handlers.
> >>>>
> >>>> Here is the link with all the files mentioned before:
> >>>>
> >>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
> >>>>
> >>>> If the link doesn't work: www dot dropbox dot com slash sh slash
> >>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
> >>>>
> >>>> Thanks
> >>>>
> >>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>>>
> >>>>> Hi Guilherme.
> >>>>>
> >>>>>> I am sending the analysis result and the JSON result as requested.
> >>>>>
> >>>>>
> >>>>> Thanks for the effort. Luckily, I can see your attachments (low quality
> >>>>> though).
> >>>>>
> >>>>> From the analysis screen, the analysis is working as expected. One of the
> >>>>> reasons I can initially think of for query="lymphoid and *a* non-lymphoid
> >>>>> cell" not matching the document containing "Lymphoid and a non-Lymphoid
> >>>>> cell" is that the stopword "a" is probably present after analysis of
> >>>>> either the query or the index. Did you tweak your index-time analysis
> >>>>> after indexing?
> >>>>>
> >>>>> Do two things:
> >>>>>
> >>>>> 1. Post the analysis screen for index=*"Immunoregulatory interactions
> >>>>>    between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and
> >>>>>    a non-lymphoid cell"*. Try hosting the image and providing the link
> >>>>>    here.
> >>>>> 2. Give the same JSON output as you have sent, but this time with
> >>>>>    *"echoParams=all"*. Also, post the exact Solr query URL.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>
> >>>>>> I don’t see the attachments; maybe I deleted old e-mails or some such.
> >>>>>> The Apache server is fairly aggressive about stripping attachments,
> >>>>>> though, so it’s also possible they didn’t make it through.
> >>>>>>
> >>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>
> >>>>>>> Thanks Erick.
> >>>>>>>
> >>>>>>>> First, your index and analysis chains are considerably different; this
> >>>>>>>> can easily be a source of problems. In particular, using two different
> >>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
> >>>>>>>> you’re totally sure you understand the consequences. Additionally, your
> >>>>>>>> use of the length filter is suspicious, especially since your problem
> >>>>>>>> statement is about the addition of a single-letter term and the min
> >>>>>>>> length allowed on that filter is 2. That said, it’s reasonable to suppose
> >>>>>>>> that the ’a’ is filtered out in both cases, but maybe you’ve found
> >>>>>>>> something odd about the interactions.
> >>>>>>> I will investigate the min length and post the results later.
> >>>>>>>
> >>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
> >>>>>>>> Used by custom code?
> >>>>>>> This is the URL in my application, not Solr params. That's the query string.
> >>>>>>>
> >>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all
> >>>>>>>> the params with an equal sign are totally ignored unless it’s just a
> >>>>>>>> typo.
> >>>>>>> This is part of the application. Species will be used later on in Solr
> >>>>>>> to filter the results. That's not Solr; those are my app params.
> >>>>>>>
> >>>>>>>> Third, the easiest way to see what’s happening under the covers is to
> >>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore all
> >>>>>>>> the relevance calculations for the nonce, or specify “&debug=query” to
> >>>>>>>> skip that part.
> >>>>>>> The two JSON files I've sent were produced with debugQuery=on, and the
> >>>>>>> explain tag is present.
> >>>>>>> I will try searching the way you mentioned.
> >>>>>>>
> >>>>>>> Thanks for your inputs
> >>>>>>>
> >>>>>>> Guilherme
> >>>>>>>
> >>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> First, your index and analysis chains are considerably different; this
> >>>>>>>> can easily be a source of problems. In particular, using two different
> >>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
> >>>>>>>> you’re totally sure you understand the consequences. Additionally, your
> >>>>>>>> use of the length filter is suspicious, especially since your problem
> >>>>>>>> statement is about the addition of a single-letter term and the min
> >>>>>>>> length allowed on that filter is 2. That said, it’s reasonable to suppose
> >>>>>>>> that the ’a’ is filtered out in both cases, but maybe you’ve found
> >>>>>>>> something odd about the interactions.
> >>>>>>>>
> >>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
> >>>>>>>> Used by custom code?
> >>>>>>>>
> >>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>
> >>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all
> >>>>>>>> the params with an equal sign are totally ignored unless it’s just a
> >>>>>>>> typo.
> >>>>>>>>
> >>>>>>>> Third, the easiest way to see what’s happening under the covers is to
> >>>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore all
> >>>>>>>> the relevance calculations for the nonce, or specify “&debug=query” to
> >>>>>>>> skip that part.
> >>>>>>>>
> >>>>>>>> 90%+ of the time, the question “why didn’t this query do what I
> >>>>>>>> expect?” is answered by looking at the “&debug=query” output and the
> >>>>>>>> analysis page in the admin UI. NOTE: on the analysis page, be sure to
> >>>>>>>> look at _both_ the query and index output. Also, and this is very
> >>>>>>>> important (and confusing) about the analysis page: it _assumes_ that
> >>>>>>>> what you put in the text boxes has made it through the query parser
> >>>>>>>> intact and is analyzed by the selected field. Consider the search
> >>>>>>>> "q=field:word1 word2". Now you type “word1 word2” into the analysis text
> >>>>>>>> box and it looks like what you expect. That’s misleading, because the
> >>>>>>>> query is _parsed_ as "field:word1 default_search_field:word2”. This is
> >>>>>>>> where “&debug=query” helps.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Erick
> >>>>>>>>
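Erick's advice, combined with Paras's request for echoParams=all, boils down to sending the same text to both handlers with debug output turned on and comparing the parsed queries. A minimal Python sketch using only the standard library; `debug_url` and `parsed_query` are hypothetical helpers, and the base URL and collection name are placeholders. It only builds URLs and reads an already-decoded JSON response, so it does not depend on any Solr client library:

```python
from urllib.parse import urlencode


def debug_url(base, handler, q, **extra):
    """Build a request URL that asks Solr for the parsed query.

    debug=query skips the relevance explanation; echoParams=all makes the
    response header list every parameter in effect, including handler defaults.
    """
    params = {"q": q, "debug": "query", "echoParams": "all", "wt": "json"}
    params.update(extra)
    return f"{base.rstrip('/')}/{handler.lstrip('/')}?{urlencode(params)}"


def parsed_query(response):
    """Return the parsedquery string from a decoded JSON response, or None."""
    return response.get("debug", {}).get("parsedquery")


if __name__ == "__main__":
    base = "http://localhost:8983/solr/reactome"  # hypothetical collection URL
    text = "lymphoid and a non-lymphoid cell"
    for handler in ("/select", "/search"):
        print(debug_url(base, handler, text))
```

Decoding each response with `json.loads` and comparing `parsed_query()` for /select and /search shows where the two handlers diverge.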
> >>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Walter,
> >>>>>>>>>
> >>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words
> >>>>>>>>>> will not be in the index, so they can never match a query.
> >>>>>>>>>
> >>>>>>>>> I think the OP's concern is getting different results when adding a
> >>>>>>>>> stopword. I think he's using the filter factory correctly: the query
> >>>>>>>>> chain includes the filter as well, so it should remove "a" while querying.
> >>>>>>>>>
> >>>>>>>>> *@Guilherme*, please post the results for the query and for the document
> >>>>>>>>> in the results that you are concerned about, and post the full result of
> >>>>>>>>> the analysis screen (for both query and index).
> >>>>>>>>>
> >>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> No.
> >>>>>>>>>>
> >>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words
> >>>>>>>>>> will not be in the index, so they can never match a query.
> >>>>>>>>>>
> >>>>>>>>>> 1. Remove the lines with solr.StopFilterFactory from every analysis
> >>>>>>>>>>    chain in schema.xml.
> >>>>>>>>>> 2. Reload the collection, restart Solr, or do whatever else is needed
> >>>>>>>>>>    to read the new config.
> >>>>>>>>>> 3. Reindex all of the documents.
> >>>>>>>>>>
> >>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be
> >>>>>>>>>> removed and they will be searchable.
> >>>>>>>>>>
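Step 1 can also be scripted. A minimal Python sketch using `xml.etree.ElementTree` that drops every `solr.StopFilterFactory` filter from a fieldType definition; `remove_stop_filters` is a hypothetical helper. ElementTree discards XML comments and does not preserve formatting, so for a hand-maintained schema.xml editing by hand may be preferable, and steps 2 and 3 (reload, reindex) still apply. Note also that in the schema posted in this thread, `LengthFilterFactory` with `min="2"` would still drop single-letter terms such as "a" after this change.

```python
import xml.etree.ElementTree as ET


def remove_stop_filters(field_type_xml):
    """Return (new_xml, removed_count) with every StopFilterFactory removed
    from all <analyzer> elements of a <fieldType> definition."""
    root = ET.fromstring(field_type_xml)
    removed = 0
    # Materialize the analyzers first so the tree is not modified mid-iteration.
    for analyzer in list(root.iter("analyzer")):
        for flt in list(analyzer.findall("filter")):
            if flt.get("class") == "solr.StopFilterFactory":
                analyzer.remove(flt)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```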
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> OK, I am kind of lost now.
> >>>>>>>>>>> If I open up the console > Analysis and run it, this is the final
> >>>>>>>>>>> result.
> >>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
> >>>>>>>>>>>
> >>>>>>>>>>> So your suggestion is: get rid of the <filter stopword.txt> in
> >>>>>>>>>>> schema.xml and, during the index phase, do replaceAll("in stopwords.txt", " ")
> >>>>>>>>>>> before adding to Solr. Is that correct?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks David
> >>>>>>>>>>>
> >>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> No,
> >>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>
> >>>>>>>>>>>> is still using stopwords and should be removed. That's my opinion, of
> >>>>>>>>>>>> course, and your use case may be different, but I generally axe any
> >>>>>>>>>>>> reference to them at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>> Haven't I done this here?
> >>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>    <analyzer type="index">
> >>>>>>>>>>>>>        <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>        <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>    </analyzer>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The first thing you should do is remove any reference to stop words
> >>>>>>>>>>>>>> and never use them, then reindex your data and try it again.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am performing a search to match a name (text_field); however, this term
> >>>>>>>>>>>>>>> contains 'and' and 'a', and it doesn't return any records. If I remove
> >>>>>>>>>>>>>>> 'a', then it works.
> >>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
> >>>>>>>>>>>>>>> doesn't work:
> >>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
> >>>>>>>>>>>>>>> works:
> >>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am interested in the first result.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> schema.xml
> >>>>>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>    <analyzer type="query">
> >>>>>>>>>>>>>>>        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>    </analyzer>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>>>    <analyzer type="index">
> >>>>>>>>>>>>>>>        <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>>>        <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>    </analyzer>
> >>>>>>>>>>>>>>>    <analyzer type="query">
> >>>>>>>>>>>>>>>        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>    </analyzer>
> >>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> stopwords.txt
> >>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> b
> >>>>>>>>>>>>>>> c
> >>>>>>>>>>>>>>> ....
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>
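To see what the query-time chain above does to the two search terms, here is a simplified Python re-implementation of it (an illustration, not Solr itself); `analyze_query` is a hypothetical helper. It assumes Python's `re` treats these simple character classes the same way as Java's regex engine, and it uses only the stopwords visible in the excerpt above. Like Lucene's filters, it keeps the original position of each surviving token, so removed words leave gaps:

```python
import re

# Only the entries visible in the stopwords.txt excerpt; the real file is longer.
STOPWORDS = {"a", "b", "c", "an", "and", "are"}


def analyze_query(text, stopwords=STOPWORDS, min_len=2, max_len=20):
    """Approximate the text_field query analyzer; return [(term, position)]."""
    # PatternTokenizerFactory: split on any char outside [a-zA-Z0-9/._:],
    # skipping empty tokens.
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    result = []
    for pos, tok in enumerate(tokens):
        tok = re.sub(r"^[/._:]+", "", tok)  # strip leading punctuation
        tok = re.sub(r"[/._:]+$", "", tok)  # strip trailing punctuation
        tok = tok.replace("_", " ")  # PatternReplaceFilter [_] -> " "
        if not (min_len <= len(tok) <= max_len):
            continue  # LengthFilterFactory min="2" max="20"
        tok = tok.lower()  # LowerCaseFilterFactory
        if tok in stopwords:
            continue  # StopFilterFactory (ignoreCase is moot after lowercasing)
        result.append((tok, pos))
    return result


if __name__ == "__main__":
    for q in ("lymphoid and a non-lymphoid cell", "lymphoid and non-lymphoid cell"):
        print(q, "->", analyze_query(q))
    # Without the stop filter, "a" is still removed by the length filter.
    print(analyze_query("lymphoid and a non-lymphoid cell", stopwords=set()))
```

In this simplified model both search terms reduce to the same four terms and differ only in their position gaps, which matter for phrase-style matching. It also illustrates Erick's point: "a" is removed by the length filter before the stop filter ever sees it.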
> >>>>>>>>>>>>>>> Running Solr 6.6.2.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Is there anything I could do to prevent this?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>> Guilherme
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> --
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> *Paras Lehana* [65871]
> >>>>>>>>> Development Engineer, Auto-Suggest,
> >>>>>>>>> IndiaMART Intermesh Ltd.
> >>>>>>>>>
> >>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> >>>>>>>>> Noida, UP, IN - 201303
> >>>>>>>>>
> >>>>>>>>> Mob.: +91-9560911996
> >>>>>>>>> Work: 01203916600 | Extn:  *8173*
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> IMPORTANT:
> >>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
>
>
