If we had IDF for phrases, they would be super effective. The 2X weight is a hack that mostly works.
Infoseek had phrase IDF and it was a killer algorithm for relevance.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:

The pf and qf fields are REALLY nice for this.

On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org> wrote:

I always enable phrase searching in edismax for exactly this reason.

Something like:

    <str name="qf">title^8 keywords^4 text</str>
    <str name="pf">title^16 keywords^8 text^2</str>

To deal with concepts in queries, a classifier and/or named entity extractor can be helpful. If you have a list of concepts ("controlled vocabulary") that includes "Lamin A", and that shows up in a query, that term can be queried against the field matching that vocabulary.

This is how LinkedIn separates people, companies, and places, for example.

wunder

On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Look at the "mm" parameter; try setting it to 100%. That's not entirely likely to do what you want either, since virtually every doc will have "a" in it, but at least you'd get docs that have both terms.

You may also be able to search for things like "Lamin A" _only as a phrase_ and have some luck. But this is a gnarly problem in general. Some people have been able to substitute synonyms and/or shingles to make this work at the expense of a larger index.

This is a generic problem with context. "Lamin A" is really a "concept", not just two words that happen to be near each other. Searching as a phrase is an OOB-but-naive way to try to make it more likely that the ranked results refer to the _concept_ of "Lamin A".
The assumption here is "if these two words appear next to each other, they're more likely to be what I want". I say "naive" because "Lamins: A new approach to..." would _also_ be found by a naive phrase search. (I have no idea whether such a title makes sense or not, but you figured that out already.)

To do this well you'd have to dive into NLP/machine learning.

I truly wish we could have the DWIM search algorithm (Do What I Mean)….

On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Walter and Paras,

I reindexed after removing all the references to StopWordFilter, and I went from 121 results to nearly 20K, because the search term q="Lymphoid and a non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". So I don't think removing it completely is the way to go in our scenario, but I appreciate the suggestion…

Yes, the response is using fl=*.
I am trying some combinations at the moment, but no success yet.

defType=edismax
q.alt=Lymphoid and a non-Lymphoid cell
Number of results=1599

Quite a considerable increase, though the results are reasonably meaningful.

I am sorry, but I didn't understand what exactly you want me to do with the lst (??) and qf and bf.

Thanks, everyone, for your inputs.

On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Guilherme,

> By accident, I ended up querying using the default handler (/select) and it worked.

You've just found the culprit. Thanks for giving the material I requested. Your analysis chain is working as expected. I don't see any issue in either StopWordFilter or your boosts. I also use a boost of 50 when boosting contextual suggestions (boosting "gold iphone" on a page of iphone), but I take Walter's suggestion and will try to optimize my weights.
I agree that we did not research this 50 thing much either (we never faced performance or relevance issues).

Note the major difference between the two handlers: edismax. I'm pretty sure that your problem lies in the parsing of queries (you can confirm that from the parsedquery key in the debug output of both JSON responses). I hope you have provided the response with fl=*. Replace q with q.alt in your /search handler query and I think you should start getting responses. That's because q.alt uses the standard parser. If you want to keep using edismax, I suggest testing the responses while removing some combination of the lst entries (qf, bf) to find what's preventing the documents from coming up. I'm out of the office today; otherwise I would certainly have tried analyzing the field values of the document in the /select request and comparing them with qf/bq in the /search handler in solrconfig.xml. Do this for me and you'd certainly find something.

On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:

I normally use a weight of 8 for the most important field, like title. Other fields might get a 4 or 2.

I add a "pf" field with the weights doubled, so that phrase matches have a higher weight.

The weight of 8 comes from experience at Infoseek and Inktomi, two early web search engines. With different relevance algorithms and totally different evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML titles. Since two radically different systems arrived at the same number, I decided that was a property of the documents, not of the search engines.
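Walter's rule of thumb (pf carries the qf weights doubled) is mechanical enough to script. The sketch below uses a hypothetical helper, not anything Solr provides. It takes a qf string in the space-separated `field^boost` form used in this thread and doubles each boost. An unboosted field is treated as boost 1, so the sketch reproduces the qf/pf pair from his Nov 8 example.

```python
def double_boosts(qf: str) -> str:
    """Return a pf-style string with every field boost in `qf` doubled.

    Hypothetical helper: entries look like "field^boost"; a field with no
    explicit boost is treated as boost 1, so "text" becomes "text^2".
    """
    doubled = []
    for entry in qf.split():
        field, _, boost = entry.partition("^")
        weight = float(boost) if boost else 1.0
        new_weight = weight * 2
        # Write whole numbers without a trailing ".0" (16 rather than 16.0).
        if new_weight.is_integer():
            text = str(int(new_weight))
        else:
            text = str(new_weight)
        doubled.append(f"{field}^{text}")
    return " ".join(doubled)


qf = "title^8 keywords^4 text"
params = {"defType": "edismax", "qf": qf, "pf": double_boosts(qf)}
print(params["pf"])  # title^16 keywords^8 text^2
```

The resulting string would go into the handler's pf parameter alongside qf. Whether doubling is the right ratio for a particular collection is still a tuning question that the helper does not settle.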
wunder

On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Wunder,

My indexer takes quite a few hours to run. I am shortening it to run faster, but I also need to make sure it gives what we are expecting. This implementation has been there for more than 4 years and is massively used.

> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don't think I've ever used a weight higher than 16 in a dozen years of configuring Solr.

I've inherited that implementation and I am really keen to adapt it. What would you recommend?

Cheers,
Guilherme

On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:

Thanks for posting the files. Looking at schema.xml, I see that you are still using StopFilterFactory. The first advice we gave you was to remove that.

Remove StopFilterFactory everywhere and reindex.

You will continue to have problems matching stopwords until you do that.

In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don't think I've ever used a weight higher than 16 in a dozen years of configuring Solr.

wunder

On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Paras, everyone,

Thank you again for your inputs and suggestions.
I'm sorry to hear you had trouble with the attachments; I will host them somewhere and share the links.
I don't tweak my index: I get the data from the graph database, create the documents as they are, and save them to Solr.

So, I am sending the new analysis screen, querying the way you suggested, plus the results with params and the Solr query URL.

While running the queries you asked for, I found something really weird (at least to me). By accident, I ended up querying using the default handler (/select) and it worked. If I use the handler I am supposed to use, sadly it doesn't work. I am posting both results, and I will also post the handlers.

Here is the link with all the files mentioned before:
https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0

Thanks

On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Guilherme.

> I am sending the analysis result and the json result as requested.

Thanks for the effort. Luckily, I can see your attachments (low quality, though).

From the analysis screen, the analysis is working as expected.
One reason I can initially think of for query="lymphoid and *a* non-lymphoid cell" not matching the document containing "Lymphoid and a non-Lymphoid cell" is that the stopword "a" is probably still present after analysis, in either the query or the index. Did you tweak your index-time analysis after indexing?

Do two things:

1. Post the analysis screen for index=*"Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and providing the link here.
2. Give the same JSON output as you have sent, but this time with *"echoParams=all"*. Also, post the exact Solr query URL.

On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:

I don't see the attachments; maybe I deleted old e-mails or some such. The Apache server is fairly aggressive about stripping attachments, though, so it's also possible they didn't make it through.

On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Thanks Erick.

> First, your index and analysis chains are considerably different; this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you're totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single letter term and the min length allowed on that filter is 2.
> That said, it's reasonable to suppose that the 'a' is filtered out in both cases, but maybe you've found something odd about the interactions.

I will investigate the min length and post the results later.

> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?

This is the URL in my application, not Solr params. That's the query string.

> What does "species=" do? That's not Solr syntax, so it's likely that all the params with an equal-sign are totally ignored unless it's just a typo.

This is part of the application. Species will be used later on in Solr to filter the results. That's not Solr; those are my app's params.

> Third, the easiest way to see what's happening under the covers is to add "&debug=true" to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify "&debug=query" to skip that part.

The two JSON files I've sent were made with debugQuery=on, and the explain tag is present.
I will try searching the way you mentioned.

Thanks for your inputs,

Guilherme

On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:

First, your index and analysis chains are considerably different; this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you're totally sure you understand the consequences.
Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single letter term and the min length allowed on that filter is 2. That said, it's reasonable to suppose that the 'a' is filtered out in both cases, but maybe you've found something odd about the interactions.

Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?

> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

What does "species=" do? That's not Solr syntax, so it's likely that all the params with an equal-sign are totally ignored unless it's just a typo.

Third, the easiest way to see what's happening under the covers is to add "&debug=true" to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify "&debug=query" to skip that part.

90%+ of the time, the question "why didn't this query do what I expect" is answered by looking at the "&debug=query" output and the analysis page in the admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and index output. Also, and very important about the analysis page (and this is confusing): it _assumes_ that what you put in the text boxes has made it through the query parser intact and is analyzed by the field selected. Consider the search "q=field:word1 word2".
Now you type "word1 word2" into the analysis text box and it looks like what you expect. That's misleading because the query is _parsed_ as "field:word1 default_search_field:word2". This is where "&debug=query" helps.

Best,
Erick

On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Walter,

> The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.

I think the OP's concern is getting different results when adding a stopword. I think he's using the filter factory correctly: the query chain includes the filter as well, so it should remove "a" while querying.

*@Guilherme*, please post the results for the query, the document in the results you are concerned about, and the full result of the analysis screen (for both query and index).

On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:

No.

The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.

1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml.
2. Reload the collection, restart Solr, or whatever to read the new config.
3. Reindex all of the documents.
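A rough Python model of the index-time chain from the original schema (quoted further down in this thread) helps show what step 1 changes. It also shows why Erick flagged the length filter. In that chain, LengthFilterFactory with min="2" runs before the StopFilter, so a one-letter token such as "a" is dropped whether or not the StopFilter is present. This is a simplification under stated assumptions: the regex tokenizer only approximates StandardTokenizer, ClassicFilter is left out, and the stopword set holds only the entries visible in the posted stopwords.txt.

```python
import re
from typing import List

# Only the entries visible in the posted stopwords.txt; the full list is elided there.
STOPWORDS = {"a", "b", "c", "an", "and", "are"}


def tokenize(text: str) -> List[str]:
    # Approximation of solr.StandardTokenizerFactory: split on anything that is
    # not an ASCII letter or digit. The real tokenizer follows Unicode word-break rules.
    return re.findall(r"[A-Za-z0-9]+", text)


def length_filter(tokens: List[str], min_len: int = 2, max_len: int = 20) -> List[str]:
    # Mirrors LengthFilterFactory min="2" max="20": keep min_len <= len(token) <= max_len.
    return [t for t in tokens if min_len <= len(t) <= max_len]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop_filter(tokens: List[str], stopwords=STOPWORDS) -> List[str]:
    # ignoreCase="true": compare case-insensitively.
    return [t for t in tokens if t.lower() not in stopwords]


def analyze(text: str, use_stop_filter: bool = True) -> List[str]:
    # Filter order of the index analyzer in the thread (ClassicFilter omitted):
    # tokenizer -> LengthFilter -> LowerCaseFilter -> StopFilter.
    tokens = lowercase(length_filter(tokenize(text)))
    return stop_filter(tokens) if use_stop_filter else tokens


title = "Lymphoid and a non-Lymphoid cell"
print(analyze(title))                         # ['lymphoid', 'non', 'lymphoid', 'cell']
print(analyze(title, use_stop_filter=False))  # ['lymphoid', 'and', 'non', 'lymphoid', 'cell']
```

In this model, removing the StopFilter brings back "and" but not "a". Making single letters searchable would also mean lowering min on the LengthFilter, which is the interaction Erick pointed at.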
When indexed with the new analysis chain, the stopwords will not be removed and they will be searchable.

wunder

On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

OK, I am kind of lost now.
If I open up the console > analysis and run it, that's the final result.
<Screenshot 2019-11-05 at 14.54.16.png>

Your suggestion is: get rid of the <filter stopword.txt> in schema.xml, and during the index phase replaceAll("in stopwords.txt"," ") and then add to Solr. Is that correct?
Thanks David

On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:

No,

    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

is still using stopwords and should be removed. That's my opinion, of course, and your use case may be different, but I generally axe any reference to them at all.

On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Thanks.
Haven't I done this here?
    <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>

On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:

The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again.

On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi,

I am performing a search to match a name (text_field); however, this term contains 'and' and 'a', and it doesn't return any records. If I remove 'a', then it works.
e.g.
Search term: lymphoid and a non-lymphoid cell
doesn't work:
https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

Search term: lymphoid and non-lymphoid cell
works:
https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

I'm interested in the first result.

schema.xml

    <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>

    <analyzer type="query">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>

    <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
    </fieldType>

stopwords.txt

    #Standard english stop words taken from Lucene's StopAnalyzer
    a
    b
    c
    ....
    an
    and
    are

Running Solr 6.6.2.

Is there anything I could do to prevent this?

Thanks
Guilherme

-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.
8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn: *8173*

-- 
IMPORTANT:
NEVER share your IndiaMART OTP/ Password with anyone.
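Walter's controlled-vocabulary suggestion near the top of the thread was to route a known concept such as "Lamin A" to the field that holds that vocabulary. Here is a minimal sketch of that idea in Python. The vocabulary, the field name `concept`, and the function `route_concepts` are all hypothetical. A production system would more likely use a trained classifier or named entity extractor, as he says. The sketch only finds whole-word occurrences of known concepts (longest first) and turns each into a quoted phrase clause on the concept field. It returns the leftover words as free text.

```python
import re

# Hypothetical controlled vocabulary and field name, for illustration only.
CONCEPTS = {"lamin a", "ift a"}
CONCEPT_FIELD = "concept"


def route_concepts(query: str, concepts=CONCEPTS, field=CONCEPT_FIELD):
    """Split a user query into concept phrase clauses and leftover free text.

    Longer concepts are tried first so the most specific phrase wins when
    vocabulary entries overlap.
    """
    remaining = query.lower()
    clauses = []
    for concept in sorted(concepts, key=len, reverse=True):
        # \b on both sides: "lamin a" must not match inside "lamin abc".
        pattern = r"\b" + re.escape(concept) + r"\b"
        if re.search(pattern, remaining):
            clauses.append(f'{field}:"{concept}"')
            remaining = re.sub(pattern, " ", remaining)
    leftover = " ".join(remaining.split())
    return clauses, leftover


print(route_concepts("Lamin A mutations in heart disease"))
# (['concept:"lamin a"'], 'mutations in heart disease')
```

The thread leaves open how the concept clauses would then be combined with the free text, for example as a filter or as a boosted clause.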