The pf and qf fields are REALLY nice for this.

On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org> wrote:
> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
> <str name="qf">title^8 keywords^4 text</str>
> <str name="pf">title^16 keywords^8 text^2</str>
>
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
> term can be queried against the field matching that vocabulary.
>
> This is how LinkedIn separates people, companies, and places, for example.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > Look at the “mm” parameter; try setting it to 100%. Although that’s not
> > entirely likely to do what you want either, since virtually every doc will
> > have “a” in it. But at least you’d get docs that have both terms.
> >
> > You may also be able to search for things like “Lamin A” _only as a
> > phrase_ and have some luck. But this is a gnarly problem in general. Some
> > people have been able to substitute synonyms and/or shingles to make this
> > work at the expense of a larger index.
> >
> > This is a generic problem with context. “Lamin A” is really a “concept”,
> > not just two words that happen to be near each other. Searching as a phrase
> > is an OOB-but-naive way to try to make it more likely that the ranked
> > results refer to the _concept_ of “Lamin A”. The assumption here is “if
> > these two words appear next to each other, they’re more likely to be what I
> > want”. I say “naive” because “Lamins: A new approach to...” would _also_ be
> > found by a naive phrase search. (I have no idea whether such a title makes
> > sense or not, but you figured that out already.)
> >
> > To do this well you’d have to dive into NLP/machine learning.
> >
> > I truly wish we could have the DWIM search algorithm (Do What I Mean).
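A minimal solrconfig.xml sketch of how Walter's qf/pf weights and Erick's mm suggestion could sit together as defaults of an edismax handler. The handler name /search is the one discussed further down the thread, but title, keywords and text are Walter's illustrative field names, not fields from this schema, so treat them as placeholders. Roughly speaking, mm counts only the query terms that survive analysis, so how mm=100% treats a term like "a" depends on whether a stop or length filter still removes it.

```xml
<!-- solrconfig.xml: a sketch; title, keywords and text are placeholder field names -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- term matching: the most important field gets the highest weight -->
    <str name="qf">title^8 keywords^4 text</str>
    <!-- phrase boost: same fields with the weights doubled, so documents
         containing the query terms close together rank higher -->
    <str name="pf">title^16 keywords^8 text^2</str>
    <!-- minimum should match: every query clause must match -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

pf only adds to the score and never removes documents, so it changes the ranking rather than the hit count; mm is the setting that narrows the result set.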
> >
> >> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>
> >> Hi Walter and Paras,
> >>
> >> I reindexed after removing all references to StopWordFilter, and I went
> >> from 121 results to nearly 20K, because the search term q="Lymphoid and a
> >> non-Lymphoid cell" now matches entities such as "IFT A" or "Lamin A". So I
> >> don't think removing it completely is the way to go for our scenario, but
> >> I appreciate the suggestion.
> >>
> >> Yes, the response is using fl=*.
> >> I am trying some combinations at the moment, but no success yet.
> >>
> >> defType=edismax
> >> q.alt=Lymphoid and a non-Lymphoid cell
> >> Number of results=1599
> >> A considerable increase, although the results are reasonably meaningful.
> >>
> >> I am sorry, but I didn't understand what exactly you want me to do with
> >> the lst (??), qf and bf.
> >>
> >> Thanks, everyone, for your inputs.
> >>
> >>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>
> >>> Hi Guilherme,
> >>>
> >>> "By accident, I ended up querying using the default handler (/select)
> >>> and it worked."
> >>>
> >>> You've just found the culprit. Thanks for providing the material I
> >>> requested. Your analysis chain is working as expected. I don't see any
> >>> issue in either StopWordFilter or your boosts. I also use a boost of 50
> >>> when boosting contextual suggestions (boosting "gold iphone" on an iphone
> >>> page), but I take Walter's suggestion and will try to optimize my
> >>> weights. I agree that we never researched this value of 50 much either
> >>> (we never faced performance or relevance issues).
> >>>
> >>> See the major difference between the two handlers: edismax. I'm pretty
> >>> sure that your problem lies in the parsing of queries (you can confirm
> >>> that from the parsedquery key in the debug section of both JSON
> >>> responses). I hope you have provided the response with fl=*. Replace q
> >>> with q.alt in your /search handler query and I think you should start
> >>> getting responses.
> >>> That's because q.alt uses the standard query parser. If you want to
> >>> keep using edismax, I suggest testing the responses while removing
> >>> different combinations of the lst entries (qf, bf) to find what is
> >>> keeping the documents from coming up. I'm out of the office today;
> >>> otherwise I would certainly have analyzed the field values of the
> >>> document returned by the /select request and compared them with the
> >>> qf/bq of /search in solrconfig.xml. Do this for me and you'll certainly
> >>> find something.
> >>>
> >>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:
> >>> I normally use a weight of 8 for the most important field, like title.
> >>> Other fields might get a 4 or 2.
> >>>
> >>> I add a “pf” field with the weights doubled, so that phrase matches
> >>> have a higher weight.
> >>>
> >>> The weight of 8 comes from experience at Infoseek and Inktomi, two early
> >>> web search engines. With different relevance algorithms and totally
> >>> different evaluation and tuning systems, they settled on weights of 8
> >>> and 7.5 for HTML titles. With the two radically different systems
> >>> getting the same number, I decided that was a property of the
> >>> documents, not of the search engines.
> >>>
> >>> wunder
> >>>
> >>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>
> >>>> Hi Wunder,
> >>>>
> >>>> My indexer takes quite a few hours to run. I am shortening it so it
> >>>> runs faster, but I also need to make sure it gives what we expect.
> >>>> This implementation has been in place for more than 4 years and is
> >>>> heavily used.
> >>>>
> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> >>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen
> >>>>> years of configuring Solr.
> >>>>
> >>>> I inherited that implementation and I am really keen to adapt it.
> >>>> What would you recommend?
> >>>>
> >>>> Cheers,
> >>>> Guilherme
> >>>>
> >>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>>>
> >>>>> Thanks for posting the files. Looking at schema.xml, I see that you
> >>>>> are still using StopFilterFactory. The first advice we gave you was to
> >>>>> remove that.
> >>>>>
> >>>>> Remove StopFilterFactory everywhere and reindex.
> >>>>>
> >>>>> You will continue to have problems matching stopwords until you do
> >>>>> that.
> >>>>>
> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> >>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen
> >>>>> years of configuring Solr.
> >>>>>
> >>>>> wunder
> >>>>>
> >>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>
> >>>>>> Hi Paras, everyone,
> >>>>>>
> >>>>>> Thank you again for your inputs and suggestions. I'm sorry to hear you
> >>>>>> had trouble with the attachments; I will host them somewhere and share
> >>>>>> the links.
> >>>>>> I don't tweak my index: I get the data from the graph database, create
> >>>>>> the documents as they are and save them to Solr.
> >>>>>>
> >>>>>> So, I am sending the new analysis screen, querying the way you
> >>>>>> suggested, as well as the results with params and the Solr query URL.
> >>>>>>
> >>>>>> While running the queries you asked for, I found something really
> >>>>>> weird (at least to me). By accident, I ended up querying using the
> >>>>>> default handler (/select) and it worked. If I use the handler I must
> >>>>>> use, it sadly doesn't work. I am posting both results, and I will post
> >>>>>> the handlers as well.
> >>>>>>
> >>>>>> Here is the link with all the files mentioned before:
> >>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
> >>>>>> If the link doesn't work: www dot dropbox dot com slash sh slash
> >>>>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>>>>>
> >>>>>>> Hi Guilherme,
> >>>>>>>
> >>>>>>> "I am sending the analysis result and the JSON result as requested."
> >>>>>>>
> >>>>>>> Thanks for the effort. Luckily, I can see your attachments (low
> >>>>>>> quality though).
> >>>>>>>
> >>>>>>> From the analysis screen, the analysis is working as expected. One
> >>>>>>> reason I can initially think of for query="lymphoid and *a*
> >>>>>>> non-lymphoid cell" not matching the document containing "Lymphoid and
> >>>>>>> a non-Lymphoid cell" is that the stopword "a" is probably present
> >>>>>>> after analysis on either the query or the index side. Did you tweak
> >>>>>>> your index-time analysis after indexing?
> >>>>>>>
> >>>>>>> Do two things:
> >>>>>>>
> >>>>>>> 1. Post the analysis screen for index="Immunoregulatory interactions
> >>>>>>>    between a Lymphoid and a non-Lymphoid cell" and query="lymphoid and
> >>>>>>>    a non-lymphoid cell". Try hosting the image and providing the link
> >>>>>>>    here.
> >>>>>>> 2. Give the same JSON output you sent, but this time with
> >>>>>>>    "echoParams=all". Also, post the exact Solr query URL.
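One way to act on Paras's request for echoParams=all, together with his earlier suggestion to bisect the handler's parameters, is a temporary copy of the handler that starts from a single query field and reports how the query was parsed. This is a sketch only: the handler name /search-debug is hypothetical, and name is the text_field from the schema in the original question at the bottom of the thread. Adding the production qf, pf, bq and bf values back one at a time, and comparing parsedquery with the /select output after each step, should show which default changes the result.

```xml
<!-- solrconfig.xml: temporary troubleshooting handler; /search-debug is a hypothetical name -->
<requestHandler name="/search-debug" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- start from one field; re-add the production qf/pf/bq/bf entries one at a time -->
    <str name="qf">name</str>
    <!-- echo every parameter in effect, including these handler defaults -->
    <str name="echoParams">all</str>
    <!-- return parsedquery in the debug section, without the scoring explanations -->
    <str name="debug">query</str>
  </lst>
</requestHandler>
```

Because debug output is added to every response from this handler, it is meant for troubleshooting only, not as a replacement for /search.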
> >>>>>>>
> >>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I don’t see the attachments; maybe I deleted old e-mails or some
> >>>>>>>> such. The Apache server is fairly aggressive about stripping
> >>>>>>>> attachments though, so it’s also possible they didn’t make it
> >>>>>>>> through.
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Erick.
> >>>>>>>>>
> >>>>>>>>>> First, your index and analysis chains are considerably different;
> >>>>>>>>>> this can easily be a source of problems. In particular, using two
> >>>>>>>>>> different tokenizers is a huge red flag. I _strongly_ recommend
> >>>>>>>>>> against this unless you’re totally sure you understand the
> >>>>>>>>>> consequences. Additionally, your use of the length filter is
> >>>>>>>>>> suspicious, especially since your problem statement is about the
> >>>>>>>>>> addition of a single-letter term and the min length allowed on that
> >>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ‘a’ is
> >>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
> >>>>>>>>>> about the interactions.
> >>>>>>>>> I will investigate the min length and post the results later.
> >>>>>>>>>
> >>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
> >>>>>>>>>> typos? Used by custom code?
> >>>>>>>>> This is the URL in my application, not Solr params. That's the
> >>>>>>>>> query string.
> >>>>>>>>>
> >>>>>>>>>> What does “species=” do? That’s not Solr syntax, so it’s likely
> >>>>>>>>>> that all the params with an equal sign are totally ignored unless
> >>>>>>>>>> it’s just a typo.
> >>>>>>>>> This is part of the application. Species will be used later on in
> >>>>>>>>> Solr to filter the results. That's not Solr; those are my app's
> >>>>>>>>> params.
> >>>>>>>>>
> >>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
> >>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
> >>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
> >>>>>>>>>> “&debug=query” to skip that part.
> >>>>>>>>> The two JSON files I sent were produced with debugQuery=on, and the
> >>>>>>>>> explain section is present.
> >>>>>>>>> I will try searching the way you mentioned.
> >>>>>>>>>
> >>>>>>>>> Thanks for your inputs
> >>>>>>>>>
> >>>>>>>>> Guilherme
> >>>>>>>>>
> >>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Fwd to another server
> >>>>>>>>>>
> >>>>>>>>>> First, your index and analysis chains are considerably different;
> >>>>>>>>>> this can easily be a source of problems. In particular, using two
> >>>>>>>>>> different tokenizers is a huge red flag. I _strongly_ recommend
> >>>>>>>>>> against this unless you’re totally sure you understand the
> >>>>>>>>>> consequences. Additionally, your use of the length filter is
> >>>>>>>>>> suspicious, especially since your problem statement is about the
> >>>>>>>>>> addition of a single-letter term and the min length allowed on that
> >>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ‘a’ is
> >>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
> >>>>>>>>>> about the interactions.
> >>>>>>>>>>
> >>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
> >>>>>>>>>> typos? Used by custom code?
> >>>>>>>>>>
> >>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>
> >>>>>>>>>> What does “species=” do?
> >>>>>>>>>> That’s not Solr syntax, so it’s likely that all the params with an
> >>>>>>>>>> equal sign are totally ignored unless it’s just a typo.
> >>>>>>>>>>
> >>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
> >>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
> >>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
> >>>>>>>>>> “&debug=query” to skip that part.
> >>>>>>>>>>
> >>>>>>>>>> 90%+ of the time, the question “why didn’t this query do what I
> >>>>>>>>>> expect” is answered by looking at the “&debug=query” output and the
> >>>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page, be sure
> >>>>>>>>>> to look at _both_ the query and index output. Also, and this is
> >>>>>>>>>> very important (and confusing) about the analysis page: it
> >>>>>>>>>> _assumes_ that what you put in the text boxes has made it through
> >>>>>>>>>> the query parser intact and is analyzed by the field selected.
> >>>>>>>>>> Consider the search "q=field:word1 word2". Now you type “word1
> >>>>>>>>>> word2” into the analysis text box and it looks like what you
> >>>>>>>>>> expect. That’s misleading, because the query is _parsed_ as
> >>>>>>>>>> "field:word1 default_search_field:word2". This is where
> >>>>>>>>>> “&debug=query” helps.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Erick
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Walter,
> >>>>>>>>>>>
> >>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
> >>>>>>>>>>>> words will not be in the index, so they can never match a query.
> >>>>>>>>>>>
> >>>>>>>>>>> I think the OP's concern is getting different results when adding
> >>>>>>>>>>> a stopword.
> >>>>>>>>>>> I think he's using the filter factory correctly: the query chain
> >>>>>>>>>>> includes the filter as well, so it should remove "a" while
> >>>>>>>>>>> querying.
> >>>>>>>>>>>
> >>>>>>>>>>> @Guilherme, please post the results for the query, the document in
> >>>>>>>>>>> the results you are concerned about, and the full result of the
> >>>>>>>>>>> analysis screen (for both query and index).
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> No.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
> >>>>>>>>>>>> words will not be in the index, so they can never match a query.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain
> >>>>>>>>>>>>    in schema.xml.
> >>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the
> >>>>>>>>>>>>    new config.
> >>>>>>>>>>>> 3. Reindex all of the documents.
> >>>>>>>>>>>>
> >>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not
> >>>>>>>>>>>> be removed and they will be searchable.
> >>>>>>>>>>>>
> >>>>>>>>>>>> wunder
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> OK, I am kind of lost now.
> >>>>>>>>>>>>> If I open up the console > analysis and run it, this is the
> >>>>>>>>>>>>> final result.
> >>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in
> >>>>>>>>>>>>> schema.xml, and during the index phase replaceAll("in
> >>>>>>>>>>>>> stopwords.txt", " ") and then add to Solr. Is that correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks David
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fwd to another server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> No,
> >>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>         words="stopwords.txt"/>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion
> >>>>>>>>>>>>>> of course. Your use case may be different, but I generally axe
> >>>>>>>>>>>>>> any reference to them at all.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>> Haven't I done this here?
> >>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
> >>>>>>>>>>>>>>>            positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>>             words="stopwords.txt"/>
> >>>>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Fwd to another server
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
> >>>>>>>>>>>>>>>> words and never use them, then re-index your data and try it
> >>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field); however,
> >>>>>>>>>>>>>>>>> this term contains 'and' and 'a', and it doesn't return any
> >>>>>>>>>>>>>>>>> records. If I remove 'a', then it works.
> >>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
> >>>>>>>>>>>>>>>>> doesn't work:
> >>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
> >>>>>>>>>>>>>>>>> works:
> >>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>>> (interested in the first result)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> schema.xml
> >>>>>>>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true"
> >>>>>>>>>>>>>>>>>        omitNorms="false" required="true" multiValued="false"/>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> <analyzer type="query">
> >>>>>>>>>>>>>>>>>   <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>>> </analyzer>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
> >>>>>>>>>>>>>>>>>            positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>>>>>   <analyzer type="query">
> >>>>>>>>>>>>>>>>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> stopwords.txt
> >>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> b
> >>>>>>>>>>>>>>>>> c
> >>>>>>>>>>>>>>>>> ....
> >>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Running Solr 6.6.2.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Is there anything I could do to prevent this?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>>> Guilherme
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> *Paras Lehana* [65871]
> >>>>>>>>>>> Development Engineer, Auto-Suggest,
> >>>>>>>>>>> IndiaMART Intermesh Ltd.
> >>>>>>>>>>>
> >>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> >>>>>>>>>>> Noida, UP, IN - 201303
> >>>>>>>>>>>
> >>>>>>>>>>> Mob.: +91-9560911996
> >>>>>>>>>>> Work: 01203916600 | Extn: *8173*
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> IMPORTANT:
> >>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
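Pulling together Walter's and David's advice to drop the stop filter and Erick's concerns about the mismatched index/query tokenizers and the min="2" length filter, a revised text_field could look like the sketch below. It assumes StandardTokenizer behaviour is acceptable for the name field; if the PatternTokenizer query chain exists to handle identifiers containing / . _ :, those rules would need to move into the shared analyzer instead. As Guilherme reported, removing the stop filter on its own widened the results from 121 to roughly 20K, so this is meant to go together with the edismax mm/pf tuning discussed at the top of the thread, and any change here needs a reload and a full reindex.

```xml
<!-- schema.xml: a sketch, not a drop-in replacement -->
<fieldType name="text_field" class="solr.TextField"
           positionIncrementGap="100" omitNorms="false">
  <!-- no type attribute: the same chain runs at index and query time -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <!-- min="1" keeps single-letter tokens such as "a" -->
    <filter class="solr.LengthFilterFactory" min="1" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no StopFilterFactory: stopwords are indexed and can be matched -->
  </analyzer>
</fieldType>
```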