I use 3-word shingles with stopwords for my MLT ML trainer. That worked pretty well for that kind of solution, but for a full index the size became prohibitive.
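Here is a minimal Python sketch of 3-word shingling with stopwords kept. It assumes lowercasing plus a whitespace split for tokenization, and it is not Solr's ShingleFilterFactory. A text of N words yields N - 2 extra three-word terms, and across a corpus the distinct trigrams far outnumber the distinct words. That is roughly why a full index grows so much:

```python
from __future__ import annotations


def word_shingles(text: str, n: int = 3) -> list[str]:
    """Return the overlapping n-word shingles of `text`, stopwords included.

    Assumption for illustration: tokens come from lowercasing and splitting
    on whitespace; a real Solr analysis chain tokenizes differently.
    """
    tokens = text.lower().split()
    # Slide an n-token window across the text; texts shorter than n yield nothing.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    for shingle in word_shingles("Lymphoid and a non-Lymphoid cell"):
        print(shingle)
```

The stopword "a" stays attached to its neighbours (as in "a non-lymphoid cell"), so the phrase context survives indexing. The cost is the extra terms stored for every document.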
On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org> wrote:

If we had IDF for phrases, they would be super effective. The 2X weight is a hack that mostly works.

Infoseek had phrase IDF and it was a killer algorithm for relevance.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:

The pf and qf fields are REALLY nice for this.

On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org> wrote:

I always enable phrase searching in edismax for exactly this reason.

Something like:

<str name="qf">title^8 keywords^4 text</str>
<str name="pf">title^16 keywords^8 text^2</str>

To deal with concepts in queries, a classifier and/or named entity extractor can be helpful. Suppose you have a list of concepts (a "controlled vocabulary") that includes "Lamin A", and that term shows up in a query. Then the term can be queried against the field matching that vocabulary.

This is how LinkedIn separates people, companies, and places, for example.

wunder

On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Look at the “mm” parameter, and try setting it to 100%. That's not entirely likely to do what you want either, since virtually every doc will have “a” in it. But at least you’d get docs that have both terms.

You may also be able to search for things like “Lamin A” _only as a phrase_ and have some luck. But this is a gnarly problem in general. Some people have been able to substitute synonyms and/or shingles to make this work at the expense of a larger index.

This is a generic problem with context.
“Lamin A” is really a “concept”, not just two words that happen to be near each other. Searching as a phrase is an OOB-but-naive way to make it more likely that the ranked results refer to the _concept_ of “Lamin A”. The assumption here is “if these two words appear next to each other, they’re more likely to be what I want”. I say “naive” because “Lamins: A new approach to...” would _also_ be found by a naive phrase search. (I have no idea whether such a title makes sense or not, but you figured that out already)...

To do this well you’d have to dive into NLP/machine learning.

I truly wish we could have the DWIM search algorithm (Do What I Mean)….

On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Walter and Paras,

I reindexed after removing all the references to StopWordFilter, and I went from 121 results to nearly 20K. The search term q="Lymphoid and a non-Lymphoid cell" is now matching entities such as "IFT A" or "Lamin A". So I don't think removing it completely is the way to go in our scenario, but I appreciate the suggestion…

Yes, the response is using fl=*.
I am trying some combinations at the moment, but no success yet.

defType=edismax
q.alt=Lymphoid and a non-Lymphoid cell
Number of results=1599
That is quite a considerable increase, though the results are reasonably meaningful.

I am sorry, but I didn't understand what exactly you want me to do with the lst (??) and qf and bf.

Thanks, everyone, for your inputs.

On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Guilherme,

> By accident, I ended up querying using the default handler (/select) and it worked.

You've just found the culprit.
Thanks for providing the material I requested. Your analysis chain is working as expected, and I don't see any issue in either StopWordFilter or your boosts. I also use a boost of 50 when boosting contextual suggestions (boosting "gold iphone" on a page of iphone). But I take Walter's suggestion and will try to optimize my weights. I agree that we never researched this 50 much either (we never faced performance or relevance issues).

See the major difference between the two handlers: edismax. I'm pretty sure that your problem lies in the parsing of queries. You can confirm that from the parsedquery key in the debug section of both JSON responses. I hope you have provided the response with fl=*.

Replace q with q.alt in your /search handler query, and I think you should start getting responses. That's because q.alt uses the standard parser. If you want to keep using edismax, I suggest testing the responses while removing some combination of the lst (qf, bf), to find what's preventing the documents from coming up.

I'm out of office today. Otherwise I would certainly have analyzed the field values of the document in the /select request and compared them with qf/bq in the solrconfig.xml /search handler. Do this for me and you'd certainly find something.

On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:

I normally use a weight of 8 for the most important field, like title. Other fields might get a 4 or 2.

I add a “pf” field with the weights doubled, so that phrase matches have a higher weight.

The weight of 8 comes from experience at Infoseek and Inktomi, two early web search engines. With different relevance algorithms and totally different evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML titles.
With the two radically different systems getting the same number, I decided that was a property of the documents, not of the search engines.

wunder

On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Wunder,

My indexer takes quite a few hours to run. I am shortening it to run faster, but I also need to make sure it gives what we are expecting. This implementation has been in place for more than 4 years and is heavily used.

> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don’t think I’ve ever used a weight higher than 16 in a dozen years of configuring Solr.

I inherited that implementation and I am really keen to adapt it. What would you recommend?

Cheers
Guilherme

On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:

Thanks for posting the files. Looking at schema.xml, I see that you are still using StopFilterFactory. The first advice we gave you was to remove that.

Remove StopFilterFactory everywhere and reindex.

You will continue to have problems matching stopwords until you do that.

In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don’t think I’ve ever used a weight higher than 16 in a dozen years of configuring Solr.
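Walter's rule of thumb is mechanical enough to script: qf weights of about 8, 4 and 2, plus a pf with the same fields at double weight. Below is a minimal Python sketch. It assumes the space-separated field^boost syntax used in the qf/pf examples in this thread, and the helper names are hypothetical. The ceiling of 16 only mirrors Walter's remark; it is not a Solr limit.

```python
from __future__ import annotations


def parse_fields(spec: str) -> list[tuple[str, float]]:
    """Parse an edismax field list such as 'title^8 keywords^4 text'.

    A field without an explicit ^boost is treated as boost 1.
    """
    fields = []
    for item in spec.split():
        name, _, boost = item.partition("^")
        fields.append((name, float(boost) if boost else 1.0))
    return fields


def format_fields(fields: list[tuple[str, float]]) -> str:
    """Write fields back as 'name^boost', dropping a trailing '.0'."""
    parts = []
    for name, boost in fields:
        shown = int(boost) if boost.is_integer() else boost
        parts.append(f"{name}^{shown}")
    return " ".join(parts)


def phrase_fields_from_qf(qf: str, factor: float = 2.0) -> str:
    """Derive a pf value by multiplying every qf boost by `factor`."""
    return format_fields([(name, boost * factor) for name, boost in parse_fields(qf)])


def boosts_above(spec: str, limit: float = 16.0) -> list[str]:
    """List the fields whose boost exceeds `limit`."""
    return [name for name, boost in parse_fields(spec) if boost > limit]


if __name__ == "__main__":
    print(phrase_fields_from_qf("title^8 keywords^4 text"))
```

For example, phrase_fields_from_qf("title^8 keywords^4 text") returns "title^16 keywords^8 text^2", which is the pf value Walter posted. boosts_above flags boosts such as the 20, 50 and 100 discussed just above.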
wunder

On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi Paras, everyone,

Thank you again for your inputs and suggestions. I'm sorry to hear you had trouble with the attachments; I will host them somewhere and share the links.
I don't tweak my index. I get the data from the graph database, create the documents as they are, and save them to Solr.

So, I am sending the new analysis screen, queried the way you suggested, along with the results with params and the Solr query URL.

While running the queries you asked for, I found something really weird (at least to me). By accident, I ended up querying using the default handler (/select) and it worked. If I use the handler I must use, it sadly doesn't work. I am posting both results, and I will also post the handlers.

Here is the link with all the files mentioned before:
https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0

Thanks

On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Guilherme.
> I am sending the analysis result and the JSON result as requested.

Thanks for the effort. Luckily, I can see your attachments (low quality though).

From the analysis screen, the analysis is working as expected. Take query="lymphoid and *a* non-lymphoid cell" not matching the document containing "Lymphoid and a non-Lymphoid cell". One reason I can initially think of is that the stopword "a" is probably still present after analysis on either the query side or the index side. Did you tweak your index-time analysis after indexing?

Do two things:

1. Post the analysis screen for index=*"Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and providing the link here.
2. Give the same JSON output as you have sent, but this time with *"echoParams=all"*. Also, post the exact Solr query URL.

On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:

I don’t see the attachments; maybe I deleted old e-mails or some such. The Apache server is fairly aggressive about stripping attachments, though, so it’s also possible they didn’t make it through.

On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Thanks Erick.

> First, your index and analysis chains are considerably different; this can easily be a source of problems.
> In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious. Your problem statement is about adding a single-letter term, and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.

I will investigate the min length and post the results later.

> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?

This is the URL in my application, not Solr params. That's the query string.

> What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored unless it’s just a typo.

This is part of the application. Species will be used later on in Solr to filter the results. That's not Solr; those are my app's params.

> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.

The two JSON files I've sent were generated with debugQuery=on, and the explain tag is present.
I will try searching the way you mentioned.
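Two pieces of advice in this thread can be combined in a small script: Erick's “&debug=query” suggestion and Paras's suggestion to compare the /select and /search handlers. Below is a minimal Python sketch using only the standard library. The core URL is a hypothetical placeholder. The sketch assumes the handlers return JSON when asked with wt=json. It also assumes the debug section carries the parsedquery / parsedquery_toString keys, as Solr's debug output normally does.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: point this at the real Solr core.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def debug_url(handler: str, query: str) -> str:
    """Build a request URL asking Solr to report how it parsed `query`."""
    params = {"q": query, "debug": "query", "wt": "json", "rows": "0"}
    return f"{SOLR_CORE_URL}/{handler.strip('/')}?{urllib.parse.urlencode(params)}"


def parsed_query(response: dict) -> str:
    """Pull the parser's view of the query out of a decoded JSON response."""
    debug = response.get("debug", {})
    # parsedquery_toString is usually the most readable form; fall back to parsedquery.
    return debug.get("parsedquery_toString") or debug.get("parsedquery", "")


def fetch_json(url: str) -> dict:
    """Send the request and decode the JSON body (needs a running Solr)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

With a running Solr, call parsed_query(fetch_json(debug_url(handler, "lymphoid and a non-lymphoid cell"))) once for "/select" and once for "/search". That shows the two parsed queries side by side, which should reveal whether the /search defaults (qf, bf and the rest) are what keep the document out.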
Thanks for your inputs

Guilherme

On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:

First, your index and analysis chains are considerably different; this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious. Your problem statement is about adding a single-letter term, and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.

Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?

> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored unless it’s just a typo.

Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query.
Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.

90%+ of the time, the question “why didn’t this query do what I expect” is answered by looking at the “&debug=query” output and the analysis page in the admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and index output.

Also, and this is very important (and confusing): the analysis page _assumes_ that what you put in the text boxes has made it through the query parser intact and is analyzed by the field selected. Consider the search "q=field:word1 word2". Now you type “word1 word2” into the analysis text box and it looks like what you expect. That’s misleading, because the query is _parsed_ as "field:word1 default_search_field:word2". This is where “&debug=query” helps.

Best,
Erick

On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:

Hi Walter,

> The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.

I think the OP's concern is getting different results when adding a stopword. I think he's using the filter factory correctly: the query chain includes the filter as well, so it should remove "a" while querying.
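Erick's q=field:word1 word2 example can be made concrete with a toy model. This Python sketch is not Solr's query parser. It captures only the one rule he describes: a field prefix binds to the single term it is attached to, while bare terms go to the default search field. Quotes, parentheses and boolean operators are deliberately ignored.

```python
from __future__ import annotations


def assign_fields(q: str, default_field: str) -> list[str]:
    """Toy model: give each whitespace-separated term its field."""
    clauses = []
    for term in q.split():
        field, sep, value = term.partition(":")
        if sep and field and value:
            # An explicit prefix applies only to this one term.
            clauses.append(f"{field}:{value}")
        else:
            # Bare terms fall back to the default search field.
            clauses.append(f"{default_field}:{term}")
    return clauses


if __name__ == "__main__":
    print(assign_fields("field:word1 word2", "default_search_field"))
```

Typing "word1 word2" into the analysis page for `field` hides the fact that word2 is actually searched in the default field. The debug output makes that visible.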
*@Guilherme*, please post the results for the query and for the document you are concerned about. Also post the full output of the analysis screen (for both query and index).

On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:

No.

The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.

1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml.
2. Reload the collection, restart Solr, or whatever to read the new config.
3. Reindex all of the documents.

When indexed with the new analysis chain, the stopwords will not be removed and they will be searchable.

wunder

On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Ok. I am kind of lost now.
If I open up the console > analysis and perform it, that's the final result.
<Screenshot 2019-11-05 at 14.54.16.png>

Your suggestion is: get rid of the <filter stopword.txt> in the schema.xml. Then, during the index phase, replaceAll("in stopwords.txt"," ") and add to Solr. Is that correct?
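On the replaceAll idea: a plain substring replacement would also cut letters out of ordinary words. Any stopword removal done before indexing would therefore need whole-word matching. Below is a minimal Python sketch, assuming a small hypothetical stopword set taken from the visible part of stopwords.txt in this thread. Either variant still deletes the stopwords before they reach Solr, exactly as StopFilterFactory does, so it would not make "a" searchable. That is why the replies recommend removing the filter and reindexing instead.

```python
import re

# Hypothetical excerpt of stopwords.txt; the full list in the thread is elided.
STOPWORDS = {"a", "an", "and", "are"}


def naive_strip(text: str) -> str:
    """Substring replacement: also removes letters inside ordinary words."""
    for word in sorted(STOPWORDS):  # fixed order so the result is deterministic
        text = text.replace(word, " ")
    return " ".join(text.split())


def strip_stopwords(text: str) -> str:
    """Remove stopwords only where they occur as whole words, ignoring case."""
    pattern = r"\b(?:" + "|".join(map(re.escape, sorted(STOPWORDS))) + r")\b"
    return " ".join(re.sub(pattern, " ", text, flags=re.IGNORECASE).split())


if __name__ == "__main__":
    sample = "Lamin A and non-Lymphoid cell"
    print(naive_strip(sample))      # mangles "Lamin" and "and"
    print(strip_stopwords(sample))  # drops only the whole words "A" and "and"
```

Under these assumptions, "Lamin A" becomes just "Lamin", which is exactly the information loss the thread is trying to avoid.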
Thanks David

On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:

No,
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

is still using stopwords and should be removed. That's my opinion, of course, and your use case may be different, but I generally axe any reference to them at all.

On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Thanks.
Haven't I done this here?
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>

On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:

The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again.

On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

Hi,

I am performing a search to match a name (text_field). However, this term contains 'and' and 'a', and it doesn't return any records. If I remove 'a', then it works.
e.g.
Search term: lymphoid and a non-lymphoid cell
doesn't work:
https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

Search term: lymphoid and non-lymphoid cell
works:
https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

I am interested in the first result.

schema.xml

<field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>

<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

stopwords.txt

# Standard English stop words taken from Lucene's StopAnalyzer
a
b
c
....
an
and
are

Running Solr 6.6.2.

Is there anything I could do to prevent this?
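To compare the two analyzers above outside Solr, here is a rough Python approximation of both chains. It is a sketch under stated assumptions:

- StandardTokenizer plus ClassicFilter is approximated by splitting on anything that is not a letter or digit.
- The PatternTokenizer pattern is treated as the set of delimiter characters.
- The stopword set is only the visible excerpt of stopwords.txt.

Real Lucene analysis differs in many edge cases, and token positions are not modeled.

```python
from __future__ import annotations

import re

# Visible excerpt of stopwords.txt from this thread (the full file is elided).
STOPWORDS = {"a", "an", "and", "are"}


def _length_lower_stop(tokens: list[str], use_stop_filter: bool) -> list[str]:
    """Shared tail of both chains: LengthFilter(2..20), LowerCase, StopFilter."""
    out = []
    for tok in tokens:
        if not 2 <= len(tok) <= 20:  # LengthFilterFactory min="2" max="20"
            continue
        tok = tok.lower()  # LowerCaseFilterFactory
        if use_stop_filter and tok in STOPWORDS:  # StopFilterFactory ignoreCase="true"
            continue
        out.append(tok)
    return out


def index_chain(text: str, use_stop_filter: bool = True) -> list[str]:
    # Rough stand-in for StandardTokenizer + ClassicFilter:
    # split on anything that is not an ASCII letter or digit.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    return _length_lower_stop(tokens, use_stop_filter)


def query_chain(text: str, use_stop_filter: bool = True) -> list[str]:
    # PatternTokenizerFactory: characters matching the pattern act as delimiters.
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    cleaned = []
    for tok in tokens:
        tok = re.sub(r"^[/._:]+", "", tok)  # strip leading / . _ :
        tok = re.sub(r"[/._:]+$", "", tok)  # strip trailing / . _ :
        tok = re.sub(r"[_]", " ", tok)  # underscores become spaces inside the token
        cleaned.append(tok)
    return _length_lower_stop(cleaned, use_stop_filter)


if __name__ == "__main__":
    query = "lymphoid and a non-lymphoid cell"
    print("index:", index_chain(query))
    print("query:", query_chain(query))
    print("query, no stop filter:", query_chain(query, use_stop_filter=False))
```

Under these assumptions, both chains reduce "lymphoid and a non-lymphoid cell" to lymphoid, non, lymphoid, cell. The single letter "a" is already dropped by LengthFilterFactory min="2" before StopFilterFactory runs. So removing the stop filter alone would bring back "and" but not "a", which is consistent with Erick's remark about the length filter.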
Thanks
Guilherme

--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn: *8173*

--
IMPORTANT:
NEVER share your IndiaMART OTP/ Password with anyone.