@Hossman -- thanks again. I've made the following change, and so far things look good. I couldn't find anything in the debug output or the results for what I put in for $func, so I just removed it, but the modifications you suggested appear to be working.
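For anyone following along, the {!boost} wrapping Hoss suggested further down the thread can be sketched as plain request parameters. This is only an illustration under stated assumptions: the helper name build_boost_params is made up, the /foo handler and synonym_edismax parser are the ones from this thread, and reusing the old category_weight "bf" function as the multiplicative boost in "func" is a guess that would need testing against real scores.

```python
from urllib.parse import urlencode, parse_qs


def build_boost_params(user_query, boost_func, def_type="synonym_edismax"):
    """Hypothetical helper: wrap the main query in {!boost} as Hoss described."""
    return {
        # The wrapper multiplies the wrapped query's score by $func, so only
        # docs that match the wrapped query can be returned at all.
        "q": "{!boost b=$func defType=" + def_type + " v=$qq}",
        # The user's text travels in its own parameter and is pulled in via v=$qq.
        "qq": user_query,
        # Assumption: the former additive "bf" function reused as the boost.
        "func": boost_func,
        # debug=query makes the parsed query visible in the response.
        "debug": "query",
    }


params = build_boost_params(
    '"μ-heavy chain disease"',
    "product(field(category_weight),20)",
)
query_string = urlencode(params)   # percent-encodes as UTF-8 by default
decoded = parse_qs(query_string)   # round-trip to confirm nothing was mangled
print("/foo?" + query_string)
```

Sending the user's text as qq rather than splicing it into q keeps quotes and other special characters out of the local-params string.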
Including the actual line from my endpoint XML in case this thread helps someone else...

<str name="q">{!boost defType=synonym_edismax qf='title' synonyms='true' synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq='' v=$q}</str>

On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:

> Thanks!  I'll check it out.
>
> On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
>
>> Not exactly sure what you are looking for from chaining the results, but
>> similar functionality is available in Streaming Expressions, where the
>> results of inner expressions are passed to outer expressions and so on:
>> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>>
>> HTH
>> Susheel
>>
>> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
>>
>> > Hossman - many thanks again for your comprehensive and very helpful answer!
>> >
>> > All,
>> >
>> > I am (possibly mis-remembering) reading something about being able to pass
>> > the results of one query to another query...  Essentially "chaining" result
>> > sets.
>> >
>> > I have looked in docs and can't find anything on a quick search -- I may
>> > have been reading about the Re-Ranking feature, which doesn't help me (I
>> > know because I just tried and it seems to return all results anyway, just
>> > re-ranking the number specified in the reRankDocs flag...)
>> >
>> > Is there a way to (cleanly) send the results of one query to another query
>> > for further processing?  Essentially, pass ONLY the results (including an
>> > empty set of results) to another query for processing?
>> >
>> > thanks...
>> >
>> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
>> >
>> > > Thanks!
>> > >
>> > > To answer your questions, while I digest the rest of that information...
>> > >
>> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
>> > > https://github.com/healthonnet/hon-lucene-synonyms
>> > >
>> > > The config looks like this - and IIRC, is simply a copy of the
>> > > recommended config on the site mentioned above.
>> > >
>> > > <queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>> > >   <!-- You can define more than one synonym analyzer in the following list.
>> > >        For example, you might have one set of synonyms for English, one for French,
>> > >        one for Spanish, etc.
>> > >   -->
>> > >   <lst name="synonymAnalyzers">
>> > >     <!-- Name your analyzer something useful, e.g. "analyzer_en", "analyzer_fr", "analyzer_es", etc.
>> > >          If you only have one, the name doesn't matter (hence "myCoolAnalyzer").
>> > >     -->
>> > >     <lst name="myCoolAnalyzer">
>> > >       <!-- We recommend a PatternTokenizerFactory that tokenizes based on whitespace and quotes.
>> > >            This seems to work best with most people's synonym files.
>> > >            For details, read the discussion here:
>> > >            http://github.com/healthonnet/hon-lucene-synonyms/issues/26
>> > >       -->
>> > >       <lst name="tokenizer">
>> > >         <str name="class">solr.PatternTokenizerFactory</str>
>> > >         <str name="pattern"><![CDATA[(?:\s|\")+]]></str>
>> > >       </lst>
>> > >       <!-- The ShingleFilterFactory outputs synonyms of multiple token lengths (e.g. unigrams, bigrams, trigrams, etc.).
>> > >            The default here is to assume you don't have any synonyms longer than 4 tokens.
>> > >            You can tweak this depending on what your synonyms look like.
>> > >            E.g. if you only have unigrams, you can remove it entirely, and if your synonyms
>> > >            are up to 7 tokens in length, you should set the maxShingleSize to 7.
>> > >       -->
>> > >       <lst name="filter">
>> > >         <str name="class">solr.ShingleFilterFactory</str>
>> > >         <str name="outputUnigramsIfNoShingles">true</str>
>> > >         <str name="outputUnigrams">true</str>
>> > >         <str name="minShingleSize">2</str>
>> > >         <str name="maxShingleSize">4</str>
>> > >       </lst>
>> > >       <!-- This is where you set your synonym file. For the unit tests and
>> > >            "Getting Started" examples, we use example_synonym_file.txt.
>> > >            This plugin will work best if you keep expand set to true and
>> > >            have all your synonyms comma-separated (rather than =>-separated).
>> > >       -->
>> > >       <lst name="filter">
>> > >         <str name="class">solr.SynonymFilterFactory</str>
>> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
>> > >         <str name="synonyms">example_synonym_file.txt</str>
>> > >         <str name="expand">true</str>
>> > >         <str name="ignoreCase">true</str>
>> > >       </lst>
>> > >     </lst>
>> > >   </lst>
>> > > </queryParser>
>> > >
>> > >
>> > >
>> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> > >
>> > >>
>> > >> : First let me say that this is very possibly the "x - y problem" so let me
>> > >> : state up front what my ultimate need is -- then I'll ask about the thing I
>> > >> : imagine might help...  which, of course, is heavily biased in the direction
>> > >> : of my experience coding Java and writing SQL...
>> > >>
>> > >> Thank you so much for asking your question this way!
>> > >>
>> > >> Right off the bat, the background you've provided seems suspicious...
>> > >>
>> > >> : I have a piece of a query that calculates a score based on a "weighting"
>> > >> ...
>> > >> : The specific line is this:
>> > >> : <str name="bf">product(field(category_weight),20)</str>
>> > >> :
>> > >> : What I just realized is that when I query Solr for a string that has NO
>> > >> : matches in the entire corpus, I still get a slew of results because EVERY
>> > >> : doc has the weighting value in the category_weight field - and therefore
>> > >> : every doc gets some score.
>> > >>
>> > >> ...that is *NOT* how dismax and edismax normally work.
>> > >>
>> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
>> > >> implementation of that "additive boost" comes from adding new optional
>> > >> clauses to the top level BooleanQuery that is executed, that only happens
>> > >> after the "main" query (from your "q" param) is added to that top level
>> > >> BooleanQuery as a "mandatory" clause.
>> > >>
>> > >> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
>> > >> but with the techproducts configs/data these requests still don't match
>> > >> anything...
>> > >>
>> > >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
>> > >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
>> > >>
>> > >> ...and if you look at the debug output, the parsed queries show that the
>> > >> "bogus" part of the query is mandatory...
>> > >>
>> > >> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) FunctionQuery(const(true))
>> > >>
>> > >> (i didn't use "pf" in that example, but the effect is the same, the "pf"
>> > >> based clauses are optional, while the "qf" based clauses are mandatory)
>> > >>
>> > >> If you compare that example to your debug output, you'll notice a
>> > >> difference in structure -- it's a bit hard to see in your example, but if
>> > >> you simplify your qf, pf, and q fields it should be more obvious, but
>> > >> AFAICT the "main" parts of your query are getting wrapped in an extra
>> > >> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
>> > >> the top level query ... i don't see *any* mandatory clauses in your top
>> > >> level BooleanQuery, which is why any match on a bf or bq function is
>> > >> enough to cause a document to match.
>> > >>
>> > >> I suspect the reason your parsed query structure is so diff has to do with
>> > >> this...
>> > >>
>> > >> : <str name="defType">synonym_edismax</str>>
>> > >>
>> > >>
>> > >> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
>> > >> 2) what QParserPlugin are you using to implement that?
>> > >>
>> > >> I suspect whatever QParserPlugin you are using has a bug in it :)
>> > >>
>> > >>
>> > >> If you can't fix the bug, one possible workaround would be to abandon bf
>> > >> and bq params completely, and instead wrap the query it produces in a
>> > >> {!boost} parser with whatever function you want (using functions like
>> > >> sum() or product() to combine multiple functions, and query() to incorporate
>> > >> your current bq param).
>> > >> Doing this will require changing how you specify
>> > >> your input (example below) and it will result in *multiplicative* boosts --
>> > >> so your scores will be much diff, and you will likely have to adjust your
>> > >> constants, but: 1) multiplicative boosts are almost always what people
>> > >> *really* want anyway; 2) it will ensure the boosts are only applied for
>> > >> things matching your main query, no matter how that query parser works or
>> > >> what bugs it has.
>> > >>
>> > >> Example of using {!boost} to wrap an arbitrary other parser...
>> > >>
>> > >> instead of...
>> > >>   defType=foofoo
>> > >>   q=barbarbar
>> > >>
>> > >> use...
>> > >>   q={!boost b=$func defType=foofoo v=$qq}
>> > >>   qq=barbarbar
>> > >>   func=sum(something,somethingelse)
>> > >>
>> > >> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
>> > >> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>> > >>
>> > >>
>> > >> :
>> > >> : What I would like is to return zero results if there is no match for the
>> > >> : querystring.  My collection is small enough that I don't care if the actual
>> > >> : calculation runs on each doc (although that's wasteful) -- I just don't
>> > >> : want to see results come back for zero matches to the querystring
>> > >> :
>> > >> : (The /select endpoint does this of course, but my custom endpoint includes
>> > >> : this "weighting" piece and therefore returns every doc in the corpus
>> > >> : because they all have the weighting.)
>> > >> :
>> > >> : ====================
>> > >> : Enter my imagined solution...  The potential X-Y problem...
>> > >> : ====================
>> > >> :
>> > >> : So - given that I come from a programming background, I immediately start
>> > >> : thinking of an if statement ...
>> > >> :
>> > >> : if(some_score_for_the_primary_search_string) {
>> > >> :   run_the_category_weight_calculation;
>> > >> : } else {
>> > >> :   do_NOT_run_category_weight_calc;
>> > >> : }
>> > >> :
>> > >> :
>> > >> : Another way of thinking of it would be something like the "WHERE" clause in
>> > >> : SQL...
>> > >> :
>> > >> : run_category_weight_calculation WHERE "searchstring" is found in the
>> > >> : document, not otherwise.
>> > >> :
>> > >> : I'm aware that things could be handled in the client-side of my web app,
>> > >> : but if possible, I'd like the interface to SOLR to be as clean as possible,
>> > >> : and massage incoming SOLR data as little as possible.
>> > >> :
>> > >> : In other words, do NOT return any docs if the querystring (and any
>> > >> : synonyms) match zero docs.
>> > >> :
>> > >> : Here is the endpoint XML for the query.  I've highlighted the specific line
>> > >> : that is causing the unintended results...
>> > >> :
>> > >> :
>> > >> : <requestHandler name="/foo" class="solr.SearchHandler">
>> > >> :   <!-- default values for query parameters can be specified, these
>> > >> :        will be overridden by parameters in the request
>> > >> :   -->
>> > >> :   <lst name="defaults">
>> > >> :     <str name="echoParams">all</str>
>> > >> :     <int name="rows">20</int>
>> > >> :     <!-- Query settings -->
>> > >> :     <str name="df">text</str>
>> > >> :     <!-- <str name="df">title</str> -->
>> > >> :     <str name="defType">synonym_edismax</str>>
>> > >> :     <str name="synonyms">true</str>
>> > >> :     <!-- The line below balances out the weighting of exact matches to the
>> > >> :          synonym phrase entered by the user
>> > >> :          with the category_weight calculation and the titleQuery calc.
>> > >> :          These numbers exist in a balance and
>> > >> :          if one is raised or lowered, the others (probably) need to change
>> > >> :          as well.  It may be better to go with decimals
>> > >> :          for all of them...
>> > >> :          .4 instead of 4 and 2 instead of 20 and 2.5 instead of 25.
>> > >> :          In the end, I'm not sure it really matters, but don't change one
>> > >> :          without changing the others
>> > >> :          unless you've tested and are sure you want the results -->
>> > >> :     <float name="synonyms.originalBoost">1.5</float>
>> > >> :     <float name="synonyms.synonymBoost">1.1</float>
>> > >> :     <str name="mm">75%</str>
>> > >> :     <str name="q.alt">*:*</str>
>> > >> :     <str name="rows">20</str>
>> > >> :     <str name="fq">meta_doc_type:chapterDoc</str>
>> > >> :     <str name="bq">{!synonym_edismax qf='title' synonyms='true' synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq='' v=$q}</str>
>> > >> :     <str name="fl">id category_weight title category_ss score contentType</str>
>> > >> :     <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
>> > >> : =====================================================
>> > >> :     *<str name="bf">product(field(category_weight),20)</str>*
>> > >> : =====================================================
>> > >> :     <str name="bf">product(query($titleQuery),4)</str>
>> > >> :     <str name="qf">text contentType^1000</str>
>> > >> :     <str name="wt">python</str>
>> > >> :     <str name="debug">true</str>
>> > >> :     <str name="debug.explain.structured">true</str>
>> > >> :     <str name="indent">true</str>
>> > >> :     <str name="echoParams">all</str>
>> > >> :   </lst>
>> > >> : </requestHandler>
>> > >> :
>> > >> : And here is the debug output for a query.  (This was a test for synonyms,
>> > >> : which you'll see in the output.)  The original query string was, of
>> > >> : course, "μ-heavy chain disease"
>> > >> :
>> > >> : You'll note that although there is no score in the first doc explain for
>> > >> : the actual querystring, the highlighted section does get a score for
>> > >> : product(double(category_weight)=1.5,const(20))
>> > >> :
>> > >> : ...
>> > >> : which is the thing that is currently causing all the docs in the
>> > >> : collection to "match" even though the querystring is not in any of them.
>> > >> :
>> > >> : "debug":{
>> > >> :   "rawquerystring":"\"μ-heavy chain disease\"",
>> > >> :   "querystring":"\"μ-heavy chain disease\"",
>> > >> :   "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
>> > >> :     ((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain disease\")^1000.0)))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1))
>> > >> :     ((DisjunctionMaxQuery((title:\"μ heavy chain disease\"))^2.5
>> > >> :     ((+DisjunctionMaxQuery((title:\"mu heavy chain disease\")))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((title:\"μ heavy chain disease\")))/no_coord^1.1)
>> > >> :     ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)))
>> > >> :     FunctionQuery(product(double(category_weight),const(20)))
>> > >> :     FunctionQuery(product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4)))",
>> > >> :   "parsedquery_toString":"(((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
>> > >> :     ((+(text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain disease\")^1000.0))^1.1)
>> > >> :     ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1)
>> > >> :     ((+(text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.1)
>> > >> :     ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1))
>> > >> :     ((((title:\"μ heavy chain disease\"))^2.5
>> > >> :     ((+(title:\"mu heavy chain disease\"))^1.1)
>> > >> :     ((+(title:\"μ hcd\"))^1.1)
>> > >> :     ((+(title:\"μ heavy chain disease\"))^1.1)
>> > >> :     ((+(title:\"μ hcd\"))^1.1)))
>> > >> :     product(double(category_weight),const(20))
>> > >> :     product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4))",
>> > >> :   "explain":{
>> > >> :     "33d808fe-6ccf-4305-a643-48e94de34d18":{
>> > >> :       "match":true, "value":30.0, "description":"sum of:",
>> > >> :       "details":[{ "match":true, "value":30.0,
>> > >> :         "description":"FunctionQuery(product(double(category_weight),const(20))), product of:",
>> > >> : =====================================================
>> > >> :         *"details":**[{ "match":true, "value":30.0,
>> > >> :           "description":"product(double(category_weight)=1.5,const(20))"}, {*
>> > >> : =====================================================
>> > >> :
>> > >> :           "match":true, "value":1.0, "description":"boost"}, { "match":true,
>> > >> :           "value":1.0, "description":"queryNorm"}]}, {
>> > >> :
>> > >>
>> > >> -Hoss
>> > >> http://www.lucidworks.com/