Thanks Doug. I might have to take you up on the hangout offer. Let me refine the requirement further, and if I still see the need, I will let you know.
Steve

On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> How you have tie is fine. Setting tie to 1 might give you reasonable
> results. You could easily still have scores that are just always an order
> of magnitude or two higher, but try it out!
>
> BTW, anything you put in the URL can also be put into a request handler.
>
> If you ever just want to have a 15 minute conversation via hangout, happy
> to chat with you :) Might be fun to think through your prob together.
>
> -Doug
>
> On Tue, May 26, 2015 at 1:42 PM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi Doug,
> >
> > I'm back to this topic. Unfortunately, due to my DB structure and
> > business need, I will not be able to search against a single field
> > (i.e., using copyField). Thus, I have to use a list of fields via "qf".
> > Given this, I see you said above to use "tie=1.0". Will that, more or
> > less, address this scoring issue? Should "tie=1.0" be set on the
> > request handler like so:
> >
> >   <requestHandler name="/select" class="solr.SearchHandler">
> >     <lst name="defaults">
> >       <str name="echoParams">explicit</str>
> >       <int name="rows">20</int>
> >       <str name="defType">edismax</str>
> >       <str name="qf">F1 F2 F3 F4 ... ... ...</str>
> >       <float name="tie">1.0</float>
> >       <str name="fl">_UNIQUE_FIELD_,score</str>
> >       <str name="wt">xml</str>
> >       <str name="indent">true</str>
> >     </lst>
> >   </requestHandler>
> >
> > Or must "tie" be passed as part of the URL?
> >
> > Thanks
> >
> > Steve
> >
> > On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> >
> > > Yeah, a copyField into one could be a good space/time tradeoff. It
> > > can be more manageable to use an all field for both relevancy and
> > > performance, if you can handle the duplication of data.
> > >
> > > You could set tie=1.0, which effectively sums all the matches
> > > instead of picking the best match.
> > > You'll still have cases where one field's score might just happen
> > > to be far off of another, and thus dominate the summation. But it's
> > > something easy to try if you want to keep playing with dismax.
> > >
> > > -Doug
> > >
> > > On Wed, May 20, 2015 at 2:56 PM, Steven White <swhite4...@gmail.com> wrote:
> > >
> > > > Hi Doug,
> > > >
> > > > Your blog write-up on relevancy is very interesting; I didn't
> > > > know this. Looks like I have to go back to my drawing board and
> > > > figure out an alternative solution: somehow get those
> > > > group-based-fields data into a single field using copyField.
> > > >
> > > > Thanks
> > > >
> > > > Steve
> > > >
> > > > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > > Steven,
> > > > >
> > > > > I'd be concerned about your relevance with that many qf fields.
> > > > > Dismax takes a "winner takes all" point of view to search.
> > > > > Field scores can vary by an order of magnitude (or even two)
> > > > > despite the attempts of query normalization. You can read more
> > > > > here:
> > > > >
> > > > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
> > > > >
> > > > > I'm about to win the "blasphemer" merit badge, but ad-hoc
> > > > > all-field-like searching over many fields is actually a good
> > > > > use case for Elasticsearch's cross field queries.
> > > > >
> > > > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
> > > > >
> > > > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
> > > > >
> > > > > It wouldn't be hard (and actually a great feature for the
> > > > > project) to get the Lucene query associated with cross field
> > > > > search into Solr.
> > > > > You could easily write a plugin to integrate it into a query
> > > > > parser:
> > > > >
> > > > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
> > > > >
> > > > > Hope that helps
> > > > > -Doug
> > > > > --
> > > > > *Doug Turnbull* | Search Relevance Consultant | OpenSource
> > > > > Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > > > Author: Relevant Search <http://manning.com/turnbull> from
> > > > > Manning Publications
> > > > > This e-mail and all contents, including attachments, is
> > > > > considered to be Company Confidential unless explicitly stated
> > > > > otherwise, regardless of whether attachments are marked as such.
> > > > >
> > > > > On Wed, May 20, 2015 at 8:27 AM, Steven White <swhite4...@gmail.com> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > My solution requires that users in group-A can only search
> > > > > > against a set of fields-A and users in group-B can only
> > > > > > search against a set of fields-B, etc. There can be several
> > > > > > groups, as many as 100 or even more. To meet this need, I
> > > > > > build my search by passing in the list of fields via "qf".
> > > > > > What goes into "qf" can be large: as many as 1500 fields,
> > > > > > and each field name averages 15 characters long, so in
> > > > > > effect the data passed via "qf" will be over 20K characters.
> > > > > >
> > > > > > Given the above, besides the fact that a search for "apple"
> > > > > > translates to 20K characters passing over the network, what
> > > > > > else within Solr and Lucene should I be worried about, if
> > > > > > anything? Will I hit some kind of a limit? Will each search
> > > > > > now require more CPU cycles? Memory? Etc.
> > > > > >
> > > > > > If the network traffic becomes an issue, my alternative
> > > > > > solution is to create a /select handler for each group and
> > > > > > in that handler list the fields under "qf".
> > > > > >
> > > > > > I have considered creating pseudo-fields for each group and
> > > > > > then using copyField into that group. During search, I can
> > > > > > then "qf" against that one field. Unfortunately, this is not
> > > > > > ideal for my solution because the fields that go into each
> > > > > > group change dynamically (at least once a month), and when
> > > > > > they do change, I have to re-index everything (this I have
> > > > > > to avoid) to sync that group-field.
> > > > > >
> > > > > > I'm using "qf" with edismax and my Solr version is 5.1.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Steve
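
Editor's note on the "tie" parameter discussed above: the effect of tie=0 versus tie=1.0 can be illustrated with a small sketch. It assumes the usual disjunction-max combination (best field score plus tie times the sum of the other field scores). The function name dismax_score and the sample scores are hypothetical and are not taken from Solr or from this thread.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: best score plus tie * (sum of the rest).

    Hypothetical helper for illustration only; not a Solr API.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Made-up per-field scores for one document matching in three qf fields.
scores = [4.0, 0.5, 0.25]

print(dismax_score(scores, tie=0.0))  # winner takes all: 4.0
print(dismax_score(scores, tie=1.0))  # plain sum of all fields: 4.75
```

With tie=1.0 every matching field contributes, but as noted in the thread, a field whose scores run an order of magnitude higher than the rest will still dominate the sum.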