Re: When is too many fields in "qf" too many?

Steven White Tue, 26 May 2015 13:50:17 -0700

Thanks, Doug. I might have to take you up on the hangout offer. Let me refine
the requirement further, and if I still see the need, I will let you know.


Steve

On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> How you have tie set is fine. Setting tie to 1 might give you reasonable
> results. You could still easily have one field whose scores are always an
> order of magnitude or two higher than the others, but try it out!
>
> BTW, anything you put in the URL can also be put into a request handler.
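To illustrate the point about the URL versus the request handler, here is a minimal Python sketch that builds a /select request URL. The host, port, and core name (`localhost:8983`, `mycore`) and the helper `build_select_url` are hypothetical. Parameters sent in the URL override the handler's `defaults`, and anything left out of the URL falls back to them, so `tie=1.0` can live in either place.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL and core name; adjust for your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_select_url(query, tie=None, qf=None):
    """Build a /select URL. Only the parameters given here are sent; anything
    omitted falls back to the handler's <lst name="defaults"> values."""
    params = {"q": query}
    if tie is not None:
        params["tie"] = str(tie)
    if qf is not None:
        params["qf"] = " ".join(qf)  # qf is a space-separated field list
    return SOLR_SELECT + "?" + urlencode(params)


# Relies on the handler defaults for tie and qf.
print(build_select_url("apple"))
# Overrides the handler's default tie for this request only.
print(build_select_url("apple", tie=1.0))
```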
>
> If you ever want a 15-minute conversation via hangout, I'm happy to chat
> with you :) It might be fun to think through your problem together.
>
> -Doug
>
> On Tue, May 26, 2015 at 1:42 PM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi Doug,
> >
> > I'm back to this topic. Unfortunately, due to my DB structure and business
> > needs, I will not be able to search against a single field (i.e., using
> > copyField). Thus, I have to use a list of fields via "qf". Given this, I
> > see you said above to use "tie=1.0". Will that, more or less, address this
> > scoring issue? Should "tie=1.0" be set on the request handler like so:
> >
> >   <requestHandler name="/select" class="solr.SearchHandler">
> >      <lst name="defaults">
> >        <str name="echoParams">explicit</str>
> >        <int name="rows">20</int>
> >        <str name="defType">edismax</str>
> >        <str name="qf">F1 F2 F3 F4 ... ... ...</str>
> >        <float name="tie">1.0</float>
> >        <str name="fl">_UNIQUE_FIELD_,score</str>
> >        <str name="wt">xml</str>
> >        <str name="indent">true</str>
> >      </lst>
> >   </requestHandler>
> >
> > Or must "tie" be passed as part of the URL?
> >
> > Thanks
> >
> > Steve
> >
> >
> > On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> >
> > > Yeah, a copyField into one field could be a good space/time tradeoff. A
> > > single "all" field can be more manageable for both relevancy and
> > > performance, if you can handle the duplication of data.
> > >
> > > You could set tie=1.0, which effectively sums all the matches instead of
> > > picking the best match. You'll still have cases where one field's score
> > > happens to be far larger than another's and thus dominates the sum. But
> > > it's an easy thing to try if you want to keep playing with dismax.
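A minimal Python sketch of how the tie parameter combines per-field scores, assuming Lucene's DisjunctionMaxQuery rule: the best field score plus tie times the sum of the other field scores. The per-field scores are made-up numbers chosen to show the dominance problem described above.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores like Lucene's DisjunctionMaxQuery:
    the best field score plus `tie` times the sum of all the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for one document (F1, F2, F3): F3's score
# happens to be about two orders of magnitude larger than the other two.
scores = [0.12, 0.09, 8.5]

print(dismax_score(scores, tie=0.0))  # only the best field (F3) counts
print(dismax_score(scores, tie=1.0))  # a plain sum, about 8.71
```

With tie=1.0 the result is a plain sum, but F3 still supplies more than 97% of it, which is exactly the caveat about one field dominating the summation.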
> > >
> > > -Doug
> > >
> > > On Wed, May 20, 2015 at 2:56 PM, Steven White <swhite4...@gmail.com> wrote:
> > >
> > > > Hi Doug,
> > > >
> > > > Your blog write-up on relevancy is very interesting; I didn't know this.
> > > > It looks like I have to go back to the drawing board and figure out an
> > > > alternative solution: somehow get the data from those group-based fields
> > > > into a single field using copyField.
> > > >
> > > > Thanks
> > > >
> > > > Steve
> > > >
> > > > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > > Steven,
> > > > >
> > > > > I'd be concerned about your relevance with that many qf fields. Dismax
> > > > > takes a "winner takes all" point of view to search. Field scores can
> > > > > vary by an order of magnitude (or even two) despite the attempts of
> > > > > query normalization. You can read more here:
> > > > >
> > > > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
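One reason field scores diverge: the same term can be common in one field and rare in another, so its inverse document frequency (idf) differs per field. The Python sketch below assumes Lucene's classic TF-IDF idf, 1 + ln(numDocs / (docFreq + 1)), which is what Solr 5.x uses by default; the field names and document counts are hypothetical.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Lucene's classic (TF-IDF) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1_000_000
# Hypothetical document frequencies for the term "apple" in two fields:
# rare in "title", common in "body".
doc_freq = {"title": 9, "body": 200_000}

for field, df in doc_freq.items():
    print(field, round(classic_idf(df, NUM_DOCS), 2))
```

In the classic formula idf enters each field's score squared, so a gap of roughly 12.5 versus 2.6 becomes a factor of about 23. Query normalization scales every clause of the query by the same factor, so it does not close that gap.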
> > > > >
> > > > > I'm about to win the "blasphemer" merit badge, but ad-hoc, all-field-style
> > > > > searching over many fields is actually a good use case for
> > > > > Elasticsearch's cross-field queries:
> > > > >
> > > > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
> > > > >
> > > > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
> > > > >
> > > > > It wouldn't be hard (and it would actually be a great feature for the
> > > > > project) to get the Lucene query behind cross-field search into Solr.
> > > > > You could easily write a plugin to integrate it into a query parser:
> > > > >
> > > > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
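The core idea behind cross-field search is to make a term's statistics comparable across fields, as if they were one big field. The Python sketch below is a deliberately simplified stand-in for that idea, not a port of BlendedTermQuery (whose actual blending logic is more involved): it gives every field the term's largest per-field document frequency, so the term gets the same idf everywhere. The field names and counts are hypothetical, and the idf formula assumes Lucene's classic TF-IDF similarity.

```python
import math


def classic_idf(doc_freq, num_docs):
    # Lucene's classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def blended_idfs(doc_freq_by_field, num_docs):
    """Simplified blending: use the largest per-field document frequency
    for every field, so the term's idf is identical across fields. This
    only approximates the idea behind BlendedTermQuery."""
    blended_df = max(doc_freq_by_field.values())
    return {field: classic_idf(blended_df, num_docs) for field in doc_freq_by_field}


# Hypothetical statistics for the term "apple".
stats = {"title": 9, "body": 200_000}
print(blended_idfs(stats, 1_000_000))
```

After blending, a match in "title" no longer earns a huge idf bonus just because the term happens to be rare in that field.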
> > > > >
> > > > > Hope that helps
> > > > > -Doug
> > > > > --
> > > > > Doug Turnbull | Search Relevance Consultant | OpenSource Connections,
> > > > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > > > Author: Relevant Search <http://manning.com/turnbull> from Manning
> > > > > Publications
> > > > > This e-mail and all contents, including attachments, is considered to
> > > > > be Company Confidential unless explicitly stated otherwise, regardless
> > > > > of whether attachments are marked as such.
> > > > >
> > > > > On Wed, May 20, 2015 at 8:27 AM, Steven White <swhite4...@gmail.com> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > My solution requires that users in group-A can only search
> against
> > a
> > > > set
> > > > > of
> > > > > > fields-A and users in group-B can only search against a set of
> > > > fields-B,
> > > > > > etc.  There can be several groups, as many as 100 even more.  To
> > meet
> > > > > this
> > > > > > need, I build my search by passing in the list of fields via
> "qf".
> > > > What
> > > > > > goes into "qf" can be large: as many as 1500 fields and each
> field
> > > name
> > > > > > averages 15 characters long, in effect the data passed via "qf"
> > will
> > > be
> > > > > > over 20K characters.
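A quick back-of-the-envelope check of the 20K figure, as a Python sketch. The generated field names are hypothetical placeholders, each exactly 15 characters long.

```python
from urllib.parse import urlencode

NUM_FIELDS = 1500
# Hypothetical field names, each exactly 15 characters ("field_" + 9 digits).
field_names = [f"field_{i:09d}" for i in range(NUM_FIELDS)]

qf = " ".join(field_names)       # 1500 names plus 1499 separating spaces
encoded = urlencode({"qf": qf})  # spaces become "+", so the length is unchanged

print(len(qf))       # 23999
print(len(encoded))  # 24002 ("qf=" adds three characters)
```

So 1500 names averaging 15 characters come to roughly 24K characters for qf alone, before any other request parameters.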
> > > > > >
> > > > > > Given the above, besides the fact that a search for "apple" turns
> > > > > > into 20K characters passing over the network, what else within Solr
> > > > > > and Lucene should I be worried about, if anything? Will I hit some
> > > > > > kind of limit? Will each search now require more CPU cycles?
> > > > > > Memory? Etc.
> > > > > >
> > > > > > If the network traffic becomes an issue, my alternative solution is
> > > > > > to create a /select-style handler for each group and list that
> > > > > > group's fields under "qf" in that handler.
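A minimal Python sketch of that alternative: generate one handler definition per group from a group-to-fields mapping. The `/select_<group>` naming scheme, the `groups` mapping, and the helper `group_handler` are hypothetical; the element layout mirrors the `/select` handler shown earlier in the thread.

```python
import xml.etree.ElementTree as ET


def group_handler(group, fields):
    """Build a <requestHandler> whose defaults carry one group's qf.
    The /select_<group> naming scheme is a hypothetical convention."""
    handler = ET.Element("requestHandler")
    handler.set("name", f"/select_{group}")
    handler.set("class", "solr.SearchHandler")
    defaults = ET.SubElement(handler, "lst", name="defaults")
    for param, value in (("defType", "edismax"), ("qf", " ".join(fields))):
        ET.SubElement(defaults, "str", name=param).text = value
    return handler


# Hypothetical group-to-fields mapping.
groups = {"groupA": ["F1", "F2", "F3"], "groupB": ["F2", "F4"]}

for group, fields in groups.items():
    print(ET.tostring(group_handler(group, fields), encoding="unicode"))
```

Because qf is applied at query time, changing a group's field list this way only requires updating solrconfig.xml and reloading the core, not re-indexing.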
> > > > > >
> > > > > > I have considered creating a pseudo-field for each group and then
> > > > > > using copyField to populate it. During search, I could then "qf"
> > > > > > against that one field. Unfortunately, this is not ideal for my
> > > > > > solution because the fields that go into each group change
> > > > > > dynamically (at least once a month), and when they do change, I have
> > > > > > to re-index everything to sync that group field (which I have to
> > > > > > avoid).
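To make the trade-off concrete, here is a Python sketch of the copyField approach: it emits one `<copyField>` directive per source field for a hypothetical group pseudo-field, and compares two monthly versions of a group's membership. Since copyField runs at index time, documents indexed before a membership change keep the old group-field contents until they are re-indexed. The group field name `groupA_all`, the field lists, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_field_directives(group_field, source_fields):
    """One <copyField> element per source field, all feeding the group's
    pseudo-field. The group field name is a hypothetical example."""
    return [ET.Element("copyField", source=src, dest=group_field)
            for src in source_fields]


def membership_change(old_fields, new_fields):
    """Fields added to and removed from a group between two versions."""
    added = sorted(set(new_fields) - set(old_fields))
    removed = sorted(set(old_fields) - set(new_fields))
    return added, removed


for el in copy_field_directives("groupA_all", ["F1", "F2", "F3"]):
    print(ET.tostring(el, encoding="unicode"))

# Any document with a value in an added or removed field needs re-indexing
# before its group field reflects the new membership.
print(membership_change(["F1", "F2", "F3"], ["F2", "F3", "F4"]))
```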
> > > > > >
> > > > > > I'm using "qf" with edismax, and my Solr version is 5.1.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Steve
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
