Re: Best Practises around relevance tuning per query

Ashwin Ramesh Wed, 26 Feb 2020 20:11:56 -0800

Hi everybody,

Thank you for all the amazing feedback. I apologize for the formatting of
my question.


If I were to generalize my question: 'What are the most common
approaches to storing query-level features in Solr documents?'

For example, a normalized_click_score is a document-level feature, but how
would you do the same scalably for specific queries? E.g. how do you
define, *For the query 'ipod' this specific document is very relevant*?

Thanks again!

Regards,

Ash

On Wed, Feb 19, 2020 at 6:14 PM Jörn Franke <jornfra...@gmail.com> wrote:

> You are too focused on the solution. If you described the business
> case in more detail without including the solution itself, more people could
> help.
>
> E.g. it is not clear why you have a scoring model and why this can address
> business needs.
>
> > On 18.02.2020 at 01:50, Ashwin Ramesh <ash...@canva.com.invalid> wrote:
> >
> > Hi,
> >
> > We are in the process of applying a scoring model to our search results. In
> > particular, we would like to add scores for documents per query and user
> > context.
> >
> > For example, we want to have a score from 500 to 1 for the top 500
> > documents for the query “dog” for users who speak US English.
> >
> > We believe it becomes infeasible to store these scores in Solr because we
> > want to update the scores regularly, and the number of scores increases
> > rapidly with increased user attributes.
> >
> > One solution we explored was to store these scores in a secondary data
> > store, and use this at Solr query time with a boost function such as:
> >
> > `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > mul(termfreq(id,'ID-500'),1)`
> >
> > We have over a hundred thousand documents in one Solr collection, and about
> > fifty million in another Solr collection. We have some queries for which
> > roughly 80% of the results match, although this is an edge case. We wanted
> > to know the worst case performance, so we tested with such a query. For
> > both of these collections we found a message similar to the following
> > in the Solr cloud logs (tested on a laptop):
> >
> > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> >
> > We then tried using the following boost, which seemed simpler:
> >
> > `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
> >
> > We then saw the following in the Solr cloud logs:
> >
> > `The request took too long to iterate over terms.`
> >
> > All responses above took over 5000 milliseconds to return.
> >
> > We are considering Solr’s re-ranker, but I don’t know how we would use this
> > without pushing all the query-context-document scores to Solr.
> >
> >
> > The alternative solution that we are currently considering involves
> > invoking multiple Solr queries.
> >
> > This means we would make a request to Solr to fetch the top N results (id,
> > score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar, limit=N.
> >
> > Another request would be made using a filter query with a set of doc ids
> > that we know are high value for the user’s query. E.g. q=*:*,
> > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> >
> > We would then do a reranking phase in our service layer.
> >
> > Do you have any suggestions for known patterns of how we can store and
> > retrieve scores per user context and query?
> >
> > Regards,
> > Ash & Spirit.
> >
> > --
> > <https://www.canva.com/> Empowering the world to design
> > Also, we're hiring. Apply here! <https://about.canva.com/careers/>
> >
> > <https://twitter.com/canva> <https://facebook.com/canva>
> > <https://au.linkedin.com/company/canva> <https://instagram.com/canva>
>
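
For reference, the `bf` value in the quoted message could be assembled from
the secondary data store along these lines. This is only a minimal sketch:
the function name `build_boost_function` and the `{doc_id: score}` mapping
are hypothetical, and it shows string assembly only. It does nothing about
the timeouts reported above.

```python
def build_boost_function(scores):
    """Build a Solr `bf` value from a {doc_id: score} mapping.

    Each term has the form mul(termfreq(id,'<doc_id>'),<score>), the same
    pattern used in the quoted message. Terms are emitted in descending
    score order. The mapping itself is an assumed shape for whatever the
    secondary data store returns.
    """
    terms = []
    for doc_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        # Refuse IDs that could break out of the single-quoted literal
        # instead of guessing at an escaping scheme.
        if "'" in doc_id or "\\" in doc_id:
            raise ValueError(f"unsupported character in document id: {doc_id!r}")
        terms.append(f"mul(termfreq(id,'{doc_id}'),{score})")
    return " ".join(terms)


if __name__ == "__main__":
    print(build_boost_function({"ID-2": 499, "ID-1": 500}))
```

The resulting string would be sent as the `bf` parameter alongside the
user's query.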

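The service-layer reranking described in the quoted message is not spelled
out there, so the following is one possible reading rather than what we
actually run. It assumes the first request yields (id, score) pairs, the
second request yields the high-value ids that passed the filters, and the
secondary store supplies a `{doc_id: score}` mapping. The merge policy
(curated documents first, then the remaining Solr hits by score) and all
names are hypothetical.

```python
def rerank(solr_hits, curated_ids, curated_scores, limit):
    """Merge two Solr result sets into a single ranked list of ids.

    solr_hits:      list of (doc_id, solr_score) from the relevance query
    curated_ids:    ids returned by the filter query on known high-value docs
    curated_scores: {doc_id: score} from the secondary data store
    limit:          maximum number of ids to return
    """
    # Keep only curated ids that have a stored score, dropping duplicates
    # while preserving order, then rank them by stored score.
    unique_curated = [d for d in dict.fromkeys(curated_ids) if d in curated_scores]
    curated = sorted(unique_curated, key=lambda d: curated_scores[d], reverse=True)

    # Append the remaining Solr hits, ranked by Solr's own score.
    seen = set(curated)
    by_solr_score = sorted(solr_hits, key=lambda hit: hit[1], reverse=True)
    rest = [doc_id for doc_id, _ in by_solr_score if doc_id not in seen]

    return (curated + rest)[:limit]


if __name__ == "__main__":
    hits = [("d9", 3.2), ("d1", 2.5), ("d7", 4.0)]
    stored = {"d1": 499, "d2": 500, "d3": 498}
    print(rerank(hits, ["d1", "d2"], stored, limit=4))
```

Placing curated documents strictly ahead of the rest is the simplest policy;
blending the stored score with the Solr score would be another option.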
