Re: Best Practises around relevance tuning per query

Paras Lehana Wed, 26 Feb 2020 23:45:28 -0800

Hi Ashwin,

If I'm understanding your requirement correctly, I think you should read
about Payloads <https://lucidworks.com/post/solr-payloads/>.
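
A rough sketch of what that could look like, assuming a dynamic field suffix `*_dpf` mapped to the `delimited_payloads_float` field type (as in Solr's default configset). The field name `query_scores_dpf`, the helper names, and the underscore convention for multi-word queries are hypothetical, not something from your setup. Each document carries `term|score` pairs for the queries it is known to be relevant to, and the `payload()` function reads the score back at query time:

```python
def encode_query_scores(scores):
    """Turn {"ipod": 0.9, "ipod nano": 0.4} into "ipod|0.9 ipod_nano|0.4"."""
    parts = []
    for query, score in sorted(scores.items()):
        # Payload tokens are whitespace-delimited, so a multi-word query is
        # collapsed into one token (an assumed convention, not a Solr rule).
        token = "_".join(query.lower().split())
        parts.append(f"{token}|{score}")
    return " ".join(parts)


def payload_boost_params(user_query, field="query_scores_dpf"):
    """Build request params that add the stored per-query score as a boost."""
    token = "_".join(user_query.lower().split())
    return {
        "defType": "edismax",
        "q": user_query,
        # The third argument is the default used when the document has no
        # payload for this query token.
        "bf": f"payload({field},{token},0)",
    }


doc = {
    "id": "ID-1",
    "query_scores_dpf": encode_query_scores({"ipod": 0.9, "ipod nano": 0.4}),
}
params = payload_boost_params("ipod nano")
```

Updating scores this way still means reindexing the affected documents (or using atomic updates), so it probably fits best when the set of scored queries per document is small.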


On Thu, 27 Feb 2020 at 09:41, Ashwin Ramesh <ash...@canva.com.invalid>
wrote:

> Hi everybody,
>
> Thank you for all the amazing feedback. I apologize for the formatting of
> my question.
>
> I guess if I were to generalize my question: 'What are the most common
> approaches to storing query-level features in Solr documents?'
>
> For example, a normalized_click_score is a document-level feature, but how
> would you scalably do the same for specific queries? E.g., how do you
> define, *For the query 'ipod' this specific document is very relevant*?
>
> Thanks again!
>
> Regards,
>
> Ash
>
> On Wed, Feb 19, 2020 at 6:14 PM Jörn Franke <jornfra...@gmail.com> wrote:
>
> > You are too focused on the solution. If you described the business case
> > in more detail, without including the solution itself, more people could
> > help.
> >
> > E.g., it is not clear why you have a scoring model and why it can address
> > business needs.
> >
> > > On 18.02.2020 at 01:50, Ashwin Ramesh <ash...@canva.com.invalid> wrote:
> > >
> > > Hi,
> > >
> > > We are in the process of applying a scoring model to our search results.
> > > In particular, we would like to add scores for documents per query and
> > > user context.
> > >
> > > For example, we want to have a score from 500 to 1 for the top 500
> > > documents for the query “dog” for users who speak US English.
> > >
> > > We believe it becomes infeasible to store these scores in Solr because we
> > > want to update the scores regularly, and the number of scores increases
> > > rapidly with increased user attributes.
> > >
> > > One solution we explored was to store these scores in a secondary data
> > > store, and use this at Solr query time with a boost function such as:
> > >
> > > `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > > mul(termfreq(id,'ID-500'),1)`
> > >
> > > We have over a hundred thousand documents in one Solr collection and about
> > > fifty million in another. We have some queries for which roughly 80% of
> > > the results match, although this is an edge case. We wanted to know the
> > > worst-case performance, so we tested with such a query. For both of these
> > > collections we found a message similar to the following in the Solr cloud
> > > logs (tested on a laptop):
> > >
> > > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> > >
> > > We then tried using the following boost, which seemed simpler:
> > >
> > > `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
> > >
> > > We then saw the following in the Solr cloud logs:
> > >
> > > `The request took too long to iterate over terms.`
> > >
> > > All responses above took over 5000 milliseconds to return.
> > >
> > > We are considering Solr’s re-ranker, but I don’t know how we would use
> > > this without pushing all the query-context-document scores to Solr.
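
One way the re-ranker might be used without indexing the scores is to send them with the request. The sketch below builds `rq`/`rqq` parameters for Solr's ReRank query parser, with one constant-score (`^=`) clause per document id. The function name and the default values are hypothetical, and this is an untested idea rather than a measured fix for the timeouts above:

```python
def rerank_params(user_query, scores, rerank_docs=500, weight=1.0):
    """Build Solr params that re-rank the top hits with externally stored scores.

    scores: {doc_id: score} fetched from the secondary data store.
    """
    # One constant-score clause per document: id:"ID-1"^=500 scores exactly 500
    # for that document in the re-rank query.
    clauses = [f'id:"{doc_id}"^={score}' for doc_id, score in scores.items()]
    return {
        "q": user_query,
        # Only the top `rerank_docs` hits of the main query are re-scored.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} reRankWeight={weight}}}",
        "rqq": " OR ".join(clauses),
    }


example = rerank_params("dog", {"ID-1": 500, "ID-2": 499})
```

Note the limitation: a document with a stored score that does not make it into the top `rerank_docs` hits of the main query is never promoted.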
> > >
> > >
> > > The alternative solution that we are currently considering involves
> > > invoking multiple Solr queries.
> > >
> > > This means we would make a request to Solr to fetch the top N results (id,
> > > score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar,
> > > limit=N.
> > >
> > > Another request would be made using a filter query with a set of doc ids
> > > that we know are high value for the user’s query. E.g. q=*:*,
> > > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> > >
> > > We would then do a reranking phase in our service layer.
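
The service-layer step described above could be sketched roughly as follows. The function name, the input shapes, and the additive combination of Solr score and stored score are all assumptions for illustration; the real combination rule would depend on your scoring model:

```python
def merge_and_rerank(organic, curated, curated_scores, limit):
    """Combine two Solr result lists and order by Solr score plus stored score.

    organic / curated: lists of (doc_id, solr_score) from the two requests.
    curated_scores: {doc_id: score} from the secondary data store.
    """
    combined = {}
    for doc_id, solr_score in organic:
        combined[doc_id] = solr_score
    for doc_id, _ in curated:
        # q=*:* gives every hit the same score, so only presence matters here;
        # a document already seen in the organic list keeps its organic score.
        combined.setdefault(doc_id, 0.0)
    ranked = sorted(
        combined,
        key=lambda d: combined[d] + curated_scores.get(d, 0.0),
        reverse=True,
    )
    return ranked[:limit]


top = merge_and_rerank(
    organic=[("d9", 3.0), ("d1", 1.0)],
    curated=[("d1", 1.0), ("d2", 1.0)],
    curated_scores={"d1": 500, "d2": 499},
    limit=3,
)
```

Because the second request already applies the same filters, any curated document that comes back is known to be valid for the user's context before it is merged in.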
> > >
> > > Do you have any suggestions for known patterns of how we can store and
> > > retrieve scores per user context and query?
> > >
> > > Regards,
> > > Ash & Spirit.
> > >
> > > --
> > > <https://www.canva.com/> Empowering the world to design
> > > Also, we're hiring. Apply here! <https://about.canva.com/careers/>
> > >
> > > <https://twitter.com/canva> <https://facebook.com/canva>
> > > <https://au.linkedin.com/company/canva> <https://instagram.com/canva>
> >

-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*
