If you haven’t already, you might want to check out maximal marginal
relevance (MMR). The original paper is by Carbonell and Goldstein.
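
For reference, MMR greedily picks the next document by trading its relevance
off against its similarity to the documents already picked. Below is a minimal
Python sketch of that idea, not Solr code: the function names, the
(doc_id, relevance, vector) tuple layout and the default lam=0.7 are
assumptions made for illustration, and relevance is assumed to be normalised
to [0, 1] so that it is comparable with cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr(candidates, k, lam=0.7):
    """Greedy MMR re-ranking (illustrative helper, not a Solr API).

    candidates: list of (doc_id, relevance, vector) tuples.
    lam: 1.0 means pure relevance, 0.0 means pure novelty.
    Returns up to k doc_ids in the order they were selected.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(cand):
            _, rel, vec = cand
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((cosine(vec, s[2]) for s in selected), default=0.0)
            return lam * rel - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]
```

With lam=1.0 this degenerates to the plain relevance ordering; lowering lam
pushes near-duplicates of already-selected documents down the list.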

On Thu, Sep 27, 2018 at 7:29 PM Joel Bernstein <joels...@gmail.com> wrote:

> Yeah, I think your plan sounds fine.
>
> Do you have a specific use case for diversity of results? I've been
> wondering if diversity of results would provide better perceived relevance.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> dceccarel...@bloomberg.net> wrote:
>
> > Yeah, I think Kmeans might be a way to implement the "top 3 stories that
> > are more distant", but you can also have a more naïve (and faster)
> > strategy like:
> >  - send a threshold with the request
> >  - scan the documents in order of relevance score
> >  - select the top documents that have diversity > threshold.
> >
> > I would allow the strategy to be defined and selected from the request.
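
The threshold strategy quoted above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: "diversity" is read
here as cosine distance to every document already selected, and the names
select_diverse, ranked_docs and rows are hypothetical, not part of Solr.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    similarity = dot / (na * nb) if na and nb else 0.0
    return 1.0 - similarity


def select_diverse(ranked_docs, threshold, rows):
    """Single pass over docs already sorted by descending relevance.

    ranked_docs: list of (doc_id, vector) pairs.
    A doc is kept only if it is farther than `threshold` from every doc
    kept so far (assumed interpretation of "diversity > threshold").
    """
    selected = []
    for doc_id, vec in ranked_docs:
        if all(cosine_distance(vec, kept) > threshold for _, kept in selected):
            selected.append((doc_id, vec))
            if len(selected) == rows:
                break
    return [doc_id for doc_id, _ in selected]
```

The scan stops as soon as enough rows are collected, which is what makes it
cheaper than clustering, at the cost of being sensitive to the threshold.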
> >
> > From: solr-user@lucene.apache.org At: 09/27/18 18:25:43
> > To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
> > Subject: Re: solr and diversification
> >
> > I've thought about this problem a little bit. What I was considering was
> > using Kmeans clustering to cluster the top 50 docs, then pulling the top
> > scoring doc from each cluster as the top documents. This should be fast
> > and effective at getting diversity.
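
A rough Python sketch of the clustering idea quoted above, for illustration
only. It assumes dense document vectors are already available, uses plain
Lloyd's k-means seeded with the first k (highest scoring) documents so the
result is deterministic, and runs a fixed number of iterations; the function
names are hypothetical and none of this is an existing Solr component.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster index per input vector."""
    # Assumption: seed centroids with the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def top_doc_per_cluster(docs, k):
    """docs: list of (doc_id, score, vector) sorted by descending score."""
    assignment = kmeans([vec for _, _, vec in docs], k)
    best = {}
    for (doc_id, score, _), cluster in zip(docs, assignment):
        # Docs arrive best-first, so the first doc seen in a cluster wins.
        if cluster not in best:
            best[cluster] = (doc_id, score)
    ranked = sorted(best.values(), key=lambda pair: -pair[1])
    return [doc_id for doc_id, _ in ranked]
```

Note that this returns at most k documents, and fewer if a cluster ends up
empty.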
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Sep 27, 2018 at 1:20 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarel...@bloomberg.net> wrote:
> >
> > > Hi,
> > >
> > > I'm considering writing a component for diversifying the results. I
> > > know that diversification can be achieved by using grouping, but I'm
> > > thinking about something different and query-biased.
> > > The idea is to have something that gets applied after the normal
> > > retrieval and selects the k most diverse documents based on some
> > > distance metric:
> > >
> > > Example:
> > > imagine that you are asking for 10 rows, and you set diversify.rows=3
> > > diversify.metric=tfidf diversify.field=body
> > >
> > > Solr might retrieve the top 10 rows as usual, extract tfidf vectors
> > > for the bodies and select the 3 stories that are most distant from
> > > each other according to the cosine similarity.
> > > This would be different from grouping because documents will be
> > > 'collapsed' or not based on the subset of documents retrieved for the
> > > query.
> > > Do you think it would make sense to have it as a component? Any
> > > feedback / ideas?
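
To make the tf-idf example in the original proposal concrete, here is a small
Python sketch, offered as an illustration rather than a design. Assumptions:
whitespace tokenisation, a smoothed idf computed only over the retrieved
bodies (a real component would more likely use index statistics), and a
greedy farthest-first selection that always keeps the top-ranked document.
All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(bodies):
    """Sparse {term: weight} tf-idf vectors; idf uses the retrieved set only."""
    tokenized = [body.lower().split() for body in bodies]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0)
         for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def diversify(bodies, rows):
    """Return indices of up to `rows` bodies, chosen farthest-first.

    bodies are assumed to be in relevance order. Each step adds the doc
    whose highest similarity to any already-chosen doc is the lowest.
    """
    vectors = tfidf_vectors(bodies)
    chosen = [0] if bodies else []
    while len(chosen) < min(rows, len(bodies)):
        rest = [i for i in range(len(bodies)) if i not in chosen]
        nxt = min(rest, key=lambda i: max(cosine(vectors[i], vectors[j])
                                          for j in chosen))
        chosen.append(nxt)
    return chosen
```

Because the vectors are built from the retrieved rows only, the same document
can be kept for one query and dropped for another, which is the query-biased
behaviour the proposal contrasts with grouping.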
> > >
> > >
> > >
> >
> >
> >
>
