What do you mean by "these limitations"? Do you want to do multiple
groupings at the same time?
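
For reference, the group-by-site request discussed in the quoted thread might be built roughly like this. It is only a sketch: `group`, `group.field`, `group.limit` and `rows` are standard Solr result grouping parameters, but the helper name `build_grouped_query`, the `/solr/select` path and the example keywords are assumptions for illustration, not taken from your setup.

```python
from urllib.parse import urlencode


def build_grouped_query(keywords, per_site=2, rows=10):
    """Return a query string that collapses Solr results by the "site" field.

    Hypothetical helper: the name and default values are illustrative only.
    """
    params = {
        "q": keywords,
        "group": "true",          # turn on result grouping (field collapsing)
        "group.field": "site",    # one group per distinct site value
        "group.limit": per_site,  # documents returned inside each group
        "rows": rows,             # with grouping on, this counts groups
        "wt": "json",
    }
    return urlencode(params)


if __name__ == "__main__":
    # "/solr/select" is an assumed handler path; adjust it to your core.
    print("/solr/select?" + build_grouped_query("solar panels"))
```

Raising `group.limit` shows more documents per site, which may soften the "most hits are hidden" effect at the cost of less variety per page.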


2013/9/6 Sai Gadde <gadde....@gmail.com>

> Thank you Jack for the suggestion.
>
> We can try grouping by site. But considering that the number of sites is only
> about 1000 against an index size of 5 million, one can expect that most of the
> hits would be hidden, and for certain specific keywords only a handful of
> actual results could be displayed if results are grouped by site.
>
> We already group on a signature field to identify duplicate content in
> these 5 million+ docs. But here the number of duplicates is only about
> 3-5% at most.
>
> Is there any workaround for these limitations with grouping?
>
> Thanks
> Shyam
>
>
>
> On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
> > The grouping (field collapsing) feature somewhat addresses this - group by
> > a "site" field and then if more than one or a few top pages are from the
> > same site they get grouped or collapsed so that you can see more sites in
> > a few results.
> >
> > See:
> > http://wiki.apache.org/solr/FieldCollapsing
> > https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Sai Gadde
> > Sent: Thursday, September 05, 2013 2:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: Tweaking boosts for more search results variety
> >
> >
> > Our index is aggregated content from various sites on the web. We want a
> > good user experience by showing multiple sites in the search results. In
> > our setup we are seeing most of the results from the same site at the top.
> >
> > Here is some information regarding queries and schema
> >                site - String field. We have about 1000 sites in index
> >                sitetype - String field.  we have 3 site types
> > omitNorms="true" for both the fields
> >
> > Doc count varies largely based on site and sitetype by a factor of 10 -
> > 1000 times
> > Total index size is about 5 million docs.
> > Solr Version: 4.0
> >
> > In our queries we have a fixed and preferential boost for certain sites.
> > sitetype has different and fixed boosts for 3 possible values. We turned
> > off Inverse Document Frequency (IDF) for these boosts to work properly.
> > Other text fields are boosted based on search keywords only.
> >
> > With this setup we often see a bunch of hits from a single site followed
> > by the next, etc.
> > Is there any solution to see results from a variety of sites and still keep
> > the preferential boosts in place?
> >
>
