Thank you, Jack, for the suggestion.

We can try grouping by site. But since there are only about 1000 sites
against an index of 5 million docs, one can expect most of the hits to be
hidden. For certain specific keywords, only a handful of actual results
would be displayed if results are grouped by site.
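
For reference, the grouped query we would try looks roughly like the sketch
below. group, group.field, group.limit and rows are standard Solr result
grouping parameters; the helper names, the example keywords and the numbers
are only illustrative assumptions, not our actual setup.

```python
from urllib.parse import urlencode


def grouped_query_params(keywords, group_field="site", group_limit=1, rows=10):
    """Build request parameters for a Solr result-grouping query.

    With grouping enabled, `rows` counts groups (sites here), and
    `group.limit` is the number of docs returned inside each group.
    """
    return {
        "q": keywords,
        "group": "true",
        "group.field": group_field,
        "group.limit": group_limit,
        "rows": rows,
    }


def max_visible_docs(matching_sites, group_limit):
    """Upper bound on docs a user can reach when results are grouped by site."""
    return matching_sites * group_limit


# Hypothetical example: a keyword that matches docs on only 5 sites.
params = grouped_query_params("example keywords", group_limit=3)
query_string = urlencode(params)  # append to the core's /select URL
visible = max_visible_docs(matching_sites=5, group_limit=3)
```

Raising group.limit shows more docs per site, but the number of reachable
docs is still capped at (matching sites x group.limit), which is the
limitation described above.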

We already group on a signature field to identify duplicate content in
these 5 million+ docs, but there the duplicates are only about 3-5% of the
docs at most.

Is there any workaround for these limitations with grouping?

Thanks
Shyam



On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The grouping (field collapsing) feature somewhat addresses this - group by
> a "site" field and then if more than one or a few top pages are from the
> same site they get grouped or collapsed so that you can see more sites in a
> few results.
>
> See:
> http://wiki.apache.org/solr/FieldCollapsing
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Sai Gadde
> Sent: Thursday, September 05, 2013 2:27 AM
> To: solr-user@lucene.apache.org
> Subject: Tweaking boosts for more search results variety
>
>
> Our index is aggregated content from various sites on the web. We want a good
> user experience by showing multiple sites in the search results. In our
> setup, most of the top results come from the same site.
>
> Here is some information regarding our queries and schema:
>   site - String field. We have about 1000 sites in the index.
>   sitetype - String field. We have 3 site types.
>   omitNorms="true" for both fields.
>
> Doc count varies widely by site and sitetype, by a factor of 10 to
> 1000.
> Total index size is about 5 million docs.
> Solr Version: 4.0
>
> In our queries we have a fixed and preferential boost for certain sites.
> sitetype has different and fixed boosts for 3 possible values. We turned
> off Inverse Document Frequency (IDF) for these boosts to work properly.
> Other text fields are boosted based on search keywords only.
>
> With this setup we often see a run of hits from a single site, followed by
> a run from the next site, and so on.
> Is there any way to show results from a variety of sites and still keep
> the preferential boosts in place?
>
