I finally got it working. It seems that the latest SOLR-236-trunk.patch just has some bugs.
I checked out an older revision of solr trunk - rev 899572 (dtd. 2010-01-15) from http://svn.apache.org/repos/asf/lucene/solr/trunk and applied SOLR-236.patch dtd. 2010-02-01. And collapsing works fine. I get correct numFound values after collapsing.

Maybe this can help someone.

2010/5/13 Sergey Shinderuk <sshinde...@gmail.com>:
> Joe, thanks for your answer. But it doesn't solve my problem. Below I
> gave a longer description of my problem.
>
> First of all, I checked out solr trunk revision 928303 with last
> change dtd. 2010-03-28. Then I applied the latest patch from SOLR-236
> to get field collapsing component. After that I built the example
> configuration with 'ant example'.
>
> Then I started to experiment with field collapsing:
>
> 1. Query all docs http://localhost:8983/solr/select?q=*:*
> ...
> <result name="response" numFound="19" start="0">
> ...
> There are 19 documents in the index.
>
>
> 2. Same with faceting by manu_exact field:
> http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=manu_exact
> ...
> <lst name="facet_fields">
> <lst name="manu_exact">
> <int name="Corsair Microsystems Inc.">4</int>
> <int name="A-DATA Technology Inc.">2</int>
> <int name="Apache Software Foundation">2</int>
> <int name="Belkin">2</int>
> <int name="Canon Inc.">2</int>
> <int name="ASUS Computer Inc.">1</int>
> <int name="ATI Technologies">1</int>
> <int name="Apple Computer Inc.">1</int>
> <int name="Dell, Inc.">1</int>
> <int name="Maxtor Corp.">1</int>
> <int name="Samsung Electronics Co. Ltd.">1</int>
> <int name="ViewSonic Corp.">1</int>
> </lst>
> </lst>
> ...
>
> I got 12 distinct facets.
>
>
> 3. Now collapsing by manu_exact instead of faceting
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact
>
> I get collapse counts for the first 10 rows having distinct manu_exact
> values.
> But the problem is that I get an odd numFound:
>
> <result name="response" numFound="10" start="0">
>
> numFound is equal to the number of rows returned by solr. (In fact, if
> I add rows=3 to the query string, then I get numFound=3.)
> And I want to get numFound = 12, because there are 12 distinct values
> in the index for manu_exact field as demonstrated in p. 2.
>
>
>
> Joe suggested adding a dummy field with a sole value of 1 and
> performing faceting on this field over *uncollapsed* result set
>
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact&collapse.facet=after&facet=on&facet.field=dummy&rows=3
>
> And I get numFound = 10 as before and facet count = 19 for the sole
> value of dummy field. And this is the expected result, but not what I
> want.
>
>
> I thought that my question is the one faced immediately if one uses
> field collapsing. If you don't know the total number of results, then
> you cannot paginate through them, at least you don't know the number
> of pages.
>
> In my application I'm trying to collapse near-duplicate documents
> based on document signature. And I need to know how many non-duplicate
> results hit the query.
>
>
> Any help appreciated.
>
>
> 2010/5/12 Joe Calderon <calderon....@gmail.com>:
>> dont know if its the best solution but i have a field i facet on
>> called type its either 0,1, combined with collapse.facet=before i just
>> sum all the values of the facet field to get the total number found
>>
>> if you dont have such a field u can always add a field with a single value
>>
>> --joe
>>
>> On Wed, May 12, 2010 at 10:41 AM, Sergey Shinderuk <sshinde...@gmail.com>
>> wrote:
>>> Hi, fellows!
>>>
>>> I use field collapsing to collapse near-duplicate documents based on
>>> document fuzzy signature calculated at index time.
>>> The problem is that, when field collapsing is enabled, in query
>>> response numFound is equal to the number of rows requested.
>>>
>>>
>>> For instance, with solr example schema I can issue the following query
>>>
>>> http://localhost:8983/solr/select?q=*:*&rows=3&collapse.field=manu_exact
>>>
>>> In response I get collapse_counts together with ordinary result list,
>>> but numFound equals 3.
>>> As far as I understand, this is due to the way field collapsing works.
>>>
>>> I want to show the total number of hits to the user and provide a
>>> pagination through the results.
>>>
>>> Any ideas?
>>>
>>> Regards,
>>> Sergey Shinderuk
>>>
>>
>
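
For anyone who cannot switch revisions: the total that the thread expects (12) can also be derived from the facet response quoted in step 2, by counting the distinct manu_exact values that have at least one hit. Below is a minimal sketch of that arithmetic, not part of the patch and not tested against a live Solr. It assumes every document has exactly one value in the collapse field and that the facet list is not truncated (for example by requesting facet.limit=-1). The names collapsed_total and page_count are hypothetical helpers made up for illustration.

```python
import math

# Facet counts for manu_exact, copied from the response quoted in step 2.
MANU_EXACT_FACETS = {
    "Corsair Microsystems Inc.": 4,
    "A-DATA Technology Inc.": 2,
    "Apache Software Foundation": 2,
    "Belkin": 2,
    "Canon Inc.": 2,
    "ASUS Computer Inc.": 1,
    "ATI Technologies": 1,
    "Apple Computer Inc.": 1,
    "Dell, Inc.": 1,
    "Maxtor Corp.": 1,
    "Samsung Electronics Co. Ltd.": 1,
    "ViewSonic Corp.": 1,
}


def collapsed_total(facet_counts):
    """Count distinct field values that match at least one document.

    Assumption: one value per document, so each value is one collapsed group.
    """
    return sum(1 for count in facet_counts.values() if count > 0)


def page_count(total, rows):
    """Number of result pages needed to show `total` groups, `rows` per page."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return math.ceil(total / rows)


total = collapsed_total(MANU_EXACT_FACETS)
print(total)                            # 12 collapsed groups
print(sum(MANU_EXACT_FACETS.values()))  # 19 uncollapsed documents
print(page_count(total, 10))            # 2 pages with the default rows=10
print(page_count(total, 3))             # 4 pages with rows=3
```

Summing the facet counts instead of counting the entries gives the uncollapsed total (19), which is what the dummy-field approach returned.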