I finally got it working. It seems that the latest SOLR-236-trunk.patch
simply has some bugs.

I checked out an older revision of Solr trunk, rev 899572 (dated
2010-01-15), from http://svn.apache.org/repos/asf/lucene/solr/trunk
and applied SOLR-236.patch dated 2010-02-01.

And collapsing works fine. I get correct numFound values after collapsing.
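
In case someone wants to check the same thing on their own build, here is a
minimal Python sketch, not part of the patch. It assumes the standard XML
response format shown in the quoted message below. The helper names
(num_found, distinct_facet_values, page_count) and the sample document are
made up for illustration. Note that faceting returns only a limited number of
values by default, so the facet count is only a cross-check on a small index.

```python
import math
import xml.etree.ElementTree as ET


def num_found(response_xml: str) -> int:
    """Return numFound from the <result name="response"> element."""
    root = ET.fromstring(response_xml)
    result = root.find(".//result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in response")
    return int(result.attrib["numFound"])


def distinct_facet_values(response_xml: str, field: str) -> int:
    """Count the facet values of one field that have a non-zero count."""
    root = ET.fromstring(response_xml)
    path = f".//lst[@name='facet_fields']/lst[@name='{field}']/int"
    return sum(1 for node in root.findall(path) if int(node.text) > 0)


def page_count(total: int, rows: int) -> int:
    """Number of pages needed to show `total` results, `rows` per page."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return math.ceil(total / rows)


# Hypothetical, trimmed-down response used only to exercise the helpers.
SAMPLE = """<response>
  <result name="response" numFound="12" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="manu_exact">
        <int name="Belkin">2</int>
        <int name="Maxtor Corp.">1</int>
        <int name="Unused">0</int>
      </lst>
    </lst>
  </lst>
</response>"""

if __name__ == "__main__":
    total = num_found(SAMPLE)
    print(total, distinct_facet_values(SAMPLE, "manu_exact"), page_count(total, 10))
```

With a correct numFound after collapsing, the page count follows directly
from page_count(numFound, rows), which is what pagination needs.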

Maybe this can help someone.


2010/5/13 Sergey Shinderuk <sshinde...@gmail.com>:
> Joe, thanks for your answer. But it doesn't solve my problem. Below I
> gave a longer description of my problem.
>
> First of all, I checked out Solr trunk revision 928303, with the last
> change dated 2010-03-28. Then I applied the latest patch from SOLR-236
> to get the field collapsing component. After that I built the example
> configuration with 'ant example'.
>
> Then I started to experiment with field collapsing:
>
> 1. Query all docs http://localhost:8983/solr/select?q=*:*
> ...
> <result name="response" numFound="19" start="0">
> ...
> There are 19 documents in the index.
>
>
> 2. Same with faceting by manu_exact field:
> http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=manu_exact
> ...
> <lst name="facet_fields">
>  <lst name="manu_exact">
>    <int name="Corsair Microsystems Inc.">4</int>
>    <int name="A-DATA Technology Inc.">2</int>
>    <int name="Apache Software Foundation">2</int>
>    <int name="Belkin">2</int>
>    <int name="Canon Inc.">2</int>
>    <int name="ASUS Computer Inc.">1</int>
>    <int name="ATI Technologies">1</int>
>    <int name="Apple Computer Inc.">1</int>
>    <int name="Dell, Inc.">1</int>
>    <int name="Maxtor Corp.">1</int>
>    <int name="Samsung Electronics Co. Ltd.">1</int>
>    <int name="ViewSonic Corp.">1</int>
>  </lst>
> </lst>
> ...
>
> I got 12 distinct facet values.
>
>
> 3. Now collapsing by manu_exact instead of faceting
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact
>
> I get collapse counts for the first 10 rows having distinct manu_exact
> values. But the problem is that I get an odd numFound:
>
> <result name="response" numFound="10" start="0">
>
> numFound is equal to the number of rows returned by solr. (In fact, if
> I add rows=3 to the query string, then I get numFound=3.)
> What I want is numFound = 12, because there are 12 distinct values
> in the index for the manu_exact field, as demonstrated in step 2.
>
>
>
> Joe suggested adding a dummy field with a sole value of 1 and
> performing faceting on this field over the *uncollapsed* result set:
>
> http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact&collapse.facet=after&facet=on&facet.field=dummy&rows=3
>
> And I get numFound = 10 as before and facet count = 19 for the sole
> value of dummy field. And this is the expected result, but not what I
> want.
>
>
> I thought my question would come up immediately for anyone who uses
> field collapsing. If you don't know the total number of results, then
> you cannot paginate through them, or at least you don't know the number
> of pages.
>
> In my application I'm trying to collapse near-duplicate documents
> based on a document signature. And I need to know how many non-duplicate
> results match the query.
>
>
> Any help appreciated.
>
>
> 2010/5/12 Joe Calderon <calderon....@gmail.com>:
>> I don't know if it's the best solution, but I have a field I facet on
>> called type; it's either 0 or 1. Combined with collapse.facet=before, I
>> just sum all the values of the facet field to get the total number found.
>>
>> If you don't have such a field, you can always add a field with a single value.
>>
>> --joe
>>
>> On Wed, May 12, 2010 at 10:41 AM, Sergey Shinderuk <sshinde...@gmail.com> 
>> wrote:
>>> Hi, fellows!
>>>
>>> I use field collapsing to collapse near-duplicate documents based on
>>> a fuzzy document signature calculated at index time.
>>> The problem is that, when field collapsing is enabled, numFound in the
>>> query response is equal to the number of rows requested.
>>>
>>> For instance, with the Solr example schema I can issue the following query:
>>>
>>> http://localhost:8983/solr/select?q=*:*&rows=3&collapse.field=manu_exact
>>>
>>> In the response I get collapse_counts together with the ordinary result
>>> list, but numFound equals 3.
>>> As far as I understand, this is due to the way field collapsing works.
>>>
>>> I want to show the total number of hits to the user and provide
>>> pagination through the results.
>>>
>>> Any ideas?
>>>
>>> Regards,
>>> Sergey Shinderuk
>>>
>>
>
