Joe, thanks for your answer, but it doesn't solve my problem. Below is a
longer description of it.

First, I checked out Solr trunk revision 928303 (last changed
2010-03-28). Then I applied the latest patch from SOLR-236 to get the
field collapsing component. After that, I built the example
configuration with 'ant example'.

Then I started to experiment with field collapsing:

1. Query all docs http://localhost:8983/solr/select?q=*:*
...
<result name="response" numFound="19" start="0">
...
There are 19 documents in the index.


2. The same query, faceting on the manu_exact field:
http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=manu_exact
...
<lst name="facet_fields">
  <lst name="manu_exact">
    <int name="Corsair Microsystems Inc.">4</int>
    <int name="A-DATA Technology Inc.">2</int>
    <int name="Apache Software Foundation">2</int>
    <int name="Belkin">2</int>
    <int name="Canon Inc.">2</int>
    <int name="ASUS Computer Inc.">1</int>
    <int name="ATI Technologies">1</int>
    <int name="Apple Computer Inc.">1</int>
    <int name="Dell, Inc.">1</int>
    <int name="Maxtor Corp.">1</int>
    <int name="Samsung Electronics Co. Ltd.">1</int>
    <int name="ViewSonic Corp.">1</int>
  </lst>
</lst>
...

I got 12 distinct facet values.
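A rough sketch of reading both numbers off that facet list, assuming Python's standard xml.etree.ElementTree and the XML above pasted in as a string (the helper name facet_summary is made up for illustration). For the distinct count to be trustworthy on a real index, the request would also need facet.limit=-1 (the default limit is 100 values) and facet.mincount=1 (so values with zero hits are not listed).

```python
import xml.etree.ElementTree as ET

# facet_fields section copied from the step 2 response.
FACET_XML = """
<lst name="facet_fields">
  <lst name="manu_exact">
    <int name="Corsair Microsystems Inc.">4</int>
    <int name="A-DATA Technology Inc.">2</int>
    <int name="Apache Software Foundation">2</int>
    <int name="Belkin">2</int>
    <int name="Canon Inc.">2</int>
    <int name="ASUS Computer Inc.">1</int>
    <int name="ATI Technologies">1</int>
    <int name="Apple Computer Inc.">1</int>
    <int name="Dell, Inc.">1</int>
    <int name="Maxtor Corp.">1</int>
    <int name="Samsung Electronics Co. Ltd.">1</int>
    <int name="ViewSonic Corp.">1</int>
  </lst>
</lst>
"""


def facet_summary(xml_text, field):
    """Hypothetical helper: return (number of values with count > 0,
    sum of those counts) for one field in a facet_fields block."""
    root = ET.fromstring(xml_text.strip())
    for lst in root.findall("lst"):  # one child <lst> per faceted field
        if lst.get("name") == field:
            counts = [int(e.text) for e in lst.findall("int")]
            nonzero = [c for c in counts if c > 0]
            return len(nonzero), sum(nonzero)
    raise KeyError(field)


distinct, total = facet_summary(FACET_XML, "manu_exact")
print(distinct, total)  # 12 19
```

The sum matches the 19 documents from step 1, which is consistent with each document carrying a single manu_exact value; summing facet counts gives the uncollapsed total only under that condition, which is what a single-valued dummy field guarantees.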


3. Now collapsing on manu_exact instead of faceting:
http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact

I get collapse counts for the first 10 rows, each with a distinct
manu_exact value. The problem is that numFound is odd:

<result name="response" numFound="10" start="0">

numFound equals the number of rows returned by Solr (in fact, if I add
rows=3 to the query string, I get numFound=3). I want numFound = 12,
because, as shown in step 2, the index contains 12 distinct values of
the manu_exact field.



Joe suggested adding a dummy field whose only value is 1 and
faceting on this field over the *uncollapsed* result set:

http://localhost:8983/solr/select?q=*:*&collapse.field=manu_exact&collapse.facet=after&facet=on&facet.field=dummy&rows=3

I get numFound = 10 as before, and a facet count of 19 for the dummy
field's only value. That is the expected result, but not what I want.


I thought this was a question anyone using field collapsing runs into
right away. If you don't know the total number of results, you cannot
paginate through them, or at least you don't know the number of pages.
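If the number of collapsed groups can be obtained separately (for example, as the distinct count from a facet request), the page count and the start offsets for each page follow from ceiling division. A minimal sketch, assuming start and rows page over the collapsed results; page_starts is a hypothetical helper, not part of Solr.

```python
def page_starts(total_groups, rows):
    """Hypothetical helper: return the Solr 'start' offset for each page
    when `total_groups` collapsed results are shown `rows` at a time."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    num_pages = -(-total_groups // rows)  # ceiling division without floats
    return [page * rows for page in range(num_pages)]


starts = page_starts(12, 3)
print(len(starts), starts)  # 4 [0, 3, 6, 9]
```

Each offset would then be sent as start=... together with rows=3 and collapse.field=manu_exact.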

In my application, I'm trying to collapse near-duplicate documents
based on a document signature, and I need to know how many
non-duplicate results match the query.


Any help appreciated.


2010/5/12 Joe Calderon <calderon....@gmail.com>:
> I don't know if it's the best solution, but I have a field I facet on
> called type; it's either 0 or 1. Combined with collapse.facet=before, I
> just sum all the values of the facet field to get the total number found.
>
> If you don't have such a field, you can always add a field with a single value.
>
> --joe
>
> On Wed, May 12, 2010 at 10:41 AM, Sergey Shinderuk <sshinde...@gmail.com> wrote:
>> Hi, fellows!
>>
>> I use field collapsing to collapse near-duplicate documents based on
>> a fuzzy document signature calculated at index time.
>> The problem is that, when field collapsing is enabled, numFound in the
>> query response equals the number of rows requested.
>>
>> For instance, with the Solr example schema I can issue the following query:
>>
>> http://localhost:8983/solr/select?q=*:*&rows=3&collapse.field=manu_exact
>>
>> In the response I get collapse_counts together with the ordinary result list,
>> but numFound equals 3.
>> As far as I understand, this is due to the way field collapsing works.
>>
>> I want to show the total number of hits to the user and provide
>> pagination through the results.
>>
>> Any ideas?
>>
>> Regards,
>> Sergey Shinderuk
>>
>
