[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775192#action_12775192
 ] 

Michael Gundlach commented on SOLR-236:
---------------------------------------

I've found an NPE that occurs when performing quasi-distributed field 
collapsing.

My company only has one use case for field collapsing: collapsing on email 
address.  Our index is spread across multiple cores.  We found that if we shard 
by email address, so that all documents with a given email address are 
guaranteed to appear on the same core, then we can do distributed field 
collapsing.
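
As an illustration only, one way to get that guarantee is to pick the core 
from a hash of the normalized email address at index time.  This is a minimal 
sketch, not necessarily the scheme described above: the class name, the 
lower-casing step, and the choice of CRC32 are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.CRC32;

// Hypothetical helper: maps an email address to a shard (core) index so that
// every document with the same address lands on the same core.
final class EmailShardRouter {

    static int shardFor(String email, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // Assumption: addresses are compared case-insensitively and trimmed.
        String normalized = email.trim().toLowerCase(Locale.ROOT);
        CRC32 crc = new CRC32();
        crc.update(normalized.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is in [0, numShards).
        return (int) (crc.getValue() % numShards);
    }

    public static void main(String[] args) {
        // Both spellings of the same address resolve to the same core index.
        System.out.println(shardFor("user@example.com", 4));
        System.out.println(shardFor(" User@Example.COM ", 4));
    }
}
```

Note that changing the number of cores changes the mapping, so a scheme like 
this would need a reindex whenever cores are added or removed.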

We add &collapse.field=email and &shards=core1,core2,... to a regular query.  
Each core collapses on email and sends the results back to the requestor.  
Since no emails appear on more than one core, we've accomplished distributed 
search.  We do lose the <collapse_count> section, but that's not needed for our 
purpose -- we just need an accurate total document count, and to have no more 
than one document for a given email address in the results.
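
For illustration, a request of that shape could be assembled as follows.  Only 
the collapse.field and shards parameter names come from the description above; 
the base URL, the query string, and the class name are placeholders, and the 
shard entries are passed through as opaque strings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that appends collapse.field and shards to a regular query.
final class CollapseQueryBuilder {

    static String build(String baseUrl, String query,
                        String collapseField, List<String> shards) {
        return baseUrl
                + "?q=" + enc(query)
                + "&collapse.field=" + enc(collapseField)
                + "&shards=" + enc(String.join(",", shards));
    }

    // URL-encodes a single parameter value (Java 10+ overload taking a Charset).
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder host, path, and query; adjust to the actual deployment.
        System.out.println(build("http://localhost:8983/solr/core1/select",
                "name:smith", "email", List.of("core1", "core2")));
    }
}
```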

Unfortunately, this throws an NPE when searching on a tokenized field.  
Searching string fields is fine.  I don't understand exactly why the NPE 
appears, but I did bandaid over it by checking explicitly for nulls at the 
appropriate line in the code.  No more NPE.

There's a downside, which is that if we attempt to collapse on a field other 
than email -- one which has documents appearing in multiple cores -- the 
results are buggy: the first search returns few documents, and the number of 
documents actually displayed doesn't always match the "numFound" value.  Then 
upon refresh we get what we think is the correct numFound, and the correct list 
of documents.  This doesn't bother me too much, as you're guaranteed to get 
incorrect answers from the collapse code anyway when collapsing on a field that 
you didn't use as your key for sharding.

In the spirit of Yonik's law of patches, I have made two imperfect patches 
attempting to contribute the fix, or at least point out the error:

1. I pulled trunk, applied the latest SOLR-236 patch, made my two-line change, 
and created a patch file.  The resultant patch file looks very different from 
the latest SOLR-236 patch file, so I assume I did something wrong.

2. I pulled trunk, made my two-line change, and created another patch file.  This 
file is tiny but of course is missing all of the field collapsing changes.

Would you like me to post either of these patch files to this issue?  Or is it 
sufficient to just tell you that the NPE occurred in QueryComponent.java on line 
556? ("rb._responseDocs.set(sdoc.positionInResponse, doc);" where sdoc was 
null.)  Perhaps my use case is extraordinary enough that you're happy leaving 
the NPE in place and telling other users to not do what I'm doing?
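
To make the null check concrete, here is a self-contained sketch of that kind 
of guard.  It does not reproduce the actual Solr source: only the guarded 
statement is quoted from line 556, while the stub class, the lookup map, and 
the surrounding loop are assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Stand-in types only; these are NOT the real Solr classes.
final class NullGuardSketch {

    // Hypothetical stub for the per-document merge record.
    static final class ShardDocStub {
        final int positionInResponse;

        ShardDocStub(int positionInResponse) {
            this.positionInResponse = positionInResponse;
        }
    }

    // Places each returned document at its merged position, skipping any
    // document for which no merge record exists instead of dereferencing null.
    static List<String> place(List<String> returnedIds,
                              Map<String, ShardDocStub> resultIds,
                              int responseSize) {
        List<String> responseDocs =
                new ArrayList<>(Collections.nCopies(responseSize, (String) null));
        for (String doc : returnedIds) {
            ShardDocStub sdoc = resultIds.get(doc);
            if (sdoc == null) {
                continue; // the guard: without it the next line throws an NPE
            }
            responseDocs.set(sdoc.positionInResponse, doc);
        }
        return responseDocs;
    }

    public static void main(String[] args) {
        // "b" has no merge record and is skipped rather than causing an NPE.
        System.out.println(place(List.of("a", "b", "c"),
                Map.of("a", new ShardDocStub(1), "c", new ShardDocStub(0)), 2));
    }
}
```

A guard like this hides the symptom without explaining why the record is 
missing, so documents skipped this way never appear in the response.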

Thanks!
Michael

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
