[jira] Commented: (SOLR-1869) RemoveDuplicatesTokenFilter doest have expected behaviour

Robert Muir (JIRA) Thu, 08 Apr 2010 09:35:58 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854983#action_12854983
 ]


Robert Muir commented on SOLR-1869:
-----------------------------------

bq. this all started because the highlighter was highlighting a term at the 
same offsets twice,

Perhaps we should fix this directly in DefaultSolrHighlighter? It already has 
a TokenStream-sorting filter that's intended to do the following:
{code}
/** Orders Tokens in a window first by their startOffset ascending.
 * endOffset is currently ignored.
 * This is meant to work around fickleness in the highlighter only.  It
 * can mess up token positions and should not be used for indexing or querying.
 */
{code}

Maybe the deduplication logic should occur here after it sorts on startOffset? 
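
As a rough illustration of that idea, here is a minimal, self-contained sketch. It does not use the Lucene/Solr APIs: Tok, OffsetDedupSketch and sortAndDedupe are hypothetical names invented for this example, and the real filter in DefaultSolrHighlighter works on a TokenStream rather than a List. The sketch only shows the logic: sort a window by startOffset, then drop any token whose term, startOffset and endOffset all match a token already kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a Lucene token: term text plus character offsets. */
final class Tok {
    final String term;
    final int startOffset;
    final int endOffset;

    Tok(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public String toString() {
        return term + "[" + startOffset + "," + endOffset + "]";
    }
}

/** Sketch only: not the Solr implementation, just the sort-then-dedupe idea. */
final class OffsetDedupSketch {

    /** Sorts a window by startOffset, then drops tokens that repeat term and offsets. */
    static List<Tok> sortAndDedupe(List<Tok> window) {
        List<Tok> sorted = new ArrayList<>(window);
        // List.sort is stable, so tokens with equal startOffset keep arrival order.
        sorted.sort(Comparator.comparingInt((Tok t) -> t.startOffset));

        List<Tok> out = new ArrayList<>();
        for (Tok t : sorted) {
            boolean duplicate = false;
            // After sorting, candidates with the same startOffset sit at the tail of out.
            for (int i = out.size() - 1;
                 i >= 0 && out.get(i).startOffset == t.startOffset; i--) {
                Tok prev = out.get(i);
                if (prev.endOffset == t.endOffset && prev.term.equals(t.term)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> window = new ArrayList<>();
        window.add(new Tok("foo", 4, 7));
        window.add(new Tok("fo", 0, 2));
        window.add(new Tok("foo", 4, 7)); // same term and offsets as the first token
        window.add(new Tok("foo", 0, 3)); // same term, different offsets: kept
        System.out.println(sortAndDedupe(window));
    }
}
```

Note that a token with the same term but different offsets is kept, so two separate occurrences of a word would still both be highlighted.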


> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: RemoveDupOffsetTokenFilter.java, 
> RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and 
> attributes at the class level and not within its constructor.
> In addition, I would think the expected behaviour would be to remove identical 
> terms with the same offset positions; instead, it looks like it removes 
> duplicates based on position increment, which won't work when using it after 
> something like the edgengram filter. When I posted this to the mailing list, 
> even Erik Hatcher seemed to think that's what this filter was supposed to do...
> Attaching a patch that has the expected behaviour and initializes variables 
> in the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
