[ https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854983#action_12854983 ]
Robert Muir commented on SOLR-1869:
-----------------------------------
bq. This all started because the highlighter was highlighting a term at the same offsets twice,
Perhaps we should fix this directly in DefaultSolrHighlighter? It already has
a TokenStream-sorting filter that is intended to do the following:
{code}
/** Orders Tokens in a window first by their startOffset ascending.
* endOffset is currently ignored.
* This is meant to work around fickleness in the highlighter only. It
* can mess up token positions and should not be used for indexing or querying.
*/
{code}
Maybe the deduplication logic should occur here after it sorts on startOffset?
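A rough sketch of that placement, written as plain Java over a hypothetical `Tok` record rather than against the real Lucene `TokenStream`/attribute API (`Tok` and `sortAndDedup` are illustrative names only). It takes a buffered window, orders it by startOffset, and then drops any token whose term and start/end offsets equal those of the token kept just before it. Breaking startOffset ties by endOffset and term is an assumption beyond the existing filter (which ignores endOffset): without it, identical tokens that share a startOffset with a different term might not end up adjacent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortThenDedup {

    // Hypothetical stand-in for one buffered token: its text and character offsets.
    record Tok(String term, int startOffset, int endOffset) {}

    // Orders a window by startOffset, then (an assumption, see above) by endOffset
    // and term so identical tokens become adjacent, then drops adjacent duplicates.
    static List<Tok> sortAndDedup(List<Tok> window) {
        List<Tok> sorted = new ArrayList<>(window);
        sorted.sort(Comparator.comparingInt(Tok::startOffset)
                .thenComparingInt(Tok::endOffset)
                .thenComparing(Tok::term));
        List<Tok> out = new ArrayList<>();
        for (Tok t : sorted) {
            // Record equality compares every component: same term and same offsets.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> window = List.of(
                new Tok("solr", 0, 4),
                new Tok("so", 0, 2),
                new Tok("solr", 0, 4),   // same term and offsets: would be highlighted twice
                new Tok("rocks", 5, 10));
        System.out.println(sortAndDedup(window));
    }
}
```

Doing this inside the highlighter's sorting filter would keep the change away from indexing and querying, which that filter's comment already warns it should not be used for.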
> RemoveDuplicatesTokenFilter doesn't have expected behaviour
> -----------------------------------------------------------
>
> Key: SOLR-1869
> URL: https://issues.apache.org/jira/browse/SOLR-1869
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Joe Calderon
> Priority: Minor
> Attachments: RemoveDupOffsetTokenFilter.java, RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> The RemoveDuplicatesTokenFilter seems broken: it initializes its map and
> attributes at the class level rather than in its constructor.
> In addition, I would expect the filter to remove identical terms that have the
> same offsets. Instead, it appears to remove duplicates based on position
> increment, which won't work when the filter is used after something like the
> EdgeNGram filter. When I posted this to the mailing list, even Erik Hatcher
> seemed to think that this is what the filter was supposed to do.
> I'm attaching a patch that implements the expected behaviour and initializes
> the variables in the constructor.
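To make the difference between the two readings concrete, here is a minimal sketch in plain Java over a hypothetical token model (`Tok`, `Key`, `dedupByPosition` and `dedupByOffsets` are illustrative names, not Solr or Lucene API). `dedupByPosition` is only roughly modeled on RemoveDuplicatesTokenFilter: it drops a token when the same term was already seen at the same position. `dedupByOffsets` drops a token when a token with the same term and the same start and end offsets was already kept, which is the behaviour the report asks for. The sample stream is made up; it simply repeats a gram at identical offsets with a position increment of 1 each time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupStrategies {

    // Hypothetical token model: term text, position increment, character offsets.
    record Tok(String term, int posInc, int startOffset, int endOffset) {}

    // Identity used by the offset-based strategy: term plus both offsets.
    record Key(String term, int startOffset, int endOffset) {}

    // Position-based: a token is a duplicate only if the same term was already
    // seen at the current position (posInc == 0 stacks it on the previous token).
    static List<Tok> dedupByPosition(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Tok t : in) {
            if (t.posInc() > 0) {
                seenAtPosition.clear(); // advanced to a new position
            }
            if (seenAtPosition.add(t.term())) {
                out.add(t);
            }
        }
        return out;
    }

    // Offset-based: a token is a duplicate if a token with the same term and
    // the same start/end offsets was already kept, regardless of position.
    static List<Tok> dedupByOffsets(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Set<Key> seen = new HashSet<>();
        for (Tok t : in) {
            if (seen.add(new Key(t.term(), t.startOffset(), t.endOffset()))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up stream: the gram "so" is emitted twice for the same characters,
        // each time with a position increment of 1.
        List<Tok> stream = List.of(
                new Tok("s", 1, 0, 1),
                new Tok("so", 1, 0, 2),
                new Tok("so", 1, 0, 2),
                new Tok("sol", 1, 0, 3));
        System.out.println("by position: " + dedupByPosition(stream).size()); // by position: 4
        System.out.println("by offsets: " + dedupByOffsets(stream).size());   // by offsets: 3
    }
}
```

Here the offset-keyed set grows with the whole stream; a real filter would presumably need to bound it, for example by clearing it whenever startOffset advances, which assumes that tokens with equal offsets arrive next to each other.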