[ 
https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854972#action_12854972
 ] 

Joe Calderon commented on SOLR-1869:
------------------------------------

"at the same position and Term text as the previous token" is ambiguous. I
assumed "position" meant the same start and end offsets, so I assumed there
was a bug.

I changed the filter to use CharArraySet; there was already a call to
previous.clear() in reset(). Since the filter name is different, I attached
its accompanying factory.

This all started because the highlighter was highlighting a term at the same
offsets twice. For example, if I had a word with a synonym, [ex-con,0,6] and
[excon,0,5], and then ran it through the edgengram filter, I would end up with
two tokens [ex,0,2] with different position increments. The highlighted
snippet was then "<em>ex</em><em>ex</em>-con". I posted this on the mailing
list and RemoveDuplicatesTokenFilter was suggested.
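
To make the offset-based behaviour concrete, here is a minimal sketch in plain
Java with no Lucene dependency. It is not the attached
RemoveDupOffsetTokenFilter. The Tok class, the removeDupOffsets and sample
helpers, the gram sizes (2 to 3), the token order and the position increment
values are all assumptions made for illustration. The idea shown is only this:
a token is dropped when its term text, start offset and end offset have all
been seen before, regardless of its position increment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetDedupSketch {

    /** Hypothetical stand-in for an analysis token (not a Lucene class). */
    static final class Tok {
        final String term;
        final int start;   // start offset in the original text
        final int end;     // end offset in the original text
        final int posInc;  // position increment; ignored by the dedup below

        Tok(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return "[" + term + "," + start + "," + end + "]";
        }
    }

    /** Keeps the first token for each (term, start, end) triple, in order. */
    static List<Tok> removeDupOffsets(List<Tok> in) {
        Set<String> seen = new HashSet<>();
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            // Offsets come first so a term containing ':' cannot collide.
            String key = t.start + ":" + t.end + ":" + t.term;
            if (seen.add(key)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Assumed edge n-grams (sizes 2..3) of [ex-con,0,6] and [excon,0,5]. */
    static List<Tok> sample() {
        List<Tok> grams = new ArrayList<>();
        grams.add(new Tok("ex", 0, 2, 1));   // from "ex-con"
        grams.add(new Tok("ex-", 0, 3, 1));  // from "ex-con"
        grams.add(new Tok("ex", 0, 2, 1));   // from "excon": same term and offsets
        grams.add(new Tok("exc", 0, 3, 1));  // from "excon"
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(removeDupOffsets(sample()));
    }
}
```

In a real TokenFilter the set of seen keys would be per-stream state that is
cleared in reset(), so that one document's tokens do not suppress another's.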

> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: RemoveDupOffsetTokenFilter.java, 
> RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and 
> attributes at the class level and not within its constructor.
> In addition, I would think the expected behaviour would be to remove identical 
> terms with the same offset positions; instead it looks like it removes 
> duplicates based on position increment, which won't work when using it after 
> something like the edgengram filter. When I posted this to the mailing list, 
> even Erik Hatcher seemed to think that's what this filter was supposed to do...
> Attaching a patch that has the expected behaviour and initializes variables 
> in the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
