[ https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854972#action_12854972 ]
Joe Calderon commented on SOLR-1869:
------------------------------------

The phrase "at the same position and Term text as the previous token" is ambiguous. I assumed "position" meant the same start and end offsets, and hence I assumed there was a bug.

I changed the filter to use CharArraySet; there was already a call to previous.clear() in reset(). Since the filter name is different, I attached its accompanying factory.

This all started because the highlighter was highlighting a term at the same offsets twice. For example, if I had a word with a synonym, [ex-con,0,6] and [excon,0,5], and then ran it through the edgengram filter, I would end up with two tokens [ex,0,2] with different position increments. The highlighted snippet was then "<em>ex</em><em>ex</em>-con". I posted this on the mailing list and RemoveDuplicatesTokenFilter was suggested.

> RemoveDuplicatesTokenFilter doesn't have expected behaviour
> -----------------------------------------------------------
>
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: RemoveDupOffsetTokenFilter.java, RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and
> attributes at the class level and not within its constructor.
> In addition, I would think the expected behaviour would be to remove identical
> terms with the same offset positions. Instead, it looks like it removes
> duplicates based on position increment, which won't work when using it after
> something like the edgengram filter. When I posted this to the mailing list,
> even Erik Hatcher seemed to think that's what this filter was supposed to do...
> Attaching a patch that has the expected behaviour and initializes variables
> in the constructor.
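
For illustration only, the behaviour asked for above (drop a token when an earlier token had the same term text and the same start and end offsets, whatever its position increment) can be sketched in plain Java. This is not the attached RemoveDupOffsetTokenFilter and it does not use the Lucene or Solr API: SketchToken, OffsetDedupSketch, and removeDuplicateOffsets are hypothetical names, the position increment values in the usage note are made up, and a real filter would work on a token stream (the comment mentions a CharArraySet) rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an analysis token; not a Lucene or Solr class.
final class SketchToken {
    final String term;
    final int startOffset;
    final int endOffset;
    final int positionIncrement; // carried along but ignored when deduplicating

    SketchToken(String term, int startOffset, int endOffset, int positionIncrement) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical helper: keeps the first token for each (term, start, end) triple.
final class OffsetDedupSketch {
    static List<SketchToken> removeDuplicateOffsets(List<SketchToken> tokens) {
        Set<List<Object>> seen = new HashSet<>();
        List<SketchToken> kept = new ArrayList<>();
        for (SketchToken t : tokens) {
            // The key deliberately leaves out the position increment.
            List<Object> key = Arrays.<Object>asList(t.term, t.startOffset, t.endOffset);
            if (seen.add(key)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Given two [ex,0,2] tokens with position increments 1 and 0, followed by an [ex-,0,3] token, removeDuplicateOffsets keeps only the first [ex,0,2] and the [ex-,0,3] token, so a highlighter would see the "ex" span once.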