[ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829097#action_12829097 ]
Robert Muir commented on SOLR-1670:
-----------------------------------

bq. Not at the semantic level (for overlapping tokens).

Another way to look at it is that a tokenstream is just a sequence of tokens, and posInc is just another attribute.

Your description of the semantics makes sense in terms of how the indexer uses them, but the order of these tokens can still matter: it can matter if someone uses a custom tokenfilter, it might matter for some custom attributes, and it might matter for a different consumer. In any case, it is different behavior.

I have made an effort to preserve all the behavior of all these tokenstreams when converting them to the new API. I really don't want to break anything.

> synonymfilter/map repeat bug
> ----------------------------
>
>                 Key: SOLR-1670
>                 URL: https://issues.apache.org/jira/browse/SOLR-1670
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>            Assignee: Yonik Seeley
>         Attachments: SOLR-1670.patch, SOLR-1670.patch, SOLR-1670_test.patch
>
>
> As part of converting tests for SOLR-1657, I ran into a problem with synonymfilter.
> The test for 'repeats' has a flaw: it uses this assertTokEqual construct, which does not really validate that two lists of tokens are equal; it just stops at the shorter one.
> {code}
> // repeats
> map.add(strings("a b"), tokens("ab"), orig, merge);
> map.add(strings("a b"), tokens("ab"), orig, merge);
> assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
> /* in reality the result from getTokList is ab ab ab!!!!! */
> {code}
> When the test was converted to assertTokenStreamContents, this problem surfaced. Attached is an additional assertion for the existing testcase.

-- 
This message is automatically generated by JIRA.
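The test flaw described in the issue (a comparison that stops at the shorter list, so `ab ab ab` passes against an expected `ab`) can be illustrated with a minimal sketch. This is not the actual Solr test code: `TokenListCompare`, `lenientEqual`, and `strictEqual` are hypothetical names, and token streams are simplified to lists of token texts, ignoring position increments and other attributes.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (not Solr/Lucene code): compares the token texts
 * produced by an analysis chain against an expected list.
 */
public class TokenListCompare {

    /** Mimics the flawed check: compares only up to the shorter list's length. */
    static boolean lenientEqual(List<String> actual, List<String> expected) {
        int n = Math.min(actual.size(), expected.size());
        for (int i = 0; i < n; i++) {
            if (!actual.get(i).equals(expected.get(i))) {
                return false;
            }
        }
        // Extra trailing tokens on either side are silently ignored.
        return true;
    }

    /** Strict check: List.equals requires equal sizes and equal elements in order. */
    static boolean strictEqual(List<String> actual, List<String> expected) {
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        // The output reported for the 'repeats' case vs. what the test expects.
        List<String> actual = Arrays.asList("ab", "ab", "ab");
        List<String> expected = Arrays.asList("ab");
        System.out.println("lenient: " + lenientEqual(actual, expected)); // prints "lenient: true"
        System.out.println("strict: " + strictEqual(actual, expected));   // prints "strict: false"
    }
}
```

Because `List.equals` is order-sensitive, the strict check would also flag a reordering of overlapping tokens, which fits the view of a tokenstream as a plain sequence. A check against a real tokenstream would also need to compare position increments and any other attributes the test cares about.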