[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495213 ]
Yonik Seeley commented on SOLR-234:
-----------------------------------

Offsets point back to the original field value for a particular token... and to me, that's a semantic contract (point to what makes sense in the source). It's not limited to the offsets generated by the Tokenizer... Analyzers don't have to use Tokenizers and TokenFilters at all. As an example, WordDelimiterFilter modifies offsets when it splits words, and that makes sense to me.

Another way to think about it is that there is more than one way to solve a problem (construct an analyzer). What matters is the tokens that come out the end... not whether I did
  a) a tokenizer that split on something, followed by a filter that trimmed, vs.
  b) a tokenizer that managed to split on something, including discarding the whitespace.

For this specific case, I think it comes down to the likely use cases for the filter, and an argument could be made either way. I'm fine with either, as this is a very minor issue.

> TrimFilter should update the start and end offsets
> --------------------------------------------------
>
>         Key: SOLR-234
>         URL: https://issues.apache.org/jira/browse/SOLR-234
>     Project: Solr
>  Issue Type: Improvement
>    Reporter: Ryan McKinley
>    Priority: Minor
> Attachments: SOLR-234-TrimFilterOffsets.patch, SOLR-234-TrimFilterOffsets.patch
>
>
> As implemented, the TrimFilter only trims the text. It does not update the
> startOffset and endOffset.
> see:
> http://www.nabble.com/TrimFilter----t.startOffset%28%29%2C-t.endOffset%28%29-tf3728875.html

--
This message is automatically generated by JIRA.
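A minimal, self-contained sketch of the behavior the issue asks for, and of the point that approaches (a) and (b) above should be indistinguishable downstream. It deliberately does not use Lucene's real Token/TokenFilter API: `SimpleToken`, `trimWithOffsets`, and the two comma-splitting helpers are hypothetical names for illustration only. It also assumes each token's text still has the same length as its source span (endOffset - startOffset), i.e. no earlier filter changed the text. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

class TrimOffsetsSketch {

    // Hypothetical minimal token: term text plus start/end offsets into the
    // original field value. Not Lucene's Token class.
    record SimpleToken(String text, int startOffset, int endOffset) {}

    // Approach (a), filter step: trim whitespace from the term text and shift
    // the offsets by the same amounts so they still point at the trimmed text
    // in the source. Assumes text.length() == endOffset - startOffset.
    static SimpleToken trimWithOffsets(SimpleToken t) {
        String s = t.text();
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        return new SimpleToken(s.substring(start, end),
                t.startOffset() + start,
                t.endOffset() - (s.length() - end));
    }

    // Approach (a), tokenizer step: split on ',' and keep surrounding whitespace.
    static List<SimpleToken> splitOnComma(String input) {
        List<SimpleToken> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            if (i == input.length() || input.charAt(i) == ',') {
                out.add(new SimpleToken(input.substring(start, i), start, i));
                start = i + 1;
            }
        }
        return out;
    }

    // Approach (b): split on ',' and discard the whitespace while tokenizing.
    static List<SimpleToken> splitOnCommaDiscardingWhitespace(String input) {
        List<SimpleToken> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            if (i == input.length() || input.charAt(i) == ',') {
                int s = start;
                int e = i;
                while (s < e && Character.isWhitespace(input.charAt(s))) s++;
                while (e > s && Character.isWhitespace(input.charAt(e - 1))) e--;
                out.add(new SimpleToken(input.substring(s, e), s, e));
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = " foo , bar baz,qux ";
        List<SimpleToken> a = new ArrayList<>();
        for (SimpleToken t : splitOnComma(input)) {
            a.add(trimWithOffsets(t));
        }
        List<SimpleToken> b = splitOnCommaDiscardingWhitespace(input);
        System.out.println(a);
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Here `foo` comes out of both pipelines with offsets (1, 4). If the trim step left the offsets untouched, approach (a) would report (0, 5) instead, so the two analyzers would no longer be interchangeable; whether that difference matters depends on how the offsets are used (e.g. highlighting), which is the trade-off the comment leaves open.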