[ 
https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495213
 ] 

Yonik Seeley commented on SOLR-234:
-----------------------------------

Offsets point back to the original field value for a particular token... and to 
me, that is a semantic contract (point to what makes sense in the source). It's 
not limited to the offsets generated by the Tokenizer... Analyzers don't have 
to use Tokenizers and TokenFilters at all.

As an example, WordDelimiterFilter modifies offsets when it splits words, and 
that makes sense to me.
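
A rough sketch of what "modifying offsets when splitting" means. This is not 
WordDelimiterFilter's actual code or rule set: the Token type, the 
split_on_delimiters helper and the "split on anything non-alphanumeric" rule 
are all assumptions made up for illustration.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token: text plus [start, end) offsets into the source value."""
    text: str
    start: int
    end: int


def split_on_delimiters(token: Token) -> List[Token]:
    """Split a token on non-alphanumeric characters (an assumed rule).

    Each sub-token gets offsets relative to the original field value,
    not relative to the parent token.
    """
    return [
        Token(m.group(), token.start + m.start(), token.start + m.end())
        for m in re.finditer(r"[A-Za-z0-9]+", token.text)
    ]


# "wi-fi" sits at offsets 10..15 in some larger field value.
parts = split_on_delimiters(Token("wi-fi", 10, 15))
```

Each part still points at the characters it came from in the source, which is 
the contract described above.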

Another way to think about it is that there is more than one way to solve a 
problem (construct an analyzer).
What matters is the tokens that come out at the end... not whether I used
a) a tokenizer that split on something, followed by a filter that trimmed
vs
b) a tokenizer that managed to split on something and discard the 
whitespace itself
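
To make the a) vs b) comparison concrete, here is a minimal sketch. It does 
not use the Lucene/Solr API: the Token type and all three function names are 
hypothetical, and it assumes the trimming step does update offsets. Under 
that assumption both constructions emit identical tokens.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token: text plus [start, end) offsets into the source value."""
    text: str
    start: int
    end: int


def split_keep_whitespace(value: str, sep: str = ",") -> List[Token]:
    """(a) step 1: split on sep, leaving surrounding whitespace in the tokens."""
    tokens, pos = [], 0
    for part in value.split(sep):
        tokens.append(Token(part, pos, pos + len(part)))
        pos += len(part) + len(sep)
    return tokens


def trim_with_offsets(tokens: List[Token]) -> List[Token]:
    """(a) step 2: trim each token and move its offsets to match."""
    out = []
    for t in tokens:
        lead = len(t.text) - len(t.text.lstrip())  # leading whitespace removed
        stripped = t.text.strip()
        start = t.start + lead
        out.append(Token(stripped, start, start + len(stripped)))
    return out


def split_discarding_whitespace(value: str, sep: str = ",") -> List[Token]:
    """(b): a single step that splits and drops surrounding whitespace."""
    tokens, pos = [], 0
    for part in value.split(sep):
        stripped = part.strip()
        start = pos + (len(part) - len(part.lstrip()))
        tokens.append(Token(stripped, start, start + len(stripped)))
        pos += len(part) + len(sep)
    return tokens


value = "foo , bar baz ,qux"
via_a = trim_with_offsets(split_keep_whitespace(value))
via_b = split_discarding_whitespace(value)
```

If the trim step left offsets untouched instead, via_a and via_b would carry 
the same text but different offsets, which is the inconsistency in question.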

For this specific case, I think it comes down to the likely use cases for the 
filter, and an argument could be made either way.  I'm fine with either, as this 
is a very minor issue.

> TrimFilter should update the start and end offsets
> --------------------------------------------------
>
>                 Key: SOLR-234
>                 URL: https://issues.apache.org/jira/browse/SOLR-234
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: SOLR-234-TrimFilterOffsets.patch, 
> SOLR-234-TrimFilterOffsets.patch
>
>
> As implemented, the TrimFilter only trims the text.  It does not update the 
> startOffset and endOffset.
> see:
> http://www.nabble.com/TrimFilter----t.startOffset%28%29%2C-t.endOffset%28%29-tf3728875.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
