[
https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560069#action_12560069
]
Mike Klaas commented on SOLR-461:
---------------------------------
The max characters limit comes directly from the Lucene contrib highlighter. It
is based on token offset, so it counts whitespace, and doesn't cut off a token
in the middle.
It is also analogous to the RegexFragmenter's maxAnalyzedChars parameter, which
can't be token-based.
I'm not sure it is wise to add two APIs with virtually the same functionality.
Anyone who wants to set a high limit will have to set both.
However, it might be nice to make the token filter a pluggable component, so
that users can insert this token filter if they want.
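The counting filter proposed in the issue below could be sketched roughly as
follows. This is a simplified standalone model, not Lucene's actual
TokenStream/TokenFilter API: the TokenSource interface and the class names
WhitespaceTokenizer and CountingTokenFilter are illustrative only, mimicking
the old "next() returns null at end of stream" convention.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative stand-in for a token stream: next() returns the next
// token, or null once the stream is exhausted.
interface TokenSource {
    String next();
}

// Trivial whitespace tokenizer standing in for a real analyzer chain.
class WhitespaceTokenizer implements TokenSource {
    private final Iterator<String> it;

    WhitespaceTokenizer(String text) {
        it = Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Counts tokens as they pass through and returns null (end of stream)
// once maxTokens have been seen, so a downstream consumer such as a
// highlighter stops analyzing the rest of a large document.
class CountingTokenFilter implements TokenSource {
    private final TokenSource input;
    private final int maxTokens; // e.g. Integer.MAX_VALUE by default
    private int seen = 0;

    CountingTokenFilter(TokenSource input, int maxTokens) {
        this.input = input;
        this.maxTokens = maxTokens;
    }

    public String next() {
        if (seen >= maxTokens) {
            return null; // truncate: pretend the stream has ended
        }
        String tok = input.next();
        if (tok != null) {
            seen++;
        }
        return tok;
    }
}
```

Making such a filter pluggable would mean letting users splice it into the
analyzer chain the highlighter consumes, rather than hard-wiring it.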
> Highlighting TokenStream Truncation capability
> ----------------------------------------------
>
> Key: SOLR-461
> URL: https://issues.apache.org/jira/browse/SOLR-461
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
>
> It is sometimes the case when generating snippets that one need not
> fragment/analyze the whole document (especially for large documents) in order
> to show meaningful snippet highlights.
> Patch to follow that adds a counting TokenFilter that returns null after X
> number of Tokens have been seen. This filter will then be hooked into the
> SolrHighlighter and configurable via solrconfig.xml. The default value will
> be Integer.MAX_VALUE or, I suppose, it could be set to whatever Max Field
> Length is set to, as well.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.