[
https://issues.apache.org/jira/browse/SOLR-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797161#action_12797161
]
Robert Muir commented on SOLR-1657:
-----------------------------------
Hello, I am working on WordDelimiterFilter and I have a question: how do we
want custom attributes to work here?
This affects performance of the filter under the new tokenstream API, as it
will determine when/if we have to save/restore state.
Here are two alternatives:
Alternative #1 (most performant): custom attributes from the original term
apply only to words with no delimiters or, for words with delimiters, only
to the 'original' token output by the 'preserveOriginal' option. This is
easiest to understand in my opinion, and would perform the best. It's
arguable that if you split a term into 10 subwords, applying these
attributes to all 10 subwords may no longer make sense.
Alternative #2 (least performant): custom attributes from the original term
apply only to non-injected terms: if a word is split into 10 tokens, all 10
subword tokens, but not their concatenations, also carry the attributes
derived from the original term. If preserveOriginal is on, the original
token carries the attributes as well.
Alternative #3: ??? your ideas?
In my opinion, we should shoot for maximum performance, as I view this as
somewhat like a tokenizer, and custom attributes in general would be applied
after WDF, because trying to apply them before WDF and expecting them to make
sense afterwards will be confusing. But it does not really matter.
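
To make the difference between alternatives #1 and #2 concrete, here is a toy
sketch in plain Java. It is only an illustration under stated assumptions: it
does not use the Lucene tokenstream API, it splits on '-' only, and the names
WdfAttributeSketch, Tok, split and copyToSubwords are hypothetical. Copying
the attribute map for each subword stands in for the per-token state restore
that alternative #2 would require.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WdfAttributeSketch {

    /** Hypothetical stand-in for a token plus its custom attributes (not a Lucene class). */
    static final class Tok {
        final String term;
        final Map<String, String> attrs;

        Tok(String term, Map<String, String> attrs) {
            this.term = term;
            this.attrs = attrs;
        }
    }

    /**
     * Toy splitter: breaks a term on '-' and emits the subwords plus one
     * injected concatenation. The real filter has many more rules.
     *
     * copyToSubwords == false models alternative #1;
     * copyToSubwords == true models alternative #2.
     */
    static List<Tok> split(String term, Map<String, String> custom,
                           boolean preserveOriginal, boolean copyToSubwords) {
        List<Tok> out = new ArrayList<>();
        String[] parts = term.split("-");

        // No delimiters: the term passes through with its attributes intact.
        if (parts.length <= 1) {
            out.add(new Tok(term, new LinkedHashMap<>(custom)));
            return out;
        }

        // Under both alternatives the preserved original keeps the attributes.
        if (preserveOriginal) {
            out.add(new Tok(term, new LinkedHashMap<>(custom)));
        }

        StringBuilder concat = new StringBuilder();
        for (String part : parts) {
            if (part.isEmpty()) {
                continue;
            }
            // Alternative #2 pays for one copy per subword; alternative #1 does not.
            Map<String, String> attrs = copyToSubwords
                    ? new LinkedHashMap<>(custom)
                    : new LinkedHashMap<String, String>();
            out.add(new Tok(part, attrs));
            concat.append(part);
        }

        // The injected concatenation never receives the custom attributes.
        out.add(new Tok(concat.toString(), new LinkedHashMap<String, String>()));
        return out;
    }
}
```

With preserveOriginal on, calling split("wi-fi", custom, true, false) leaves
the custom attributes only on "wi-fi", while passing true as the last argument
also puts them on "wi" and "fi". In both cases the concatenation "wifi" gets
none.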
> convert the rest of solr to use the new tokenstream API
> -------------------------------------------------------
>
> Key: SOLR-1657
> URL: https://issues.apache.org/jira/browse/SOLR-1657
> Project: Solr
> Issue Type: Task
> Reporter: Robert Muir
> Attachments: SOLR-1657.patch, SOLR-1657.patch
>
>
> org.apache.solr.analysis:
> BufferedTokenStream
> -> -CommonGramsFilter-
> -> -CommonGramsQueryFilter-
> -> -RemoveDuplicatesTokenFilter-
> -CapitalizationFilterFactory-
> -HyphenatedWordsFilter-
> -LengthFilter (deprecated, remove)-
> SynonymFilter
> SynonymFilterFactory
> WordDelimiterFilter
> org.apache.solr.handler:
> AnalysisRequestHandler
> AnalysisRequestHandlerBase
> org.apache.solr.handler.component:
> QueryElevationComponent
> SpellCheckComponent
> org.apache.solr.highlight:
> DefaultSolrHighlighter
> org.apache.solr.search:
> FieldQParserPlugin
> org.apache.solr.spelling:
> SpellingQueryConverter