convert synonymsfilter to new tokenstream API
---------------------------------------------
Key: SOLR-1760
URL: https://issues.apache.org/jira/browse/SOLR-1760
Project: Solr
Issue Type: Task
Components: Schema and Analysis
Reporter: Robert Muir
This is the other non-trivial TokenStream to convert to the new API. I looked at
this again today, and I think I have a design that will be nice and efficient.
If you have ideas or are already looking at it, please comment! I haven't
started coding, and we shouldn't duplicate any effort.
Here is my current design:
* Add a variable 'maximumContext' to SynonymMap. This is simply the maximum
singleMatch.size(), i.e. the maximum number of tokens of lookahead that is ever
needed.
* save/restoreState/cloning can be minimized by using a stack (a fixed array of
size maximumContext) of references to the SynonymMap submaps. This way we can
backtrack efficiently for multiword matches without save/restoreState and with
fewer comparisons.
* Two queues (which can be fixed arrays of size maximumContext) are still needed
for holding state objects. The first holds those that have been evaluated
(always empty in the case of !preserveOriginal), and the second holds those that
haven't yet been evaluated but are queued due to lookahead.
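
To make the first two points concrete, here is a rough sketch, not the actual
patch. It assumes a simplified stand-in for SynonymMap (a trie of submaps keyed
by token text) and works on plain strings instead of TokenStream attributes. The
names SynonymSketch, Node, add and rewrite are hypothetical and exist only for
illustration. The two state queues and preserveOriginal are not modelled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for SynonymMap; not the Solr class. */
final class SynonymSketch {
    /** One level of the map: submaps keyed by the next token's text. */
    static final class Node {
        final Map<String, Node> submap = new HashMap<>();
        List<String> replacement; // non-null only if a rule ends at this node
    }

    final Node root = new Node();
    int maximumContext = 0; // longest rule in tokens = max lookahead ever needed

    /** Registers a rule and keeps maximumContext up to date. */
    void add(List<String> match, List<String> replacement) {
        Node node = root;
        for (String token : match) {
            node = node.submap.computeIfAbsent(token, k -> new Node());
        }
        node.replacement = replacement;
        maximumContext = Math.max(maximumContext, match.size());
    }

    /** Rewrites tokens, preferring the longest rule at each position. */
    List<String> rewrite(List<String> tokens) {
        // Fixed-size stack of submap references, allocated once per stream.
        Node[] stack = new Node[maximumContext];
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            int depth = 0;
            Node node = root;
            // Descend as far as the lookahead tokens allow.
            while (depth < maximumContext && pos + depth < tokens.size()) {
                node = node.submap.get(tokens.get(pos + depth));
                if (node == null) {
                    break;
                }
                stack[depth++] = node;
            }
            // Backtrack by popping the stack until a complete rule is found.
            while (depth > 0 && stack[depth - 1].replacement == null) {
                depth--;
            }
            if (depth > 0) {
                out.addAll(stack[depth - 1].replacement);
                pos += depth; // consume the matched tokens
            } else {
                out.add(tokens.get(pos)); // no rule applies, pass through
                pos++;
            }
        }
        return out;
    }
}
```

With the rules "new york city" => "nyc" and "new" => "fresh", the input
"new york times" descends two levels, fails on "times", and pops back to the
one-token rule without re-reading or copying any token. In the real filter the
looked-ahead tokens would sit in the second queue as captured state objects.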
I plan on coding this up soon. If you have a better idea or have started work,
please comment.