[jira] Created: (SOLR-1760) convert synonymsfilter to new tokenstream API

Robert Muir (JIRA) Fri, 05 Feb 2010 11:28:53 -0800

convert synonymsfilter to new tokenstream API
---------------------------------------------


                 Key: SOLR-1760
                 URL: https://issues.apache.org/jira/browse/SOLR-1760
             Project: Solr
          Issue Type: Task
          Components: Schema and Analysis
            Reporter: Robert Muir


This is the other non-trivial tokenstream to convert to the new API. I looked at 
this again today, and I think I have a design that will be nice and efficient.

If you have ideas or are already looking at it, please comment! I haven't 
started coding yet, and we shouldn't duplicate any effort.

Here is my current design:

* Add a variable 'maximumContext' to SynonymMap. This is simply the maximum 
singleMatch.size(), i.e. the maximum number of lookahead tokens that is ever 
needed.
* save/restoreState/cloning can be minimized by using a stack (a fixed array of 
size maximumContext) of references to the SynonymMap submaps. This way we can 
backtrack efficiently for multiword matches without save/restoreState and with 
fewer comparisons.
* Two queues (which can be fixed arrays of size maximumContext) are still 
needed for holding state objects. The first holds those that have been 
evaluated (always empty in the case of !preserveOriginal), and the second holds 
those that haven't yet been evaluated but are queued due to lookahead.
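
To make the first two points concrete, here is a rough, self-contained sketch. 
It does not use the Lucene/Solr classes: Node and SynMap are hypothetical 
stand-ins for SynonymMap and its submaps, tokens are plain strings, and 
longestMatch is an invented helper name. It only illustrates tracking 
maximumContext while rules are added, and backtracking over a fixed-size stack 
of submap references instead of saving and restoring stream state. The two 
queues of state objects are not modeled here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SynonymLookaheadSketch {

    /** Hypothetical stand-in for a SynonymMap submap: a trie node keyed by token text. */
    static final class Node {
        final Map<String, Node> submap = new HashMap<>();
        List<String> synonyms; // non-null only if a rule ends at this node
    }

    /** Hypothetical stand-in for SynonymMap that tracks the longest rule length. */
    static final class SynMap {
        final Node root = new Node();
        int maximumContext = 0; // maximum match length == maximum lookahead ever needed

        void add(List<String> match, List<String> replacement) {
            Node n = root;
            for (String tok : match) {
                n = n.submap.computeIfAbsent(tok, k -> new Node());
            }
            n.synonyms = replacement;
            maximumContext = Math.max(maximumContext, match.size());
        }
    }

    /**
     * Returns the number of tokens consumed by the longest rule that matches
     * starting at position start, or 0 if no rule matches there.
     */
    static int longestMatch(SynMap map, List<String> tokens, int start) {
        // Fixed-size stack of submap references, one slot per token of lookahead.
        Node[] stack = new Node[map.maximumContext];
        int depth = 0;
        Node current = map.root;

        // Walk forward as far as the trie allows, never past maximumContext tokens.
        while (depth < map.maximumContext && start + depth < tokens.size()) {
            Node next = current.submap.get(tokens.get(start + depth));
            if (next == null) {
                break;
            }
            stack[depth++] = next;
            current = next;
        }

        // Backtrack: pop submaps until one that actually ends a rule is found.
        while (depth > 0 && stack[depth - 1].synonyms == null) {
            depth--;
        }
        return depth;
    }
}
```

With rules "a b" and "a b c d" and input "a b c e", the walk reaches "c", fails 
on "e", and pops back to the shorter rule "a b" without re-reading any tokens.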

I plan on coding this up soon. If you have a better idea or have started work, 
please comment.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
