[ https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089415#comment-13089415 ]

Olivier Favre commented on LUCENE-3392:
---------------------------------------

The proposed implementation may be tightly coupled to the JVM's implementation
of some classes (StringReader, BufferedReader and FilterReader), as it relies
on named private fields ("str", "in" and "in", respectively).
This can be avoided, but any Reader would then have to be fully read and stored
as a String or a char[], which can have a huge memory overhead.
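For comparison, the option that avoids private fields can be sketched as
follows. FullyBufferedReaderSource is a hypothetical name (not part of the
patch or of Lucene): it drains the source Reader once, then hands out
independent StringReaders over the buffered content. It assumes the whole
input fits in memory, which is exactly the overhead mentioned above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical helper (not from the patch): reads the source Reader once,
 * keeps the whole content in memory, and hands out independent readers.
 * No access to private JDK fields is needed.
 */
final class FullyBufferedReaderSource {
    private final String content;

    FullyBufferedReaderSource(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        // Drain the source completely; this is where the memory overhead comes from.
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        in.close();
        content = sb.toString();
    }

    /** Each call returns a fresh Reader positioned at the start of the content. */
    Reader newReader() {
        return new StringReader(content);
    }

    /** Number of chars held in memory (always the full input length here). */
    int bufferedChars() {
        return content.length();
    }
}
```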
Since the clones would all be read at roughly the same speed (at least for
word-delimiting analysis, though not for a KeywordAnalyzer), an implementation
could keep in memory only the portion that has been read by at least one clone
but not yet by all of them, thereby implementing a "multi read head" reader.
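The "multi read head" idea can be sketched as a shared window over the source:
each head tracks its absolute offset, the head that is furthest ahead pulls new
chars from the underlying Reader, and the prefix that every head has already
consumed is dropped. MultiHeadSource and its methods are hypothetical names;
this is a single-threaded sketch, not the code in ComboAnalyzer-lucene3x.patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not from the patch) of a "multi read head" reader. */
final class MultiHeadSource {
    private final Reader in;
    // Holds the chars at absolute offsets [windowStart, windowStart + window.length()).
    private final StringBuilder window = new StringBuilder();
    private long windowStart = 0;
    private boolean eof = false;
    private final List<Head> heads = new ArrayList<>();

    MultiHeadSource(Reader in) {
        this.in = in;
    }

    /** Creates a new head; all heads must be created before any reading starts. */
    Reader newHead() {
        if (windowStart != 0 || window.length() != 0) {
            throw new IllegalStateException("create all heads before reading");
        }
        Head h = new Head();
        heads.add(h);
        return h;
    }

    /** Number of chars currently held in memory. */
    int bufferedChars() {
        return window.length();
    }

    /** Drops the prefix of the window that every head has already consumed. */
    private void trim() {
        long min = Long.MAX_VALUE;
        for (Head h : heads) {
            min = Math.min(min, h.pos);
        }
        int drop = (int) (min - windowStart);
        if (drop > 0) {
            window.delete(0, drop);
            windowStart = min;
        }
    }

    private final class Head extends Reader {
        long pos = 0; // absolute offset of the next char this head returns

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            long windowEnd = windowStart + window.length();
            if (pos == windowEnd) {
                // This head is the furthest ahead: pull fresh chars from the source.
                if (eof) {
                    return -1;
                }
                char[] tmp = new char[len];
                int got = in.read(tmp, 0, len);
                if (got == -1) {
                    eof = true;
                    return -1;
                }
                window.append(tmp, 0, got);
                windowEnd += got;
            }
            int idx = (int) (pos - windowStart);
            int n = (int) Math.min(len, windowEnd - pos);
            window.getChars(idx, idx + n, cbuf, off);
            pos += n;
            trim();
            return n;
        }

        @Override
        public void close() {
            // Heads share the underlying Reader; closing it is left to the owner.
        }
    }
}
```

A head that races ahead (or a lagging head that is never read) pins the
window, so with a KeywordAnalyzer-style consumer that reads everything at once
the window grows to the full input, matching the caveat above.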

Another option would be to change the API so that the tokenStream and
reusableTokenStream functions take a CloneableReader interface, with a
"giveAClone()" function, instead of a Reader.
However, this involves massive refactoring (over 13,000 lines) and introduces
a major API break.
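To make that alternative concrete, here is a minimal sketch of what a
CloneableReader could look like. Only the CloneableReader and giveAClone()
names come from the proposal; StringCloneableReader, TokenSource and Combiner
are hypothetical stand-ins (TokenSource plays the role of an Analyzer so the
sketch does not depend on the Lucene API). Each analyzer receives its own
clone, so none of them can exhaust the input for the others.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Interface named in the proposal: hands out independent Readers over the same input. */
interface CloneableReader {
    /** Returns a fresh Reader positioned at the start of the input. */
    Reader giveAClone();
}

/** Hypothetical simplest implementation, backed by an in-memory String. */
final class StringCloneableReader implements CloneableReader {
    private final String text;

    StringCloneableReader(String text) {
        this.text = text;
    }

    @Override
    public Reader giveAClone() {
        return new StringReader(text);
    }
}

/** Hypothetical stand-in for an analyzer: consumes a Reader and yields token strings. */
interface TokenSource {
    List<String> tokens(Reader reader) throws IOException;
}

/** Hypothetical combiner: concatenates the tokens of every analyzer, in order. */
final class Combiner {
    static List<String> combine(CloneableReader input, TokenSource... analyzers) throws IOException {
        List<String> out = new ArrayList<>();
        for (TokenSource a : analyzers) {
            // Each analyzer gets its own clone of the input.
            out.addAll(a.tokens(input.giveAClone()));
        }
        return out;
    }
}
```

With the real API the change would apply to the Reader parameter of
tokenStream(String, Reader) and reusableTokenStream(String, Reader) on Analyzer
and every subclass, which is presumably where the refactoring cost comes from.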

The proposed implementation is the best solution I found.
Any suggestions are welcome!

> Combining analyzers output
> --------------------------
>
>                 Key: LUCENE-3392
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3392
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Olivier Favre
>            Priority: Minor
>              Labels: analysis
>             Fix For: 3.4
>
>         Attachments: ComboAnalyzer-lucene3x.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It should be easy to combine the output of multiple Analyzers, or 
> TokenStreams.
> ComboAnalyzer and ComboTokenStream classes would take multiple instances
> and multiplex their output, keeping tokens roughly ordered by increasing
> position, then increasing start offset, then increasing end offset.
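
The ordering described in the issue amounts to a k-way merge over the input
streams. A minimal sketch, assuming each input is already sorted in that order;
Tok and ComboMerge are hypothetical, and a real ComboTokenStream would instead
read these values from Lucene's PositionIncrementAttribute (accumulated into an
absolute position) and OffsetAttribute.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical minimal token carrying an absolute position and offsets. */
final class Tok {
    final String term;
    final int position, startOffset, endOffset;

    Tok(String term, int position, int startOffset, int endOffset) {
        this.term = term;
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class ComboMerge {
    /** Increasing position, then increasing start offset, then increasing end offset. */
    static final Comparator<Tok> ORDER = Comparator
            .comparingInt((Tok t) -> t.position)
            .thenComparingInt(t -> t.startOffset)
            .thenComparingInt(t -> t.endOffset);

    /** Queue entry: the current head token of one stream plus the rest of that stream. */
    private static final class Head {
        final Tok tok;
        final Iterator<Tok> rest;

        Head(Tok tok, Iterator<Tok> rest) {
            this.tok = tok;
            this.rest = rest;
        }
    }

    /** Repeatedly emits the smallest head token across all (pre-sorted) inputs. */
    static List<Tok> merge(List<List<Tok>> streams) {
        PriorityQueue<Head> pq = new PriorityQueue<Head>((a, b) -> ORDER.compare(a.tok, b.tok));
        for (List<Tok> s : streams) {
            Iterator<Tok> it = s.iterator();
            if (it.hasNext()) {
                pq.add(new Head(it.next(), it));
            }
        }
        List<Tok> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            Head h = pq.poll();
            out.add(h.tok);
            if (h.rest.hasNext()) {
                pq.add(new Head(h.rest.next(), h.rest));
            }
        }
        return out;
    }
}
```

Only the head token of each stream is held at a time, which is also why the
clones of the input end up being read at roughly the same pace.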

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
