[ https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089415#comment-13089415 ]
Olivier Favre commented on LUCENE-3392:
---------------------------------------

The proposed implementation may be tightly coupled to the JVM's implementation of some classes (StringReader, BufferedReader and FilterReader), since it depends on a named private field in each ("str", "in" and "in", respectively).

This can be avoided, but any Reader would then have to be fully read and stored as a String or a char[], which can carry a huge memory overhead.
Since the clones would be read at roughly the same speed (true for word-delimiting analysis, though not for a KeywordAnalyzer), an implementation could keep in memory only the portion that has been read by at least one cloned reader but not yet by all of them, in order to implement a "multi read head" reader.

Another implementation would change the API so that the tokenStream and reusableTokenStream functions take a CloneableReader interface with a "giveAClone()" function instead of a Reader.
But this involves massive refactoring (>13,000 lines) and introduces a significant API break.

The proposed implementation is the best solution I found.
Any suggestions are welcome!

> Combining analyzers output
> --------------------------
>
>                 Key: LUCENE-3392
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3392
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Olivier Favre
>            Priority: Minor
>              Labels: analysis
>             Fix For: 3.4
>
>         Attachments: ComboAnalyzer-lucene3x.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It should be easy to combine the output of multiple Analyzers, or TokenStreams.
> A ComboAnalyzer and a ComboTokenStream class would take multiple instances, and multiplex their output, keeping a rough order of tokens like increasing position then increasing start offset then increasing end offset.
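To make the "multi read head" idea from the comment concrete, here is a minimal sketch of how such a reader might look. It is not taken from the attached patch or from Lucene: the class name MultiHeadReader and the methods head() and bufferedChars() are hypothetical, and the sketch assumes single-threaded use with a head count fixed up front. It retains only the characters that at least one head has read and at least one open head has not.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: fans one Reader out to several independent read heads. */
class MultiHeadReader {
    private final Reader source;
    // Characters fetched from the source that some open head has not consumed yet.
    private final StringBuilder window = new StringBuilder();
    private long windowStart = 0;   // absolute offset of window.charAt(0)
    private boolean eof = false;    // true once the source has reported end of stream
    private final List<Head> heads = new ArrayList<Head>();

    MultiHeadReader(Reader source, int headCount) {
        this.source = source;
        for (int i = 0; i < headCount; i++) {
            heads.add(new Head());
        }
    }

    /** Returns the i-th read head; each head sees the full content of the source. */
    Reader head(int i) {
        return heads.get(i);
    }

    /** Number of characters currently retained in memory (for illustration). */
    int bufferedChars() {
        return window.length();
    }

    /** Drops the prefix of the window that every open head has already consumed. */
    private void trim() {
        long min = Long.MAX_VALUE;
        for (Head h : heads) {
            if (!h.closed) {
                min = Math.min(min, h.pos);
            }
        }
        if (min == Long.MAX_VALUE) {   // no open head left: nothing needs retaining
            window.setLength(0);
            return;
        }
        int drop = (int) (min - windowStart);
        if (drop > 0) {
            window.delete(0, drop);
            windowStart = min;
        }
    }

    private final class Head extends Reader {
        private long pos = 0;          // absolute offset of the next char this head returns
        private boolean closed = false;

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (closed) throw new IOException("head closed");
            if (len == 0) return 0;
            // Only the head that is furthest along pulls new data from the source.
            if (pos == windowStart + window.length() && !eof) {
                char[] tmp = new char[len];
                int fetched = source.read(tmp, 0, len);
                if (fetched < 0) eof = true;
                else window.append(tmp, 0, fetched);
            }
            int available = (int) (windowStart + window.length() - pos);
            if (available == 0) return eof ? -1 : 0;
            int n = Math.min(len, available);
            int start = (int) (pos - windowStart);
            window.getChars(start, start + n, cbuf, off);
            pos += n;
            trim();
            return n;
        }

        @Override
        public void close() throws IOException {
            if (closed) return;
            closed = true;
            trim();
            for (Head h : heads) {
                if (!h.closed) return;
            }
            source.close();            // last head closed: release the underlying reader
        }
    }

    public static void main(String[] args) throws IOException {
        MultiHeadReader m = new MultiHeadReader(new StringReader("hello world"), 2);
        char[] buf = new char[5];
        m.head(0).read(buf, 0, 5);
        System.out.println(m.bufferedChars()); // 5: head 1 still has to read "hello"
        m.head(1).read(buf, 0, 5);
        System.out.println(m.bufferedChars()); // 0: both heads are past "hello"
    }
}
```

With this approach the memory cost is bounded by the distance between the fastest and the slowest head. As the comment notes, that distance stays small when all analyzers tokenize at a similar pace, but it grows to the whole input if one consumer (such as a KeywordAnalyzer) reads everything before the others start.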