RE : amusing interaction between advanced tokenizers and highlighter

2004-06-20 Thread Rasik Pandey
> A question before I dive into coding a fix: can I assume (for all analyzers) that the tokens produced by the tokenStream have the following property: currentToken.startOffset() >= lastToken.startOffset() The analyzers I have tested the highlighter with so far have the property:

Re: amusing interaction between advanced tokenizers and highlighter

2004-06-19 Thread markharw00d
A question before I dive into coding a fix: can I assume (for all analyzers) that the tokens produced by the tokenStream have the following property: currentToken.startOffset() >= lastToken.startOffset() The analyzers I have tested the highlighter with so far have the property: currentTok
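The property asked about above can be checked mechanically for any given analyzer. What follows is only a minimal sketch in plain Java, not highlighter code: `Tok` is a hypothetical stand-in for a Lucene Token that carries just the text and offsets, and `firstOutOfOrder` is an illustrative helper name. The sample offsets for "SyncThreadPool" are assumed for the example, not taken from any analyzer in this thread.

```java
import java.util.Arrays;
import java.util.List;

class OffsetOrderCheck {
    // Hypothetical stand-in for a Lucene Token: text plus character offsets.
    static final class Tok {
        final String text;
        final int startOffset;
        final int endOffset;

        Tok(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns the index of the first token whose start offset is smaller than
    // that of the token before it, or -1 if start offsets never decrease.
    static int firstOutOfOrder(List<Tok> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).startOffset < tokens.get(i - 1).startOffset) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed example: sub-tokens first, then the whole identifier, whose
        // start offset jumps back to 0 and so breaks the property.
        List<Tok> stream = Arrays.asList(
                new Tok("Sync", 0, 4),
                new Tok("Thread", 4, 10),
                new Tok("Pool", 10, 14),
                new Tok("SyncThreadPool", 0, 14));
        System.out.println("first out-of-order index: " + firstOutOfOrder(stream));
    }
}
```

Running a check like this over the output of each analyzer in use would show whether the assumption holds before the highlighter relies on it.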

Re: amusing interaction between advanced tokenizers and highlighter package

2004-06-19 Thread David Spencer
Erik Hatcher wrote: On Jun 19, 2004, at 2:29 AM, David Spencer wrote: A naive analyzer would turn something like "SyncThreadPool" into one token. Mine uses the great Lucene capability of Tokens being able to have a "0" position increment to turn it into the token stream: Sync (incr = 0) Thread

Re: amusing interaction between advanced tokenizers and highlighter package

2004-06-19 Thread David Spencer
[EMAIL PROTECTED] wrote: Yes, this issue has come up before with other choices of analyzers. I think it should be fixable without changing any of the highlighter APIs - can you email me or post here the source to your analyzer? Code attached - don't make fun of it please :) - very prelim. I thi

Re: amusing interaction between advanced tokenizers and highlighter package

2004-06-19 Thread Erik Hatcher
On Jun 19, 2004, at 2:29 AM, David Spencer wrote: A naive analyzer would turn something like "SyncThreadPool" into one token. Mine uses the great Lucene capability of Tokens being able to have a "0" position increment to turn it into the token stream: Sync (incr = 0) Thread (incr = 0) Pool (in
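To make the zero-position-increment idea concrete, here is a rough sketch in plain Java with no Lucene dependency. It is not the analyzer from this thread: the quoted message is cut off before the full token stream is shown, so the emission order (whole identifier first, then its parts) and the increment of 1 on the whole identifier are assumptions made for illustration. `Tok` and `split` are hypothetical names. In Lucene itself the increment would be set on a real Token via `setPositionIncrement(int)`.

```java
import java.util.ArrayList;
import java.util.List;

class CamelCaseSplitSketch {
    // Hypothetical stand-in for a Lucene Token.
    static final class Tok {
        final String text;
        final int startOffset;
        final int endOffset;
        final int positionIncrement;

        Tok(String text, int startOffset, int endOffset, int positionIncrement) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + " [" + startOffset + "," + endOffset + ") incr=" + positionIncrement;
        }
    }

    // Emits the whole word (increment 1), then each camel-case part with
    // increment 0 so the parts occupy the same position as the whole word.
    // Simplification: every uppercase letter starts a new part, so a run of
    // capitals such as "XML" is split into single letters.
    static List<Tok> split(String word) {
        List<Tok> out = new ArrayList<>();
        if (word.isEmpty()) {
            return out;
        }
        out.add(new Tok(word, 0, word.length(), 1));
        List<Tok> parts = new ArrayList<>();
        int partStart = 0;
        for (int i = 1; i <= word.length(); i++) {
            if (i == word.length() || Character.isUpperCase(word.charAt(i))) {
                parts.add(new Tok(word.substring(partStart, i), partStart, i, 0));
                partStart = i;
            }
        }
        // A word with a single part needs no extra tokens.
        if (parts.size() > 1) {
            out.addAll(parts);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : split("SyncThreadPool")) {
            System.out.println(t);
        }
    }
}
```

With this ordering the start offsets never decrease from one token to the next, although the offset ranges of the whole word and its parts overlap, and overlapping ranges are the kind of thing a highlighter has to be prepared for.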

amusing interaction between advanced tokenizers and highlighter package

2004-06-18 Thread David Spencer
I've run across an amusing interaction between advanced Analyzers/TokenStreams and the very useful "term highlighter": http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/highlighter/ I have a custom Analyzer I'm using to index javadoc-generated web pages. The Analyzer in turn has