Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
nextToken() calls peekToken(). That seems to prevent my lookahead processing from seeing that item later. Am I missing something? On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies ben...@basistech.com wrote: I think that the penny just dropped, and I should not be using this class. If I call

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Something is wrong; I'm not sure what offhand, but calling peekToken 10 times should not stack all tokens @ position 0; it should stack the tokens at the positions where they occurred. Are you sure the posIncr att is sometimes 1 (i.e., the position is in fact moving forward for some tokens)?
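As a rough illustration of the point about position increments (not code from the thread, and not using Lucene classes): an indexer derives a token's position by summing position increments, starting from -1. Tokens all land on position 0 only if every increment after the first is 0. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, plain Java only: maps a sequence of position
// increments to absolute token positions.
final class PositionSketch {
    static List<Integer> positions(int[] posIncrs) {
        List<Integer> out = new ArrayList<>();
        int pos = -1; // assumed convention: the first increment of 1 yields position 0
        for (int inc : posIncrs) {
            pos += inc;   // an increment of 0 stacks the token on the previous position
            out.add(pos);
        }
        return out;
    }
}
```

With increments 1, 1, 0, 1 the third token stacks on the second and the positions still advance; with increments 1, 0, 0 every token sits at position 0, which is the symptom described in the thread.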

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
I think I had better build you a test case for this situation, and attach it to a JIRA. On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless luc...@mikemccandless.com wrote: Something is wrong; I'm not sure what offhand, but calling peekToken 10 times should not stack all tokens @ position 0; it

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
That would be awesome, thanks! Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies ben...@basistech.com wrote: I think I had better build you a test case for this situation, and attach it to a JIRA. On Sat, Sep 7, 2013 at 3:33 PM, Michael

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
LUCENE-5202. It seems to show the problem of the extra peek. I'm still struggling to make sense of the 'problem' of not always calling afterPosition(); that may be entirely my own confusion. On Sat, Sep 7, 2013 at 4:21 PM, Michael McCandless luc...@mikemccandless.com wrote: That would be

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Thanks Benson, I'll have a look. Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 4:33 PM, Benson Margulies ben...@basistech.com wrote: LUCENE-5202. It seems to show the problem of the extra peek. I'm still struggling to make sense of the 'problem' of not always calling

Re: LookaheadTokenFilter

2013-09-06 Thread Michael McCandless
It's in test-framework only because no real TokenFilter uses it yet, and, it's all very new code :) My intention was to eventually cutover tricky graph TokenFilters (like SynFilter), to simplify them, factoring out the common buffering of tokens by position into LookaheadTokenFilter, but I never

Re: LookaheadTokenFilter

2013-09-06 Thread Michael McCandless
On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies ben...@basistech.com wrote: I'm trying to work through the logic of reading ahead until I've seen a marker for the end of a sentence, then applying some analysis to all of the tokens of the sentence, and then changing some attributes of each token

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies ben...@basistech.com wrote: I'm trying to work through the logic of reading ahead until I've seen a marker for the end of a sentence, then applying some analysis

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
Michael, I'm apparently not fully deconfused yet. I've got a very simple incrementToken function. It calls peekToken to stack up the tokens. afterPosition is never called; I expected it to be called as each of the peeked tokens gets next-ed back out. I assume that I'm missing something simple.

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
I think that the penny just dropped, and I should not be using this class. If I call peekToken 10 times while sitting at token 0, this class will stack up all 10 of these _at token position 0_. That's not really very helpful for what I'm doing. I need to borrow code from this class and not use

LookaheadTokenFilter

2013-09-05 Thread Benson Margulies
This useful-looking item is in the test-framework jar. Is there some subtle reason that it isn't in the common analyzer jar? Some reason why I'd regret using it?

LookaheadTokenFilter

2013-09-05 Thread Benson Margulies
I'm trying to work through the logic of reading ahead until I've seen a marker for the end of a sentence, then applying some analysis to all of the tokens of the sentence, and then changing some attributes of each token to reflect the results. The queue of tokens for a position is just a State, so
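The read-ahead pattern described here can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions: tokens are plain strings, "." is a hypothetical end-of-sentence marker, and the "analysis" is a placeholder that tags each token with the length of its sentence. A real filter would capture and restore attribute state instead of copying strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: buffer a whole sentence, analyze it, then hand the
// annotated tokens back one at a time.
final class SentenceLookahead {
    private final Iterator<String> input;
    private final Queue<String> pending = new ArrayDeque<>(); // analyzed, not yet returned

    SentenceLookahead(Iterator<String> input) {
        this.input = input;
    }

    /** Returns the next annotated token, or null once the input is exhausted. */
    String next() {
        if (pending.isEmpty()) {
            fillSentence();
        }
        return pending.poll();
    }

    private void fillSentence() {
        List<String> sentence = new ArrayList<>();
        while (input.hasNext()) {
            String tok = input.next();
            sentence.add(tok);
            if (".".equals(tok)) {
                break; // assumed end-of-sentence marker
            }
        }
        // Placeholder analysis: annotate every token with the sentence length.
        for (String tok : sentence) {
            pending.add(tok + "/" + sentence.size());
        }
    }
}
```

The key design point is that nothing is returned until the whole sentence has been seen, so the analysis can change every token in it before any of them leaves the buffer.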

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2010-01-29 Thread Jamie
Hi there. In the absence of documentation, I am trying to convert an EmailFilter class to Lucene 3.0. It's not working! Obviously, my understanding of the new token filter mechanism is misguided. Can someone in the know help me out for a sec and let me know where I am going wrong? Thanks.

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-03 Thread Daniel Shane
of a StackOverflow) : public class LookaheadTokenFilter extends TokenFilter { /** List of tokens that were peeked but not returned with next. */ LinkedList<AttributeSource> peekedTokens = new LinkedList<AttributeSource>(); /** The position of the next character that peek() will return

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-03 Thread Daniel Shane
be able to peek() a number of tokens forward before returning to usual behavior. Here is the implementation I had in mind (untested yet because of a StackOverflow) : public class LookaheadTokenFilter extends TokenFilter { /** List of tokens that were peeked but not returned with next

RE: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-02 Thread Uwe Schindler
@lucene.apache.org Subject: Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter. This is what I had in mind (completely untested!): public class LookaheadTokenFilter extends TokenFilter { /** List of tokens that were

Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Daniel Shane
Hi all! I'm trying to port my Lucene code to the new TokenStream API and I have a filter that I cannot seem to port using the current new API. The filter is called LookaheadTokenFilter. It behaves exactly like a normal token filter, except, you can call peek() and get information

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Daniel Shane
seem to port using the current new API. The filter is called LookaheadTokenFilter. It behaves exactly like a normal token filter, except, you can call peek() and get information on the next token in the stream. Since Lucene does not support stream rewinding, we did this by buffering tokens
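The buffering idea can be shown with a minimal, Lucene-free sketch. It is not the filter from this thread: the class name and the `peek(int)` signature are hypothetical, tokens are arbitrary non-null objects, and null signals end of input. Peeked tokens are kept in a list and handed out by `next()` before any new token is pulled from the underlying iterator.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of lookahead by buffering, since the input cannot be rewound.
final class PeekableTokens<T> {
    private final Iterator<T> input;
    private final List<T> buffered = new LinkedList<>(); // peeked but not yet returned

    PeekableTokens(Iterator<T> input) {
        this.input = input;
    }

    /** Returns the token n steps ahead (0 = the next one) without consuming it, or null. */
    T peek(int n) {
        while (buffered.size() <= n && input.hasNext()) {
            buffered.add(input.next());
        }
        return n < buffered.size() ? buffered.get(n) : null;
    }

    /** Returns and consumes the next token, draining the buffer first; null at end. */
    T next() {
        if (!buffered.isEmpty()) {
            return buffered.remove(0);
        }
        return input.hasNext() ? input.next() : null;
    }
}
```

In a real TokenFilter the buffered items would be captured attribute states rather than token objects, because the attribute instances are shared and overwritten on every call to incrementToken().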

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Michael Busch
code to the new TokenStream API and I have a filter that I cannot seem to port using the current new API. The filter is called LookaheadTokenFilter. It behaves exactly like a normal token filter, except, you can call peek() and get information on the next token in the stream. Since Lucene

Re: Lucene 2.9.0-rc2 [PROBLEM] : TokenStream API (incrementToken / AttributeSource), cannot implement a LookaheadTokenFilter.

2009-09-01 Thread Michael Busch
This is what I had in mind (completely untested!): public class LookaheadTokenFilter extends TokenFilter { /** List of tokens that were peeked but not returned with next. */ LinkedList<AttributeSource.State> peekedTokens = new LinkedList<AttributeSource.State>(); /** The position