Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
In Japanese, compounds are just decompositions of the input string. In other languages, compounds can manufacture entire tokens from thin air. In those cases, it's something of a question how to decide on the offsets. I think that you're right, eventually, insofar as there's some offset in the

Re: PositionLengthAttribute

2013-09-07 Thread Robert Muir
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies ben...@basistech.com wrote: In Japanese, compounds are just decompositions of the input string. In other languages, compounds can manufacture entire tokens from thin air. In those cases, it's something of a question how to decide on the offsets.

Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies ben...@basistech.com wrote: In Japanese, compounds are just decompositions of the input string. In other languages, compounds can manufacture entire tokens from thin air. In

Strange performance of Lucene 4.4.0

2013-09-07 Thread Mirko Sertic
Hi all, I am getting strange performance measurements on Lucene 4.4.0; maybe someone can explain this. The following call leads to pretty slow queries on my machine (16 ms per execution): theSearcher.search(theQuery, null, theSearcher.getIndexReader().maxDoc()); but the following syntax

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
nextToken() calls peekToken(). That seems to prevent my lookahead processing from seeing that item later. Am I missing something? On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies ben...@basistech.com wrote: I think that the penny just dropped, and I should not be using this class. If I call

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Something is wrong; I'm not sure what offhand, but calling peekToken 10 times should not stack all tokens @ position 0; it should stack the tokens at the positions where they occurred. Are you sure the posIncr att is sometimes 1 (i.e., the position is in fact moving forward for some tokens)?
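
As a rough illustration of the question above, the sketch below shows how per-token position increments add up to absolute positions, and why a stream whose increments are all 0 after the first token ends up with every token stacked at one position. It is a plain-Java model, not Lucene code: the class PositionDemo and the method positions are hypothetical names, and starting the counter at -1 so that a first increment of 1 lands on position 0 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionDemo {
    // Hypothetical helper: turn per-token position increments into
    // absolute positions by keeping a running sum.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int pos = -1; // assumed start value: first increment of 1 yields position 0
        for (int inc : increments) {
            pos += inc;      // an increment of 0 keeps the token at the previous position
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // Every token advances by one position.
        System.out.println(positions(new int[] {1, 1, 1, 1}));
        // Only the first token advances; the rest are stacked on it.
        System.out.println(positions(new int[] {1, 0, 0, 0}));
    }
}
```

If the increments seen by the filter look like the second case, all tokens share a position regardless of how many times the stream is peeked, which is why checking the increment values is a useful first step.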

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
I think I had better build you a test case for this situation, and attach it to a JIRA. On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless luc...@mikemccandless.com wrote: Something is wrong; I'm not sure what offhand, but calling peekToken 10 times should not stack all tokens @ position 0; it

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
That would be awesome, thanks! Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies ben...@basistech.com wrote: I think I had better build you a test case for this situation, and attach it to a JIRA. On Sat, Sep 7, 2013 at 3:33 PM, Michael

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
LUCENE-5202. It seems to show the problem of the extra peek. I'm still struggling to make sense of the 'problem' of not always calling afterPosition(); that may be entirely my own confusion. On Sat, Sep 7, 2013 at 4:21 PM, Michael McCandless luc...@mikemccandless.com wrote: That would be

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Thanks Benson, I'll have a look. Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 4:33 PM, Benson Margulies ben...@basistech.com wrote: LUCENE-5202. It seems to show the problem of the extra peek. I'm still struggling to make sense of the 'problem' of not always calling