[jira] Resolved: (LUCENE-1333) Token implementation needs improvements

2008-08-20 Thread Michael McCandless (JIRA)
patch -- almost 10K lines. I sure hope we didn't break anything :) > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 >

[jira] Assigned: (LUCENE-1333) Token implementation needs improvements

2008-08-19 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1333: -- Assignee: Michael McCandless > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-18 Thread Michael McCandless (JIRA)
new patch with tiny changes: * Fixed javadoc warnings * Removed extra unnecessary comparison in ArrayUtil.getShrinkSize I plan to commit in a day or two! > Token implementation needs improvements > --- > >

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-18 Thread DM Smith (JIRA)
sableToken); nextToken != null; nextToken = stream.next(reusableToken)) { Do something with nextToken } {code} * Improved Payload.clone() to avoid new if possible. * Removed last remaining calls to termText() in a *.jj file. > Token implementation needs

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Michael McCandless (JIRA)
ontent as another. There are a couple of places that this is appropriate. I like this, but maybe name it reinit(Token)? bq. Also, are the reinit methods used? I think we could use it in further places (I didn't search exhaustively). DM if you can pull together a patch w/ these fi

[jira] Issue Comment Edited: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread DM Smith (JIRA)
(termBuffer, 0, termLength); return code; } Also, are the reinit methods used? If not, I'd like to work up a patch that uses them. (And I'll include the above in it.) I'll probably add copyFrom(Token) as a means to initialize one token to have the same content as another

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread DM Smith (JIRA)
f places that this is appropriate. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Grant Ingersoll (JIRA)
ssue with clone in the old Token was that it had to do the initTermBuffer before, even though clone already knows what size the buffer should be, etc. It looks like this is fixed now, so it may be a non-issue. > Token implementation needs impro

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Doron Cohen (JIRA)
ed in the constructor, or a special setter can be added for that, such as: disableRest(). There would not be an enableReset(). When disabled, next() would not clone, and reset() will throw an exception. > Token implementation needs improvements > --- > >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread DM Smith (JIRA)
ved more than once. Two personal opinions: * Caches that don't implement reset should return cache.remove(0) [or equivalent] so it is clear that the cache can only be used once. * Caches should not be used except when it gives a clear performance advantage. > Token implementat

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Grant Ingersoll (JIRA)
bad. Does look like we improved some of the cloning costs, though. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Doron Cohen (JIRA)
, the consumer that invoked step 2 might have overridden the tokens it got from step 2. Now in step 4 sink "thinks" it returns clones of original tokens, unaware that the consumer of step 2 modified them. > Token implem

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-13 Thread Grant Ingersoll (JIRA)
the Analysis process, which I would argue does not violate the reusability process. Cloning Tokens is not cheap, as I recall. In fact, I seem to recall testing that it was cheaper to do a new. Now, maybe that is fixed in this issue, but I am not sure. > Token implementation needs impro

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread DM Smith (JIRA)
change it and thus the internal list changes. This is only a problem when reset is implemented. This is because a single token can be returned more than once from the TokenStream. If SinkTokenizer did not implement reset, then next() would not need to create a clone. > Token implementation n

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Grant Ingersoll (JIRA)
27;t seem right to me. The token is already cloned in the add() method, no need to clone again in the next() call. At least, that's my understanding based on trying to piece together the changes being proposed here. > Token implementation n

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Doron Cohen (JIRA)
er clean, all contrib tests passed! > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Doron Cohen (JIRA)
all tests passed. Let me check recent patch. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java >

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1333: --- Attachment: LUCENE-1333.patch Woops, you're right -- new patch attached. &g

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Doron Cohen (JIRA)
and all seems correct to me. One tiny thing in Token - in two locations there's this code: {code} setTermBuffer(newTermBuffer, newTermOffset, newTermLength); setTermLength(newTermLength); {code} Second call is redundant since first call already sets termLength. > Token implemen

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Michael McCandless (JIRA)
& cloning * Added partial clone method, that clones all except replaces termBuffer & start/end offset; fixed NGramTokenFilter and CompoundWordTokenFilterBase to use this method * Fixed bug in PrecedenceQueryParser.jj that became obvious once I ran javacc I think we are close!

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Michael McCandless (JIRA)
javacc to run cleanly on PrecedenceQueryParser.jj. I figured this out: you have to use javacc 3.2 (not 4.0 or beyond) for contrib/misc, then "ant javacc" runs fine (with the patch from LUCENE-1353). > Token implementation needs improvements > --

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-12 Thread Doron Cohen (JIRA)
nges in their generate java files. While it would be good to fix the generation problem, you can compare the jj and java pairs to see that they match. {quote} {quote} I ended up doing the same. {quote} There's a patch for this in LUCENE-1353 ... > Token implementation needs

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread DM Smith (JIRA)
and is not resilient to future fields. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1333: --- Attachment: LUCENE-1333.patch Attached new patch w/ above changes. > To

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread Michael McCandless (JIRA)
p; start/end offsets you've passed in? Analagous to reinit, but first making a cloned token. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-13

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread Michael McCandless (JIRA)
o be good to finish the canonical implementation and provide hashcode? OK I'll add. bq. Perhaps an assert in a producer is a good idea. I find it easier to debug a failed assert than a null pointer exception. Agreed, I'll add. > Token i

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread Doron Cohen (JIRA)
ionIncrement()); assertEquals(tokFlags, newtok.getFlags()); assertEquals(tokPayload, newtok.getPayload()); } {code} > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jir

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-11 Thread Doron Cohen (JIRA)
ought it a bug but no, patch is correct. * I find the comment in TokenFilter about implementing next() vs. next(Token) confusing. Perhaps use here the same comment as in Tokenizer. > Token implementation needs improvements > --- > >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread DM Smith (JIRA)
hey match. I agree that Token next(Token) should assume a non-null. Perhaps an assert in a producer is a good idea. I find it easier to debug a failed assert than a null pointer exception. > Token implementation needs improvements > --- > >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
applies cleanly and I'm running the test now, so far all looks good but my PC is not that fast and it will take some more time to complete. I hope to review later tonight or tomorrow morning. > Token implementation needs improvements > --- &

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
Payloads.equals -- this fixed TestSingleTokenTokenFilter. * Switched to the "final reusableToken" pattern. * Added to TokenStream.next javadoc stating that parameter should never be null * Fixed test failures (all tests should pass now) > Token implementation needs

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
to see if there are any other bugs! These equals() methods are tricky! > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
can't see that bug... :-) For PrecedenceQueryParser, Is there a way to use ant to run javacc on contrib/misc? > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apach

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
for Token.equals() in case you didn't write that part yet Thanks Doron; I'll merge into my version :) Both your version and my version had bugs, so it's great you posted yours! And I hope the merged result has no bugs :) > Token implementat

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
src/java/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.jj . . . org.javacc.parser.ParseException: Encountered "<<" at line 658, column 3. Was expecting one of: ... "<" ... Detected 1 errors and 0 warnings. {code} Does anyone else hit

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
p;& subEqual(payload, other.payload); } private boolean subEqual(Object o1, Object o2) { if (o1==null) return o2==null; return o1.equals(o2); } {code} > Token implementation needs improvements > --- > > Ke

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
k in backwards compatibility, but I think it's OK? {quote} I think so. I feel good about this whole change, it will make reuse chain more clear, especially once all deprecations can be removed. > Token implementation needs improvements > --- > >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
ould implement equals(). I agree. This is technically a break in backwards compatibility, but I think it's OK? > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apach

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
I wonder how this test ever passed before... Oh I see it now, - trunk's of SingleTokenTokenStream never called Token.clone() while patched version calls it twice. So definitely, Token should implement equals(). > Token implementation needs improvements > ---

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
s in TokenStream.nextToken(Token)'s javadocs. I agree; I'll do this in next patch iteration... > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apach

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
not null. Not so nice. So yes, I agree with you, just need to clarify this in TokenStream.nextToken(Token)'s javadocs. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://is

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
Hmm, indeed I do see this too -- but this is because Token has never overridden "equals" right? > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apach

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
into the failure of TestSingleTokenTokenFilter I saw that if these lines are executed: {code} Token t1 = new Token(); Token t2 = t1.clone(); boolean isEqual = t1.equals(t2) {code} Surprisingly, *isEqual = false* Do you see this too? > Token implementation needs impro

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
kens probably should check and create a token if it is so. But I don't think that is what they do now.) Actually I think one should never pass null to next(Token) API -- ie a source token stream need not check for null. > Token implementation needs

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
were lost in ChineseTokenizer. OK, I'll fold this into my changes to the patch. Thanks Doron! > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
the test failures. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java > Issue Type: Imp

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Michael McCandless (JIRA)
ode-smell standpoint I'd still like to use the "single re-use" pattern when applicable. DM I'll make this change & post a new patch. > Token implementation needs improvements > --- > > Key: LU

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
t in ChineseTokenizer. Adding this in flush(Token) will fix it. {code} token.setStartOffset(start); token.setEndOffset(start+length); {code} Possibly true for other places where new Token() that takes start/end offsets was replaced by clear() and setTermBuffer(). > Token implementation needs impro

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-10 Thread Doron Cohen (JIRA)
are failing though - TestChineseTokenizer for one. I'm looking into it, might be that a problem is in the test. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apach

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread Doron Cohen (JIRA)
d the token it just returned? > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java &g

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread DM Smith (JIRA)
ode} final Token reusableToken = new Token(); for (Token token = stream.next(reusableToken); token != null; token = stream.next(reusableToken)) { {code} > Token implementation needs improvements > --- > > Key: LUCENE-1333 >

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread Michael McCandless (JIRA)
for (Token next = stream.next(result); next != null; next = stream.next(result)) { {code} > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LU

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-08 Thread Michael McCandless (JIRA)
all together under this issue, I think we should commit this one instead of LUCENE-1350. I'll review the [massive] patch -- thanks DM! > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: h

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-08 Thread DM Smith (JIRA)
functionality solving LUCENE-1350. If this patch is applied before LUCENE-1350, then that issue is resolved. If it is done after then the patch will need to be rebuilt. I did not do the "reuse" API mentioned in LUCENE-1350. > Token implementation needs

[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
appears to be general purpose array manipulation. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: L

[jira] Issue Comment Edited: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
l of core and contrib. This patch replaces the prior two. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 >

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-xml-query-parser.patch for contrib xml-query-parser > Token implementat

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-wordnet.patch for contrib wordnet > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-memory.patch for contrib memory > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-queries.patch for contrib queries > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-miscellaneous.patch for contrib misc > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-lucli.patch for contrib lucli > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-wikipedia.patch for contrib wikipedia > Token implementation ne

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-instantiated.patch For contrib instantiated > Token implementation ne

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-highlighter.patch for contrib highlighter > Token implementation ne

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-snowball.patch for contrib snowball > Token implementation needs improveme

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-analyzers.patch This is for contrib analyzers. > Token implementation ne

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
.l.analysis but not the files contained in LUCENE-1333.patch > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lu

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1333: - Attachment: LUCENE-1333-core.patch This patch covers the rest of core Lucene. > Token implementat

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-08-04 Thread DM Smith (JIRA)
replaces the prior two. > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: https://issues.apache.org/jira/browse/LUCENE-1333 > Project: Lucene - Java > Is

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-07-30 Thread Michael McCandless (JIRA)
ents in Token.growTermBuffer to first handle the [I think most frequent] case where termBuffer is already allocated. * Javadoc/whitespace > Token implementation needs improvements > --- > > Key: LUCENE-1333 > URL: htt

Re: Token implementation

2008-07-15 Thread Karl Wettin
15 jul 2008 kl. 11.25 skrev Hiroaki Kawai: DM Smith <[EMAIL PROTECTED]> wrote: In core and contrib, there are times where more than one token is used at a time. In a few places they are put into collections. So a singleton wouldn't work. I'd suggest a factory method to get a thread local

Re: Token implementation

2008-07-15 Thread Hiroaki Kawai
DM Smith <[EMAIL PROTECTED]> wrote: > Hiroaki Kawai wrote: > > DM Smith <[EMAIL PROTECTED]> wrote: > > > >> On Jul 11, 2008, at 9:42 PM, Hiroaki Kawai wrote: > >> > >> > >>> Another suggestion from me: > >>> How about making token object as an singleton? > >>> > >> Would that work fo

[jira] Updated: (LUCENE-1333) Token implementation needs improvements

2008-07-14 Thread DM Smith (JIRA)
x27;d like to get a nod of whether I need to make further changes to the supplied patch before doing #4) 4. Changing of the remainder of core and contrib to remove calls to deprecated Token and TokenStream methods, i.e. to use the reuse API. > Token implementation needs impr

Re: Token implementation

2008-07-14 Thread DM Smith
Hiroaki Kawai wrote: DM Smith <[EMAIL PROTECTED]> wrote: On Jul 11, 2008, at 9:42 PM, Hiroaki Kawai wrote: Another suggestion from me: How about making token object as an singleton? Would that work for a multi-threaded application? Of cource. We should make that thread l

Re: Token implementation

2008-07-12 Thread Hiroaki Kawai
DM Smith <[EMAIL PROTECTED]> wrote: > On Jul 11, 2008, at 9:42 PM, Hiroaki Kawai wrote: > > > Another suggestion from me: > > How about making token object as an singleton? > > Would that work for a multi-threaded application? Of cource. We should make that thread local singleton. > > > > > >

Re: Token implementation

2008-07-12 Thread DM Smith
Michael McCandless wrote: But, in TokenFilter, next() should be deprecated, IMHO. I think this is a good idea. After all if people don't want to bother using the passed in Token, they are still allowed to return a new one. I'm looking into the deprecation of TokenStream.next(). I ha

Re: Token implementation

2008-07-12 Thread DM Smith
This would be better, especially when we get to Java 5, with covariant returns. Then we could also rename termBuffer() to term() (or just leave it :). DM Michael McCandless wrote: Or we could leave termText() deprecated, add term() which does the same thing sub-optimally (ie, always creates

Re: Token implementation

2008-07-12 Thread Michael McCandless
Or we could leave termText() deprecated, add term() which does the same thing sub-optimally (ie, always creates new String from the byte[]), and in the javadocs for termText() state that you can migrate either term() (if you really want a String and you understand the performance cost of

Re: Token implementation

2008-07-11 Thread Hiroaki Kawai
Another suggestion from me: How about making token object as an singleton? > Maybe we should un-deprecate the termText() method but add javadocs > explaining that for better performance you should use the char[] reuse > methods instead? > > Mike > > DM Smith wrote: > > > Michael McCandless

Re: Token implementation

2008-07-11 Thread DM Smith
On Jul 11, 2008, at 9:42 PM, Hiroaki Kawai wrote: Another suggestion from me: How about making token object as an singleton? Would that work for a multi-threaded application? Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you s

Re: Token implementation

2008-07-11 Thread Hiroaki Kawai
Another suggestion from me: How about making token object as an singleton? > Maybe we should un-deprecate the termText() method but add javadocs > explaining that for better performance you should use the char[] reuse > methods instead? > > Mike > > DM Smith wrote: > > > Michael McCandless

Re: Token implementation

2008-07-11 Thread DM Smith
Michael McCandless wrote: Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you should use the char[] reuse methods instead? I think so, too. Should we leave it as deprecated until 3.0? With the performance note and the encouragement to

Re: Token implementation

2008-07-11 Thread Michael McCandless
Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you should use the char[] reuse methods instead? Mike DM Smith wrote: Michael McCandless wrote: DM Smith wrote: Shouldn't Term have constructors that take a Token? I think tha

Re: Token implementation

2008-07-11 Thread DM Smith
Michael McCandless wrote: DM Smith wrote: Shouldn't Term have constructors that take a Token? I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token? The problem I'm addressing

Re: Token implementation

2008-07-11 Thread Michael McCandless
DM Smith wrote: Shouldn't Term have constructors that take a Token? I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token? Mike --

[jira] Created: (LUCENE-1333) Token implementation needs improvements

2008-07-11 Thread DM Smith (JIRA)
Token implementation needs improvements --- Key: LUCENE-1333 URL: https://issues.apache.org/jira/browse/LUCENE-1333 Project: Lucene - Java Issue Type: Improvement Components: Analysis

Re: Token implementation

2008-07-11 Thread DM Smith
n't dig into it, but in the core, are reusable Tokens per document or per DocumentWriter. If it is the former then the copy becomes more significant. If it is the latter, then it is almost moot, as you point out. It's actually per DocumentsWriter per thread per field. This is e

Re: Token implementation

2008-05-20 Thread DM Smith
document or per DocumentWriter. If it is the former then the copy becomes more significant. If it is the latter, then it is almost moot, as you point out. It's actually per DocumentsWriter per thread per field. This is excellent. More below More responses below: DM Smit

Re: Token implementation

2008-05-20 Thread DM Smith
On May 20, 2008, at 12:50 AM, Hiroaki Kawai wrote: "Michael McCandless" <[EMAIL PROTECTED]> wrote: More responses below: DM Smith <[EMAIL PROTECTED]> wrote: But, in TokenFilter, next() should be deprecated, IMHO. I think this is a good idea. After all if people don't want to

Re: Token implementation

2008-05-20 Thread Michael McCandless
Hiroaki Kawai wrote: "Michael McCandless" <[EMAIL PROTECTED]> wrote: I agree the situation is not ideal, and it's confusing. This comes back to LUCENE-969. At the time, we decided to keep both String & char[] only to avoid performance cost for those analyzer chains that use String tokens ex

Re: Token implementation

2008-05-20 Thread Michael McCandless
he copy becomes more significant. If it is the latter, then it is almost moot, as you point out. It's actually per DocumentsWriter per thread per field. More below More responses below: DM Smith <[EMAIL PROTECTED]> wrote: I think the Token implementation as it stands ca

Re: Token implementation

2008-05-19 Thread Hiroaki Kawai
"Michael McCandless" <[EMAIL PROTECTED]> wrote: > I agree the situation is not ideal, and it's confusing. > > This comes back to LUCENE-969. > > At the time, we decided to keep both String & char[] only to avoid > performance cost for those analyzer chains that use String tokens > exclusively.

Re: Token implementation

2008-05-19 Thread Hiroaki Kawai
"Michael McCandless" <[EMAIL PROTECTED]> wrote: > I agree the situation is not ideal, and it's confusing. > > This comes back to LUCENE-969. > > At the time, we decided to keep both String & char[] only to avoid > performance cost for those analyzer chains that use String tokens > exclusively.

Re: Token implementation

2008-05-19 Thread DM Smith
icant. If it is the latter, then it is almost moot, as you point out. More below.... More responses below: DM Smith <[EMAIL PROTECTED]> wrote: I think the Token implementation as it stands can use some improvement and I'd be willing to do it. I'd like some input, though.

Re: Token implementation

2008-05-19 Thread Michael McCandless
(tbLength > 0 && newSize > tbLength) { > int size = tbLength; > while (size < newSize) { > size *= 2; > } >newSize = size; > } > > // Check to see if the buffer needs to be resized > if (newSize > tbLength) > { > termBuffer = new cha

Re: Token implementation

2008-05-19 Thread DM Smith
; tbLength) { int size = tbLength; while (size < newSize) { size *= 2; } newSize = size; } // Check to see if the buffer needs to be resized if (newSize > tbLength) { termBuffer = new char[newSize]; } } More below More responses below: DM Smith <[EMAIL PROTECT

Re: Token implementation

2008-05-19 Thread Michael McCandless
y char[] today) is compelling, but I worry about the performance hit to legacy analyzer chains... More responses below: DM Smith <[EMAIL PROTECTED]> wrote: > I think the Token implementation as it stands can use some improvement and > I'd be willing to do it. I'd like some input

Token implementation

2008-05-18 Thread DM Smith
I think the Token implementation as it stands can use some improvement and I'd be willing to do it. I'd like some input, though. Especially because it is core to Lucene. I've been working on eliminating deprecations from my user code and I ran across Token.getText() as b