On Nov 1, 2005, at 9:51 AM, Doug Cutting wrote:
Another approach might be to, instead of converting UTF-8 to
strings right away, change things to convert lazily, if at all.
During index merging such conversion should never be needed.
There ought to be some gains possible there, then.
Thanks for looking into this Marvin... very interesting stuff!
I haven't had a chance to review it in detail, but my gut tells me
it should be possible to make it faster.
-Yonik
Another approach might be to, instead of converting UTF-8 to strings
right away, change things to convert lazily, if at all. During index
merging such conversion should never be needed. You needn't do this
systematically throughout Lucene, but only where it makes a big
difference. For exa
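(The example Doug was reaching for is cut off in this archive. A minimal sketch of the lazy-conversion idea he describes, with invented names such as `LazyTerm` that are not Lucene APIs, might look like this: keep the raw UTF-8 bytes as read from the index, compare them byte-for-byte during merging, and decode to a `String` only if a caller actually asks for one.)

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Lucene code: hold the on-disk UTF-8 bytes
// and defer String decoding until (and unless) it is really needed.
public class LazyTerm {
    private final byte[] utf8Bytes;  // bytes exactly as stored in the index
    private String text;             // decoded on demand, then cached

    public LazyTerm(byte[] utf8Bytes) {
        this.utf8Bytes = utf8Bytes;
    }

    /** Merging can order terms by unsigned byte comparison, never decoding. */
    public int compareBytes(LazyTerm other) {
        byte[] a = utf8Bytes, b = other.utf8Bytes;
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    /** Decode lazily, only when a caller needs the String form. */
    public String text() {
        if (text == null) {
            text = new String(utf8Bytes, StandardCharsets.UTF_8);
        }
        return text;
    }
}
```

A useful property here is that unsigned byte-wise comparison of UTF-8 sequences yields the same order as comparing Unicode code points, so merging never pays for decoding at all.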
Marvin Humphrey wrote:
I think it's time to throw in the towel.
Please don't give up. I think you're quite close.
I would be careful using CharBuffer instead of char[] unless you're sure
all methods you call are very efficient. You could try avoiding
CharBuffer by adding something (ugly) l
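(Doug's suggestion is truncated here. One CharBuffer-free shape it might take, sketched with invented names and a deliberately simplified BMP-only decoder, is to decode UTF-8 bytes straight into a caller-owned `char[]`:)

```java
// Hedged sketch of a CharBuffer-free path: decode UTF-8 directly into a
// caller-supplied char[]. BMP-only for brevity; a real version would
// also handle 4-byte sequences and reject malformed input.
public class Utf8ToChars {
    /** Decode length bytes from src into dst; returns chars produced. */
    public static int decode(byte[] src, int length, char[] dst) {
        int upto = 0;
        for (int i = 0; i < length; ) {
            int b = src[i++] & 0xFF;
            if (b < 0x80) {                            // 1-byte sequence
                dst[upto++] = (char) b;
            } else if (b < 0xE0) {                     // 2-byte sequence
                dst[upto++] = (char) (((b & 0x1F) << 6)
                                      | (src[i++] & 0x3F));
            } else {                                   // 3-byte sequence
                dst[upto++] = (char) (((b & 0x0F) << 12)
                                      | ((src[i++] & 0x3F) << 6)
                                      | (src[i++] & 0x3F));
            }
        }
        return upto;
    }
}
```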
I wrote:
I've got one more idea... time to try overriding readString and
writeString in BufferedIndexInput and BufferedIndexOutput, to take
advantage of buffers that are already there.
Too complicated to be worthwhile, it turns out. I think it's time to
throw in the towel. Frustrating,
On Oct 31, 2005, at 5:15 PM, Robert Engels wrote:
All of the JDK source is available via download from Sun.
Thanks. I believe the UTF-8 coding algos can be found in...
j2se > src > share > classes > sun > nio > cs > UTF_8.java
It looks like the translator methods have fairly high loop overhead
All of the JDK source is available via download from Sun.
-----Original Message-----
From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
Sent: Monday, October 31, 2005 6:31 PM
To: java-dev@lucene.apache.org
Subject: Re: bytecount as String and prefix length
I wrote...
I think I'll take a crack at a custom charsToUTF8 converter algo.
Still no luck. Still 20% slower than the current implementation.
The algo is below, for reference.
It's entirely possible that my patches are doing something dumb
that's causing this, given my limited experience
I wrote...
Unfortunately, once the changes to TermBuffer, TermInfosWriter, and
StringHelper are applied, execution speed at index-time suffers a
slowdown of about 20%. Perhaps this can be blamed on all the calls
to getBytes("UTF-8") in TermInfosWriter? Maybe alternative
implementations
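(One commonly cited alternative to calling `s.getBytes("UTF-8")` once per term, since each such call does a charset lookup and allocates a fresh array, is to keep a single `CharsetEncoder` and a reusable output buffer alive for the life of the writer. This is a hedged sketch of that idea with invented names, not the patch under discussion:)

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical alternative to per-call String.getBytes("UTF-8"):
// one long-lived encoder plus a grow-on-demand output buffer.
public class ReusableUtf8Encoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private ByteBuffer out = ByteBuffer.allocate(256);

    /** Returns a buffer whose [position, limit) holds the UTF-8 bytes of s. */
    public ByteBuffer encode(String s) {
        encoder.reset();
        out.clear();
        CharBuffer in = CharBuffer.wrap(s);
        // On overflow, copy what we have into a doubled buffer and continue.
        while (encoder.encode(in, out, true).isOverflow()) {
            ByteBuffer bigger = ByteBuffer.allocate(out.capacity() * 2);
            out.flip();
            bigger.put(out);
            out = bigger;
        }
        encoder.flush(out);
        out.flip();
        return out;
    }
}
```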
Greets,
I've been experimenting with using the UTF-8 bytecount as the VInt
count at the top of Lucene's string format, as was discussed back in
the "Lucene does NOT use UTF-8" thread. Changes were made to
IndexInput and IndexOutput as per some of Robert Engels's
suggestions. Here's the i
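(The rest of the message is truncated. The format change being proposed can be sketched as follows, assuming invented names; the shipping format wrote the Java char count as the VInt, whereas this writes the UTF-8 byte count, matching how VInts are encoded in Lucene, low-order 7 bits first with a continuation bit:)

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the proposed on-disk string format: a VInt holding the
// UTF-8 *byte* count, followed by the bytes themselves. Illustrative
// only, not Lucene's actual IndexOutput code.
public class Utf8StringFormat {
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);  // low 7 bits, continuation bit set
            i >>>= 7;
        }
        out.write(i);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);       // byte count, not char count
        out.write(utf8, 0, utf8.length);
    }
}
```

With a byte count up front, a reader can skip or copy a string without decoding it, which is exactly what merging wants.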