ore processing when calculating the character count, but that's
a one-liner, right?
-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
ssing most list
readers aren't too interested in the on-going discussion. If anybody
else would like to be copied, send me an email.
-- Ken
ta. It's
only the above two edge cases that create an interoperability problem.
-- Ken
..U+DFFF is defined as the
range for the low (least significant) surrogate.
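
To make the surrogate ranges concrete, here is a minimal sketch in Java. The class and method names are hypothetical; the ranges themselves (U+D800..U+DBFF for the high surrogate, U+DC00..U+DFFF for the low surrogate) come from the Unicode standard, and the sample code point U+1D11E is just an arbitrary character outside the BMP.

```java
public class SurrogateRanges {
    /** True if c is in the high (most significant) surrogate range U+D800..U+DBFF. */
    static boolean isHigh(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    /** True if c is in the low (least significant) surrogate range U+DC00..U+DFFF. */
    static boolean isLow(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    /** Combines a high/low surrogate pair into the code point it represents. */
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        // U+1D11E is outside the BMP, so Java stores it as two chars.
        String s = new String(Character.toChars(0x1D11E));
        char hi = s.charAt(0);
        char lo = s.charAt(1);
        System.out.printf("U+%04X U+%04X -> U+%X%n", (int) hi, (int) lo, combine(hi, lo));
    }
}
```

The JDK's own Character.isHighSurrogate, Character.isLowSurrogate and Character.toCodePoint do the same job; the hand-written versions are only there to show the ranges.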
-- Ken
On Aug 28, 2005, at 11:42 PM, Ken Krugler wrote:
I'm not familiar with UTF-8 enough to follow the details of this
discussion. I hope other Lucene developers are, so we can resolve this
issue. Anyone raising a hand?
I could, but recent posts make me think this is heading towa
't think this test data exists, unfortunately. But it
shouldn't be too hard to generate.
-- Ken
Ken Krugler wrote:
The remaining issue is dealing with old-format indexes.
I think that revving the version number on the segments file would
be a good start. This file must be read before any others. Its
current version is -1 and would become -2. (All positive values are
version 0, for
On Monday 29 August 2005 19:56, Ken Krugler wrote:
"Lucene writes strings as a VInt representing the length of the
string in Java chars (UTF-16 code units), followed by the character
data."
But wouldn't UTF-16 mean 2 bytes per character?
Yes, UTF-16 means two bytes p
D]
Sent: Monday, August 29, 2005 4:24 PM
To: java-dev@lucene.apache.org
Subject: Re: Lucene does NOT use UTF-8.
Ken Krugler wrote:
The remaining issue is dealing with old-format indexes.
I think that revving the version number on the segments file would be a
good start. This file must be read be
Daniel Naber wrote:
On Monday 29 August 2005 19:56, Ken Krugler wrote:
"Lucene writes strings as a VInt representing the length of the
string in Java chars (UTF-16 code units), followed by the character
data."
But wouldn't UTF-16 mean 2 bytes per character? That doesn
BMP
characters.
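
As a rough illustration of the string format quoted earlier in this thread (a VInt holding the length in Java chars, followed by the character data), here is a minimal sketch. The class and method names are hypothetical and this is not Lucene's actual code; it assumes the VInt layout is seven bits per byte, low-order bits first, with the high bit as a continuation flag.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StringLengthPrefix {
    /** Encodes a non-negative int as a VInt: 7 bits per byte, low-order bits first. */
    static byte[] vInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More bits remain: emit the low 7 bits with the continuation bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "a" followed by one character outside the BMP (stored as a surrogate pair).
        String s = "a" + new String(Character.toChars(0x1D11E));
        System.out.println("Java chars (UTF-16 code units): " + s.length());
        System.out.println("Unicode code points: " + s.codePointCount(0, s.length()));
        System.out.println("standard UTF-8 bytes: "
                + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("VInt bytes for the char count: " + vInt(s.length()).length);
    }
}
```

The three counts differ for this sample string, which is why a length prefix measured in Java chars tells a reader how many chars to decode but not how many bytes follow.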
Thanks,
-- Ken
emason.org/lucene_reports_2005092001.tar.gz
[3] - http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200509.mbox/[EMAIL PROTECTED]
ing written contains an embedded null or an extended
(not in the BMP) Unicode code point.
c. Old code is then used to read the index.
It may still make sense to defer this change to 2.0, but it's not at
the level of changing the format of an index file.
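
The two cases named above (an embedded null, or a code point outside the BMP) can be checked with a small sketch. It uses Java's modified UTF-8, as written by DataOutputStream.writeUTF, as a stand-in for the old encoding; that Lucene's own writer matches it byte for byte is an assumption here, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8EdgeCases {
    /** Standard UTF-8 bytes for s. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Java "modified UTF-8" bytes for s, without writeUTF's 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if both encodings produce identical bytes for s. */
    static boolean sameBytes(String s) {
        return Arrays.equals(standardUtf8(s), modifiedUtf8(s));
    }

    public static void main(String[] args) {
        String[] samples = {
            "Lucene",                                  // ASCII only
            "caf\u00e9 \u4e2d",                        // 2-byte and 3-byte BMP characters
            "a\0b",                                    // embedded null
            new String(Character.toChars(0x1D11E))     // code point outside the BMP
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars, same bytes: " + sameBytes(s));
        }
    }
}
```

Only the last two samples encode differently: the null becomes two bytes instead of one, and the non-BMP character becomes two three-byte surrogate encodings instead of one four-byte sequence.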
-- Ken
-
ut if we could find a small team (and of
course, a lead), I would love to contribute ...
> Sebastian.
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
ucene, I found some posts to the mailing list a while back, but
nothing definitive.
FWIW, my experience w/Eclipse 3.1 was that trying to auto-create
Eclipse projects using the Ant build file didn't work very well. So
we wound up manually creating the project, setting up the classpath,
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can
code data license and its
compatibility with Apache 2.0.
Does anybody know whether http://www.unicode.org/copyright.html
creates an issue? What's the process for vetting a license? Or is
this something I should be posting to a different list?
Thanks,
-- Ken
an then respond to your specific question.
-- Ken
--
Ken Krugler
+1 530-210-6378
OI:
http://www.krugle.org/kse/files/svn/svn.apache.org/poi/src/java/org/apache/poi/poifs/storage/HeaderBlockReader.java
On line 83.
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
--
ked at the code, and the bug isn't obvious. Plus I worry
about the probability of introducing a new bug with any modification.
If anybody who's touched this code has time to look at the issue and
comment, that would be great!
Thanks,
-- Ken
[
https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622432#action_12622432
]
Ken Krugler commented on LUCENE-1343:
-
Hi Robert,
FWIW, the issues being discu
[
https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622746#action_12622746
]
Ken Krugler commented on LUCENE-1343:
-
Hi Robert,
So given that you and the Uni
[
https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786712#action_12786712
]
Ken Krugler commented on LUCENE-1343:
-
Just to make sure this point doesn't
[
https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804285#action_12804285
]
Ken Krugler commented on LUCENE-826:
I think Nutch (and eventually Mahout) plan to