http://java.sun.com/docs/books/tutorial/i18n/text/stream.html
Yes, it's confusing. Sun calls its own encoding format "Unicode", and the above
webpage talks about how to convert between Java's Unicode format and the UTF-8
format.
It's just a matter of specifying "UTF-8" when creating output strea
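
For illustration, here is a minimal sketch of naming "UTF-8" when wrapping an output stream in a writer. The class and method names (Utf8WriterSketch, encodeUtf8) are hypothetical and the in-memory stream is only there to keep the example self-contained; a file or socket stream would be wrapped the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class Utf8WriterSketch {
    // Hypothetical helper: encodes text as UTF-8 by naming the charset
    // when the OutputStreamWriter is created.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, "UTF-8")) {
            out.write(text);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // "café": three ASCII characters plus one two-byte character.
        byte[] encoded = encodeUtf8("caf\u00e9");
        System.out.println(encoded.length); // prints 5
    }
}
```

Without the charset argument, OutputStreamWriter falls back to the platform default encoding, which is why the name has to be given explicitly.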
Thanks for pointing this out, Marvin. I wish Sun (or someone) would
document and register this particular character set encoding with
IANA, so that it could be used outside of Java. As it stands now,
it's essentially a bastard encoding, good for nothing, and one of the
warts of Java.
Lucene prob
I've delved into the matter of Lucene and UTF-8 a little further,
and I am discouraged by what I believe I've uncovered.
Lucene should not be advertising that it uses "standard UTF-8" -- or
even UTF-8 at all, since "Modified UTF-8" is _illegal_ UTF-8.
Unfortunately this is how Sun documents t
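
To make the difference concrete, here is a small sketch comparing standard UTF-8 with Java's "Modified UTF-8". It uses the JDK's DataOutputStream.writeUTF as a stand-in for the modified form; it is not Lucene's own storage code, and the class and helper names are hypothetical. The two encodings diverge for U+0000 and for characters outside the Basic Multilingual Plane.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Sketch {
    // Standard UTF-8, as defined by the Unicode standard.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's Modified UTF-8, as produced by DataOutputStream.writeUTF.
    // writeUTF prepends a two-byte length, which is stripped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\u0000";        // U+0000
        String clef = "\uD834\uDD1E"; // U+1D11E, stored as a surrogate pair

        System.out.println(hex(standardUtf8(nul)));  // 00
        System.out.println(hex(modifiedUtf8(nul)));  // C0 80
        System.out.println(hex(standardUtf8(clef))); // F0 9D 84 9E
        System.out.println(hex(modifiedUtf8(clef))); // ED A0 B4 ED B4 9E
    }
}
```

A strict UTF-8 decoder rejects both modified forms: C0 80 is an overlong encoding, and ED A0 B4 ED B4 9E encodes two surrogate code points separately instead of one four-byte sequence.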
On Aug 26, 2005, at 10:14 PM, jian chen wrote:
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8 to
store terms. Maybe it is just a legacy issue that the modified UTF-8 is
used?
It has been suggested that this discussion should move to the
developer's list, s
Hi,
It seems to me that in theory, Lucene storage code could use true UTF-8 to
store terms. Maybe it is just a legacy issue that the modified UTF-8 is
used?
Cheers,
Jian
On 8/26/05, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> Greets,
>
> [crossposted to java-user@lucene.apache.org and [E
Greets,
[crossposted to java-user@lucene.apache.org and [EMAIL PROTECTED]]
I've delved into the matter of Lucene and UTF-8 a little further, and
I am discouraged by what I believe I've uncovered.
Lucene should not be advertising that it uses "standard UTF-8" -- or
even UTF-8 at all, since "