https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
--- Comment #6 from Platonides <platoni...@gmail.com> 2010-02-12 22:58:48 UTC ---

Java internally uses UTF-16:

"The native coded character set of the Java programming language is that of the first seventeen planes of the Unicode version 3.0 character set; that is, it consists in the basic multilingual plane (BMP) of Unicode version 1 plus the next sixteen planes of Unicode version 3. This is because the language's internal representation of characters uses the UTF-16 encoding, which encodes the BMP directly and uses surrogate pairs, a simple escape mechanism, to encode the other planes. Hence a charset in the Java platform defines a mapping between sequences of sixteen-bit values in UTF-16 and sequences of bytes."

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html
http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html

The file contains U+1D59F encoded as UTF-8, i.e. the bytes F0 9D 96 9F. In binary:

11110000 10011101 10010110 10011111

I don't see why it is reading a U+26 (binary 100110).

PS: Maybe Bugzilla is using MySQL with utf-8 columns instead of binary? MySQL's utf8 character set currently only supports the BMP.
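
For anyone who wants to double-check the byte values quoted above, here is a minimal sketch in Python that encodes U+1D59F by hand, both as UTF-8 and as the UTF-16 surrogate pair Java would hold internally. The helper names `utf8_bytes` and `utf16_surrogates` are made up for illustration, and the sketch only covers code points outside the BMP; it is not taken from Java or Bugzilla.

```python
# Hypothetical helpers; only the code point U+1D59F comes from the comment above.

def utf8_bytes(code_point: int) -> bytes:
    """Encode one code point in U+10000..U+10FFFF as a four-byte UTF-8 sequence."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("this sketch only handles code points outside the BMP")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: low 6 bits
    ])


def utf16_surrogates(code_point: int) -> tuple:
    """Split one code point in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("this sketch only handles code points outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return high, low


cp = 0x1D59F
encoded = utf8_bytes(cp)

# Cross-check the hand-rolled encoder against Python's built-in codec.
assert encoded == chr(cp).encode("utf-8")

print(" ".join(f"{b:02X}" for b in encoded))   # F0 9D 96 9F
print(" ".join(f"{b:08b}" for b in encoded))   # 11110000 10011101 10010110 10011111

high, low = utf16_surrogates(cp)
print(f"{high:04X} {low:04X}")                 # D835 DD9F
```

None of the four UTF-8 bytes nor either surrogate equals 0x26, so a U+26 would presumably have to come from a later step and not from the file contents themselves.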