El jue, 08-05-2008 a las 12:06 -0700, Kevin Brown escribió: > I'm pretty sure the majority of cases where getBytes is used without > specifying an encoding, it's in test code. Most everywhere else has gone out > of the way to litter the useless UnsupportedEncodingExceptions all over the > place. >
FYI: I had an "encoding" branch in my git repo that seems to be alive, as rebasing it didn't actually break it: I think similar issues to what Henning is reporting got fixed there: http://people.apache.org/~sgala/git/?p=shindig.git;a=shortlog;h=refs/heads/encoding (first three commits) Regards Santiago > On Thu, May 8, 2008 at 11:58 AM, Paul Lindner <[EMAIL PROTECTED]> wrote: > > > I agree with this String.getBytes() is evil, especially for performance > > reasons. See my post about it here: > > > > > > http://paul.vox.com/library/post/the-mysteries-of-java-character-set-perform > > ance.html<http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performance.html> > > > > Here's some code chunks that use Java NIO to convert between character sets > > that don't exhibit the performance problems with looking up character sets > > all the time... > > > > > > private static final Charset UTF8 = Charset.forName("UTF-8"); > > > > try { > > CharsetEncoder toUTF8Bytes = UTF8.newEncoder() > > .onMalformedInput(CodingErrorAction.REPORT) > > .onUnmappableCharacter(CodingErrorAction.REPORT); > > > > return toUTF8Bytes.encode(CharBuffer.wrap(str)) > > } catch (Exception ex) { > > // do something else > > } > > > > String s = UTF8.decode(ByteBuffer.wrap(output)).toString(); > > > > > > > > On 5/8/08 8:44 AM, "Henning P. Schmiedehausen" <[EMAIL PROTECTED]> wrote: > > > > > Having been burned far too many times by far too many Java > > > applications that assume platform encoding == UTF-8 and running > > > applications between different platforms, the free usage of the > > > getBytes() method inside the Shindig code base concerns me a lot. > > > > > > A simple example: Apply the following patch: > > > > > > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > =================================================================== > > > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > (revision 654541) > > > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > (working copy) > > > @@ -37,14 +37,17 @@ > > > crypter = new BasicBlobCrypter("0123456789abcdef".getBytes()); > > > crypter.timeSource = new FakeTimeSource(); > > > } > > > + > > > + > > > + > > > > > > @Test > > > public void testHmacSha1() throws Exception { > > > String key = "abcd1234"; > > > - String val = "your mother is a hedgehog"; > > > + String val = "your mother is a hedgehog > > > (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)"; > > > byte[] expected = new byte[] { > > > - -21, 2, 47, -101, 9, -40, 18, 43, 76, 117, > > > - -51, 115, -122, -91, 39, 26, -18, 122, 30, 90, > > > + -45, -20, 16, -21, -64, 8, 79, -41, -28, -101, > > > + -108, 73, -113, 79, 57, 40, 107, -1, 107, -61, > > > }; > > > byte[] hmac = Crypto.hmacSha1(key.getBytes(), val.getBytes()); > > > assertArrayEquals(expected, hmac); > > > @@ -53,10 +56,10 @@ > > > @Test > > > public void testHmacSha1Verify() throws Exception { > > > String key = "abcd1234"; > > > - String val = "your mother is a hedgehog"; > > > + String val = "your mother is a hedgehog > > > (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)"; > > > byte[] expected = new byte[] { > > > - -21, 2, 47, -101, 9, -40, 18, 43, 76, 117, > > > - -51, 115, -122, -91, 39, 26, -18, 122, 30, 90, > > > + -45, -20, 16, -21, -64, 8, 79, -41, -28, -101, > > > + -108, 73, -113, 79, 57, 40, 107, -1, 107, -61, > > > }; > > > Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected); > > > } > > > @@ -65,10 +68,10 @@ > > > @Test > > > public void testHmacSha1VerifyTampered() throws Exception { > > > String key = "abcd1234"; > > > - String val = "your mother is a hedgehog"; > > > + String val = "your mother is a hedgehog > > > (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)"; > > > byte[] expected = new byte[] { > > > - -21, 2, 47, -101, 9, -40, 18, 43, 76, 117, > > > - -51, 115, -122, -91, 39, 0, -18, 122, 30, 90, > > > + -45, -20, 16, -21, -64, 15, 79, -41, -28, -101, > > > + -108, 73, -113, 79, 57, 40, 107, -1, 107, -61, > > > }; > > > try { > > > Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected); > > > > > > > > > now run > > > > > > export LANG=en_US.UTF-8 ; mvn clean ; mvn > > > > > > and > > > > > > export LANG=en_US.ISO-8859-1 ; mvn clean ; mvn > > > > > > The second one then gives me errors in CryptoTest, which means that > > > the actual bytes depend on the platform encoding. Which is bad if you > > > happen to live outside US-ASCII. :-) > > > > > > I have a largeish patch to make sure that everywhere where getBytes() > > > is used actually getBytes("UTF-8") is used (the only place where this > > > is ok is the BasicBlobCrypter, there is only a single bug in > > > there... :-) ), however this needs to deal with useless > > > UnsupportedEncodingException (but be honest: If getBytes("UTF-8") > > > fails, then this is the smallest of your problems. ;-) ). > > > > > > there is a good article from Joel on Software that deals in depth with > > > the whole encoding shebang. Very readable: > > > > > > http://www.joelonsoftware.com/articles/Unicode.html > > > > > > You can also apply this patch: > > > > > > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > =================================================================== > > > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > (revision 654541) > > > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java > > > (working copy) > > > @@ -39,6 +39,17 @@ > > > } > > > > > > @Test > > > + public void testCharsetEncoding() throws Exception { > > > + String str = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc"; > > > + > > > + assertEquals(12, str.getBytes("UTF-8").length); > > > + assertEquals(6, str.getBytes("ISO-8859-1").length); > > > + > > > + assertEquals(12, str.getBytes().length); > > > + } > > > + > > > + > > > + @Test > > > public void testHmacSha1() throws Exception { > > > String key = "abcd1234"; > > > String val = "your mother is a hedgehog"; > > > > > > and run with ISO-8859-1 and UTF-8 platform encodings to illustrate the > > > problem. > > > > > > Best regards > > > Henning (living in ÀöÃπ country. ;-) ) > > > > -- Santiago Gala http://memojo.com/~sgala/blog/