On Thu, 08-05-2008 at 12:06 -0700, Kevin Brown wrote:
> I'm pretty sure the majority of cases where getBytes is used without
> specifying an encoding, it's in test code. Most everywhere else has gone out
> of the way to litter the useless UnsupportedEncodingExceptions all over the
> place.
> 

FYI: I have an "encoding" branch in my git repo that still seems to be alive,
as rebasing it didn't break anything. I think issues similar to the ones
Henning is reporting were fixed there (see the first three commits):

http://people.apache.org/~sgala/git/?p=shindig.git;a=shortlog;h=refs/heads/encoding


Regards
Santiago

> On Thu, May 8, 2008 at 11:58 AM, Paul Lindner <[EMAIL PROTECTED]> wrote:
> 
> > I agree with this: String.getBytes() is evil, especially for performance
> > reasons.  See my post about it here:
> >
> >
> > http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performance.html
> >
> > Here are some code chunks that use Java NIO to convert between character
> > sets without the performance cost of looking up the character set on
> > every call...
> >
> >
> >    private static final Charset UTF8 = Charset.forName("UTF-8");
> >
> >    try {
> >        CharsetEncoder toUTF8Bytes = UTF8.newEncoder()
> >                     .onMalformedInput(CodingErrorAction.REPORT)
> >                     .onUnmappableCharacter(CodingErrorAction.REPORT);
> >
> >        return toUTF8Bytes.encode(CharBuffer.wrap(str));
> >    } catch (Exception ex) {
> >        // do something else
> >    }
> >
> >    String s = UTF8.decode(ByteBuffer.wrap(output)).toString();
> >
> >
> >
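For anyone who wants to try Paul's approach, here is a minimal self-contained
sketch of it. The class name Utf8Codec and its two methods are made up for
illustration and are not part of Shindig. Rethrowing the checked
CharacterCodingException as an IllegalArgumentException is my own choice, so
that callers do not need try/catch blocks everywhere:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not Shindig code.
final class Utf8Codec {
    // Look the charset up once instead of on every call.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    private Utf8Codec() {}

    /** Encodes str as UTF-8 and fails loudly on malformed input (e.g. a lone surrogate). */
    static byte[] encode(String str) {
        try {
            // Encoders are not thread-safe, so a fresh one is created per call.
            ByteBuffer buf = UTF8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(str));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException ex) {
            throw new IllegalArgumentException("String is not encodable as UTF-8", ex);
        }
    }

    /** Decodes UTF-8 bytes and fails loudly on invalid byte sequences. */
    static String decode(byte[] bytes) {
        try {
            return UTF8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException ex) {
            throw new IllegalArgumentException("Bytes are not valid UTF-8", ex);
        }
    }
}
```

With REPORT, bad input raises an exception instead of being silently replaced,
which is probably what you want for anything that feeds a signature or HMAC.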
> > On 5/8/08 8:44 AM, "Henning P. Schmiedehausen" <[EMAIL PROTECTED]> wrote:
> >
> > > Having been burned far too many times by far too many Java
> > > applications that assume platform encoding == UTF-8 and running
> > > applications between different platforms, the free usage of the
> > > getBytes() method inside the Shindig code base concerns me a lot.
> > >
> > > A simple example: Apply the following patch:
> > >
> > > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java
> > > ===================================================================
> > > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java (revision 654541)
> > > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java (working copy)
> > > @@ -37,14 +37,17 @@
> > >      crypter = new BasicBlobCrypter("0123456789abcdef".getBytes());
> > >      crypter.timeSource = new FakeTimeSource();
> > >    }
> > > +
> > > +
> > > +
> > >
> > >    @Test
> > >    public void testHmacSha1() throws Exception {
> > >      String key = "abcd1234";
> > > -    String val = "your mother is a hedgehog";
> > > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> > >      byte[] expected = new byte[] {
> > > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > > -        -51, 115, -122, -91, 39, 26, -18, 122, 30, 90,
> > > +        -45, -20, 16, -21, -64, 8, 79, -41, -28, -101,
> > > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> > >      };
> > >      byte[] hmac = Crypto.hmacSha1(key.getBytes(), val.getBytes());
> > >      assertArrayEquals(expected, hmac);
> > > @@ -53,10 +56,10 @@
> > >    @Test
> > >    public void testHmacSha1Verify() throws Exception {
> > >      String key = "abcd1234";
> > > -    String val = "your mother is a hedgehog";
> > > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> > >      byte[] expected = new byte[] {
> > > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > > -        -51, 115, -122, -91, 39, 26, -18, 122, 30, 90,
> > > +        -45, -20, 16, -21, -64, 8, 79, -41, -28, -101,
> > > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> > >      };
> > >      Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected);
> > >    }
> > > @@ -65,10 +68,10 @@
> > >    @Test
> > >    public void testHmacSha1VerifyTampered() throws Exception {
> > >      String key = "abcd1234";
> > > -    String val = "your mother is a hedgehog";
> > > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> > >      byte[] expected = new byte[] {
> > > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > > -        -51, 115, -122, -91, 39, 0, -18, 122, 30, 90,
> > > +        -45, -20, 16, -21, -64, 15, 79, -41, -28, -101,
> > > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> > >      };
> > >      try {
> > >        Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected);
> > >
> > >
> > > Now run
> > >
> > > export LANG=en_US.UTF-8 ; mvn clean ; mvn
> > >
> > > and
> > >
> > > export LANG=en_US.ISO-8859-1 ; mvn clean ; mvn
> > >
> > > The second one then gives me errors in CryptoTest, which means that
> > > the actual bytes depend on the platform encoding. Which is bad if you
> > > happen to live outside US-ASCII. :-)
> > >
> > > I have a largeish patch to make sure that everywhere getBytes() is
> > > used, getBytes("UTF-8") is actually used instead (the only place where
> > > this is ok is the BasicBlobCrypter; there is only a single bug in
> > > there... :-) ). However, this needs to deal with the useless
> > > UnsupportedEncodingException (but be honest: if getBytes("UTF-8")
> > > fails, then this is the smallest of your problems. ;-) ).
> > >
> > > There is a good article on Joel on Software that deals in depth with
> > > the whole encoding shebang. Very readable:
> > >
> > > http://www.joelonsoftware.com/articles/Unicode.html
> > >
> > > You can also apply this patch:
> > >
> > > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java
> > > ===================================================================
> > > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java (revision 654541)
> > > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java (working copy)
> > > @@ -39,6 +39,17 @@
> > >    }
> > >
> > >    @Test
> > > +  public void testCharsetEncoding() throws Exception {
> > > +      String str = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc";
> > > +
> > > +      assertEquals(12, str.getBytes("UTF-8").length);
> > > +      assertEquals(6, str.getBytes("ISO-8859-1").length);
> > > +
> > > +      assertEquals(12, str.getBytes().length);
> > > +  }
> > > +
> > > +
> > > +  @Test
> > >    public void testHmacSha1() throws Exception {
> > >      String key = "abcd1234";
> > >      String val = "your mother is a hedgehog";
> > >
> > > and run with ISO-8859-1 and UTF-8 platform encodings to illustrate the
> > > problem.
> > >
> > > Best regards
> > >     Henning (living in äöü country. ;-) )
> >
> >
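To make the platform dependence concrete outside of the Shindig tests, here is
a small sketch. It assumes Java 6 or later, where String.getBytes(Charset)
exists and throws no checked exception. EncodingDemo and its helper names are
hypothetical:

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not Shindig code.
final class EncodingDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    private EncodingDemo() {}

    /** Same bytes on every platform. */
    static byte[] utf8Bytes(String s) {
        return s.getBytes(UTF8);
    }

    /** Same bytes on every platform, but a different encoding than UTF-8. */
    static byte[] latin1Bytes(String s) {
        return s.getBytes(LATIN1);
    }

    /** Result depends on the JVM default charset (see Charset.defaultCharset()). */
    static byte[] platformBytes(String s) {
        return s.getBytes();
    }
}
```

For Henning's string "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc", utf8Bytes gives 12
bytes and latin1Bytes gives 6, while platformBytes gives whichever of those
(or something else) the default charset of the running JVM produces. Anything
that ends up in an HMAC or on the wire should go through the explicit variant.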
-- 
Santiago Gala
http://memojo.com/~sgala/blog/
