Re: [7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

Vladislav Karnaukhov Thu, 30 Aug 2012 08:22:47 -0700

Hello,

can anyone review this?


- Vlad

On 8/27/2012 6:51 PM, Vladislav Karnaukhov wrote:

Hello,

could you please review a new version?
Please find webrev here: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.04/
Regards,
- Vlad

On 6/29/2012 2:09 AM, Phil Race wrote:
That would work but I think it's cleaner to just move it all into mapNumericReference
as below.

-phil.
diff --git a/src/share/classes/javax/swing/text/html/parser/Parser.java b/src/share/classes/javax/swing/text/html/parser/Parser.java
--- a/src/share/classes/javax/swing/text/html/parser/Parser.java
+++ b/src/share/classes/javax/swing/text/html/parser/Parser.java
@@ -952,7 +952,7 @@
                         ch = readCh();
                         break;
                 }
-                char data[] = {mapNumericReference((char) n)};
+                char data[] = mapNumericReference(n);
                 return data;
             }
             addString('#');
@@ -1021,7 +1021,7 @@
     }

     /**
-     * Converts numeric character reference to Unicode character.
+     * Converts numeric character reference to char array.
      *
      * Normally the code in a reference should be always converted
      * to the Unicode character with the same code, but due to
@@ -1030,13 +1030,21 @@
      * to displayable characters with other codes.
      *
      * @param c the code of numeric character reference.
-     * @return the character corresponding to the reference code.
+     * @return a char array corresponding to the reference code.
      */
-    private char mapNumericReference(char c) {
-        if (c < 130 || c > 159) {
-            return c;
+    private char[] mapNumericReference(int c) {
+        char[] data;
+        if (c >= 0xffff) { // outside unicode BMP.
+            try {
+                data = Character.toChars(c);
+            } catch (IllegalArgumentException e) {
+                data = new char[0];
         }
-        return cp1252Map[c - 130];
+        } else {
+            data = new char[1];
+            data[0] = (c < 130 || c > 159) ? (char)c : cp1252Map[c - 130];
+        }
+        return data;
     }
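
For readers who want to try the behaviour of the patched method outside the JDK sources, here is a minimal standalone sketch. The class name NumericReferenceSketch is hypothetical, and the CP1252_MAP table is a stand-in built from the Windows-1252 assignments for codes 130 to 159; it is an assumption and may differ from the cp1252Map array in Parser.java. The sketch also tests c > 0xFFFF rather than the patch's c >= 0xffff.

```java
public class NumericReferenceSketch {

    // Stand-in for Parser's cp1252Map (assumption): Windows-1252 mappings
    // for codes 130..159; unassigned positions keep their own code.
    private static final char[] CP1252_MAP = {
        0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
        0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F, 0x0090, 0x2018,
        0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x02DC, 0x2122,
        0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };

    // Maps the numeric value of a character reference to UTF-16 code units.
    static char[] mapNumericReference(int c) {
        if (c > 0xFFFF) { // outside the Unicode BMP
            try {
                // Returns a surrogate pair for a valid supplementary code point.
                return Character.toChars(c);
            } catch (IllegalArgumentException e) {
                // Not a valid code point (above 0x10FFFF): emit nothing.
                return new char[0];
            }
        }
        // Inside the BMP: remap the 130..159 range, pass everything else through.
        char mapped = (c < 130 || c > 159) ? (char) c : CP1252_MAP[c - 130];
        return new char[] { mapped };
    }

    public static void main(String[] args) {
        int[] samples = { 65, 150, 0x1F600, 0x110000 };
        for (int code : samples) {
            char[] units = mapNumericReference(code);
            System.out.println(Integer.toHexString(code) + " -> "
                    + units.length + " code unit(s)");
        }
    }
}
```

Returning an empty array for an invalid code point mirrors the patch; returning a single space instead, as discussed elsewhere in this thread, would be a one-line change in the catch block.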


On 06/28/12 07:58 AM, Vladislav Karnaukhov wrote:
Hello Phil, Pavel,
thank you for your comments. I've reworked the fix and the test case; please find the new webrev here: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.03/
Regards,
- Vlad

On 6/27/2012 10:13 PM, Phil Race wrote:
Well, it's not only unnecessary but is likely wrong ... I don't think
you looked at what mapNumericReference() does.

-phil.

On 6/27/2012 11:03 AM, Vladislav Karnaukhov wrote:
Hello Phil,
I used Character.toChars() in both branches because I wanted to delegate code point conversion to a char or surrogate pair entirely to the Character class...
Regards,
- Vlad

On 6/26/2012 9:49 PM, Phil Race wrote:
I don't understand why you call Character.toChars() if you've just determined
you don't need to?

ie what was wrong with
data = (n >>> 16 == 0) ? new char[] {mapNumericReference((char) n)} : Character.toChars(n);
?
In the case of an invalid supplementary pair, maybe it would be safer to return { ' ' }?
One thing I see in the parsing code that is not new or changed here, and that may bear examination, is that there's a loop that keeps on reading so long as there are new digits. I am not sure it's wise to keep going once
you overflow.

-phil.
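
The overflow concern about the digit-reading loop can be illustrated with a small sketch. This is not the code in Parser.java; the class DigitLoopSketch and the method parseDecimalReference are hypothetical names, and the input is taken from a String rather than from the parser's readCh() stream. The idea is simply to stop accumulating as soon as the value can no longer be a valid code point, so the int never wraps around.

```java
public class DigitLoopSketch {

    // Accumulates leading decimal digits of s into a code point value.
    // Returns -1 once the value exceeds Character.MAX_CODE_POINT, instead of
    // continuing to read digits and silently overflowing the int.
    static int parseDecimalReference(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                break; // first non-digit ends the number
            }
            // n is at most 0x10FFFF here, so n * 10 + 9 cannot overflow an int.
            n = n * 10 + (ch - '0');
            if (n > Character.MAX_CODE_POINT) {
                return -1; // out of range: give up early
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parseDecimalReference("128512;"));
        System.out.println(parseDecimalReference("99999999999999999999;"));
    }
}
```

A real parser would still have to consume the remaining digits of the malformed reference; the sketch only shows the bounded accumulation.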

On 6/26/2012 12:37 AM, Vladislav Karnaukhov wrote:
Hello Pavel,
I can provide you with the link to 6u19, but this is a direct forward-port and no code changes were made.
I'll make changes as you've pointed out in 1) and 2).
About 3) - is it a requirement to use the "? :" operator? I personally prefer a single-line if-else, but I don't want to argue over code style, and surely I'll follow code design practices.
Regards,
- Vlad

On 6/25/2012 6:43 PM, Pavel Porvatov wrote:
Hi Vladislav,

Do you have a link to the fix for 6u19?

I didn't investigate the fix deeply, but

1.
 private final int MAX_BMP_BOUND = 65535;
should be static (otherwise variable name should be in lower case)

2. Add a space in single line comments

3.
+                    char data[];
+                    if (n <= MAX_BMP_BOUND) {
+                        data = Character.toChars(mapNumericReference((char) n));
+                    } else {
+                        data = Character.toChars(n);
+                    }
+
                 return data;
can be written in one line via the "? :" operator, which looks more readable to me.
Thanks, Pavel
Hello,
please review the fix for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair. This is a forward port from JDK6 (fixed escalated issue, fix integrated) to JDK7.
The issue is a defect in the Swing HTML Parser: if the codepoint is outside the BMP (Unicode Plane 0), the Parser incorrectly decodes the codepoint into a surrogate pair. The fix is to use the Character.toChars() method if the codepoint value is greater than the upper bound of the BMP.
Webrev: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.00/
Bug description: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6836089
Regards,
- Vlad
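
To make the defect described in this request concrete, the following sketch contrasts a plain cast to char with Character.toChars() for a supplementary code point. The class SurrogateSketch and both method names are hypothetical and only illustrate the idea; they are not taken from the webrev.

```java
public class SurrogateSketch {

    // Roughly what a (char) cast does: keeps only the low 16 bits, so a
    // supplementary code point turns into an unrelated BMP character.
    static char[] decodeTruncating(int n) {
        return new char[] { (char) n };
    }

    // The approach described in the request: use Character.toChars() above
    // the BMP so the code point becomes a proper surrogate pair.
    // Assumes n is a valid code point (0..0x10FFFF).
    static char[] decode(int n) {
        return (n > 0xFFFF) ? Character.toChars(n) : new char[] { (char) n };
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside the BMP
        char[] broken = decodeTruncating(clef);
        char[] fixed = decode(clef);
        System.out.println("truncated: U+" + Integer.toHexString(broken[0]));
        System.out.println("round trip: U+"
                + Integer.toHexString(new String(fixed).codePointAt(0)));
    }
}
```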
