DO NOT REPLY [Bug 6938] New: - A UTF-8 encoded file with a UTF-8 Byte-Order-Marks cannot be parsed

bugzilla Wed, 06 Mar 2002 14:48:25 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6938>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6938

           Summary: A UTF-8 encoded file with a UTF-8 Byte-Order-Marks
                    cannot be parsed
           Product: Xerces-J
           Version: 1.4.4
          Platform: All
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


A valid UTF-8 encoded file that begins with a valid UTF-8 byte-order mark
(BOM) cannot be parsed.

See the file org.apache.xerces.readers.UTF8Recognizer.

At line 89 the code checks for a valid BOM at the start of the file, and at
line 100 it attempts to skip those bytes.

However, at line 105 the code calls the byteAt method of class
ChunkyByteArray, which does NOT appear to take into account that the first
3 bytes have already been 'read' (it does not use the fOffset field).

An UnsupportedEncodingException is eventually thrown.
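
The behaviour described above can be illustrated with a minimal,
self-contained sketch. This is not the Xerces source: the class
BomSkipSketch and its methods bomLength and byteAt are hypothetical names
invented for illustration, and the sketch only assumes, as the report
suggests, that an absolute byte lookup needs to add the number of bytes
already skipped.

```java
import java.nio.charset.StandardCharsets;

public class BomSkipSketch {
    // UTF-8 encoding of U+FEFF, the byte-order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns 3 if data starts with the UTF-8 BOM, otherwise 0. */
    static int bomLength(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return 0;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return 0;
            }
        }
        return UTF8_BOM.length;
    }

    /**
     * Hypothetical offset-aware lookup: index is counted from the first
     * byte after the skipped prefix. Returns -1 past the end of the data.
     */
    static int byteAt(byte[] data, int offset, int index) {
        int pos = offset + index;
        return pos < data.length ? (data[pos] & 0xFF) : -1;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(xml, 0, withBom, UTF8_BOM.length, xml.length);

        int offset = bomLength(withBom);
        // With the offset applied, the first byte seen is '<'.
        System.out.println((char) byteAt(withBom, offset, 0)); // prints "<"
        // Ignoring the offset returns the first BOM byte (0xEF) instead.
        System.out.println(Integer.toHexString(byteAt(withBom, 0, 0))); // prints "ef"
    }
}
```

If the lookup ignores the skipped bytes, the code that inspects the XML
declaration sees 0xEF rather than '<', which would be consistent with the
encoding detection failing as reported.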
