RE: Unicode character is not recognized

Keary, Simon Fri, 05 Sep 2003 05:10:34 -0700


Hi Vikas,


The character 0x1D is a control character, and XML 1.0 doesn't allow it in a document: 
below 0x20 only tab (0x09), line feed (0x0A) and carriage return (0x0D) are legal.  It's 
also worth remembering that UTF-8 is an encoding scheme - it isn't just a defined mapping 
from 8-bit values to characters (the way ASCII is a mapping from 7-bit values to 
characters).  In very basic terms, simple Latin characters such as 'a', 'b' etc. are 
represented by a single byte.  However, more unusual characters, such as characters 
with accents, are encoded as a multi-byte sequence.  For this to work, specific bit 
patterns indicate whether a byte stands alone or belongs to a multi-byte sequence, and 
in this scheme certain bit patterns are invalid.
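
To make the bit patterns a little more concrete, here is a rough Python sketch (not 
anything taken from the parser itself) that classifies a byte by the pattern of its 
top bits.  The function name utf8_sequence_length is just something I made up for 
illustration.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte.

    Returns 0 if the byte can never start a UTF-8 sequence.
    (Hypothetical helper, for illustration only.)
    """
    if lead_byte < 0x80:            # 0xxxxxxx: single byte (the ASCII range)
        return 1
    if 0xC2 <= lead_byte <= 0xDF:   # 110xxxxx: starts a two-byte sequence
        return 2
    if 0xE0 <= lead_byte <= 0xEF:   # 1110xxxx: starts a three-byte sequence
        return 3
    if 0xF0 <= lead_byte <= 0xF4:   # 11110xxx: starts a four-byte sequence
        return 4
    # Everything else is invalid as a first byte: continuation bytes
    # (10xxxxxx), the overlong lead bytes 0xC0 and 0xC1, and 0xF5..0xFF.
    return 0
```

This only looks at the first byte; a real decoder would also have to check that the 
right number of continuation bytes (10xxxxxx) follows.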

The following link should explain it in a bit more detail - I'm definitely not an 
expert on this.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
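
It may also be worth checking the file for characters that XML 1.0 itself doesn't 
allow, since below 0x20 only tab, line feed and carriage return are legal there.  
A small sketch along these lines could list them; the function names are my own, and 
the line/column counting is only an assumption that may not match what the parser 
reports exactly.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def find_invalid_xml_chars(text: str):
    """Return (line, column, code point) for each disallowed character.

    Lines and columns are 1-based and counted in characters, splitting
    lines on '\n' only (an assumption made for this sketch).
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if not is_xml10_char(ord(ch)):
                hits.append((line_no, col_no, ord(ch)))
    return hits


# Made-up sample text containing a 0x1D control character on line 2.
sample = "<a>\nb\x1dc</a>"
for line_no, col_no, cp in find_invalid_xml_chars(sample):
    print(f"line {line_no}, column {col_no}: 0x{cp:02X}")
```

To run it against a real file you would read the file's decoded text into a string 
first and pass that in instead of the sample.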


Simon


> -----Original Message-----
> From: Agrawal, Vikas (ELS) [mailto:[EMAIL PROTECTED]
> Sent: 05 September 2003 11:47
> To: '[EMAIL PROTECTED]'
> Subject: Unicode character is not recognized
> 
> 
> I am trying to use one of my XML files with the DOMCount sample program and it
> crashes with the message 
> "Fatal Error at file
> "C:\contrast\XML4C-~1\Build\Win64\VC6\Release\pdfxml6.xml",
> line 91, column 932
>    Message: Invalid character (Unicode: 0x1D)"
> 
> Could anybody help please? I am attaching xml file with this e-mail.
> 
> Thanks & Regards
> Vikas
>  <<pdfxml6.xml>> 
> 

