abhishek,

When you save a file to disk, you can save it in different encodings, built on 8-bit, 16-bit or even 32-bit units. The encoding determines how many bits are used to store each character in your document.
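To make that concrete, here is a minimal sketch that counts how many bytes the same character takes in a few encodings. The class name EncodingSizes and the helper byteLength are made up for illustration, and it assumes a JDK recent enough to have java.nio.charset.StandardCharsets (Java 7 or later, so newer than the JDK 1.3 discussed in this thread).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingSizes {
    // Number of bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 'A', e-acute and the euro sign, written as escapes so the
        // source file's own encoding does not matter.
        String[] samples = { "A", "\u00e9", "\u20ac" };
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE
        };
        for (String s : samples) {
            for (Charset cs : charsets) {
                System.out.println("U+" + Integer.toHexString(s.charAt(0)).toUpperCase()
                        + " in " + cs.name() + ": " + byteLength(s, cs) + " byte(s)");
            }
        }
    }
}
```

UTF-8 is variable width, so these three characters take one, two and three bytes respectively, while both UTF-16 variants use two bytes for each of them.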
The encoding attribute tells the XML processor which encoding the document is stored in. It is an optional attribute. If it is present, the XML processor uses it as the encoding. If it is not present, the XML processor should be able to determine the encoding automatically by looking at the first 4 bytes of the document (the characters "<?xm" produce a different byte pattern in each encoding family). A BOM (Byte Order Mark) may also be present at the start of the document. It appears with certain encodings and should be picked up by the Xerces parser.

Remember the following rules from the XML spec:

- If an encoding declaration is present, it must be consistent with the bit pattern found by autodetection.
- UTF-16 encoding requires the use of a UTF-16 BOM.
- If using UTF-16BE or UTF-16LE (big-endian / little-endian), you cannot use a BOM, and the encoding needs to be named in the encoding declaration.
- If no encoding declaration is present in the XML document, the only 2 valid encodings are UTF-8 and UTF-16.
- If a BOM is NOT present and a declaration is NOT present, the encoding must be UTF-8.

A good small editor I came across for saving a file in most formats is called UnicEdit. Check it out.

To be honest, I am not sure whether Xerces implements all of the above rules (but it should). I imagine that support for the encodings relies on the version of your JDK and also on what your platform supports. Check out http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html for the encodings supported by JDK 1.3. Also check out http://www.iana.org/assignments/character-sets for the correct names of encodings to use.

Hope this helps.

Cheers,
Don

-----Original Message-----
From: abhishekhp [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 12, 2001 9:12 PM
To: [EMAIL PROTECTED]
Subject: Rookie Question on encoding

Hi,

A rookie question regarding encoding....

Seen a lot of posts regarding UTF-8/16 encoding. Could someone briefly explain what exactly is meant by the encoding attribute.
Meaning when we say

<?xml version="1.0" encoding="UTF-8"?>

What does the encoding attribute denote, and what is its relevance to the Xerces parser? Are there any popular encoding formats that Xerces does not support?

TIA,
abhishek.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
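For reference, the autodetection idea described in the reply (look for a BOM, then at the byte pattern of "<?xm") can be sketched roughly as below. This is only an illustration loosely modelled on the table in Appendix F of the XML 1.0 spec, not how Xerces actually does it. The class XmlEncodingSniffer, its methods and the returned labels are all made up for this example, and the label "ASCII-compatible" just means "read the encoding declaration to find the real encoding".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Xerces implementation.
class XmlEncodingSniffer {
    // True if data begins with the given byte values (each 0..255).
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Guess the encoding family from the first bytes of an XML document.
    static String sniff(byte[] b) {
        // Byte order marks first; 4-byte patterns before 2-byte ones,
        // because FF FE 00 00 would otherwise look like a UTF-16LE BOM.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        // No BOM: look at how "<?" or "<?xm" is laid out in bytes.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        // No BOM and no XML declaration: the document must be UTF-8.
        return "UTF-8";
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?><a/>";
        // Java's "UTF-16" charset writes a big-endian BOM when encoding.
        System.out.println(sniff(doc.getBytes(StandardCharsets.UTF_16)));
        System.out.println(sniff(doc.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(sniff(doc.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A real parser would go on to read the encoding declaration in the "ASCII-compatible" case and check that it agrees with what was detected. The sketch also leaves out the EBCDIC and unusual byte-order rows of the Appendix F table.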
