abhishek

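When you save a file to disk, you can save it in different encoding formats,
for example 8-bit, 16-bit or even 32-bit formats. The encoding determines how
the characters in your document are turned into bytes, i.e. how many bits are
used for each character. (UTF-8 and UTF-16 are variable-width: UTF-8 uses one
to four bytes per character, UTF-16 two or four.)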
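To make the "how many bits per character" point concrete, here is a small Java sketch (Java 7+ because of StandardCharsets) that counts the bytes the same characters take in UTF-8, UTF-16BE and UTF-32BE. The class and method names are only for illustration, and it assumes a JDK that ships UTF-32BE (OpenJDK and Oracle JDK do, although it is not one of the guaranteed standard charsets).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteCount(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // U+0041 'A', U+00E9 'e acute', U+20AC euro sign, and U+1D11E
        // (musical G clef, outside the BMP, so a surrogate pair in Java)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD834\uDD1E"};
        // UTF-32BE is not in StandardCharsets, so look it up by name
        Charset utf32be = Charset.forName("UTF-32BE");
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16BE: %d  UTF-32BE: %d%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE),
                    byteCount(s, utf32be));
        }
    }
}
```

For those four characters UTF-8 needs 1, 2, 3 and 4 bytes, UTF-16BE needs 2, 2, 2 and 4, and UTF-32BE always 4. Java's plain "UTF-16" charset adds 2 more bytes because it writes a big-endian BOM when encoding.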

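The encoding attribute tells the XML processor which encoding the document is
stored in. It is optional. If it is present, the XML processor uses it as the
encoding. If it is not present, the XML processor should be able to work out
the encoding automatically by looking at the first four bytes, which for a
document with an XML declaration are "<?xm" written in that encoding. (The
processor does this four-byte check even when the attribute is present, since
it needs a rough idea of the encoding just to read the declaration.)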
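The four-byte check is described (non-normatively) in Appendix F of the XML 1.0 spec. A minimal Java sketch of its no-BOM rows might look like the following; the class name and the returned labels are made up for illustration, and BOM handling, the rarer UCS-4 byte orders and reading the declaration itself are left out.

```java
public class XmlEncodingSniffer {
    // Sketch of the no-BOM rows of XML 1.0 Appendix F: guess the encoding
    // family from the first four bytes, which should be "<?xm" in that encoding.
    static String guessFamily(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // too short to tell, fall back to the default
        }
        int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0x00 && b3 == 0x3C) return "UCS-4BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x00 && b3 == 0x00) return "UCS-4LE";
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        // "<?xm" in ASCII: could be UTF-8, ISO-8859-x, Shift_JIS, ...
        // so the encoding declaration has to say which one
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) return "ASCII-compatible";
        // "<?xm" in EBCDIC
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "EBCDIC";
        // No recognisable XML declaration: the document must be UTF-8
        return "UTF-8";
    }
}
```

Note that this only narrows things down to a family: a UTF-8 file starting with "<?xml" comes back as "ASCII-compatible", and the parser then reads the declaration to find out exactly which encoding it is.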

A BOM (Byte Order Mark) may also be present at the start of the document. It
is used with certain encodings (required for UTF-16, optional for UTF-8) and
should be picked up by the Xerces parser.
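One way to see the parser pick this up is to hand it raw bytes and nothing else. The sketch below uses the JDK's built-in JAXP parser, which is a repackaged Xerces (under com.sun.org.apache.xerces.internal); with Apache Xerces-J on the classpath, DocumentBuilderFactory may select that instead. The class and helper names are hypothetical, and it assumes Java 7+ for StandardCharsets and multi-catch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class BomParseDemo {
    // Give the parser raw bytes only; it has to work out the encoding itself
    // from the BOM and/or the encoding declaration.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String body = "<r>\u00E9\u20AC</r>";
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first,
        // so no encoding declaration is needed.
        byte[] utf16WithBom = body.getBytes(StandardCharsets.UTF_16);
        // UTF-16LE is written without a BOM, so the encoding is declared.
        byte[] utf16leDeclared = ("<?xml version=\"1.0\" encoding=\"UTF-16LE\"?>" + body)
                .getBytes(StandardCharsets.UTF_16LE);
        // Compare with the expected text instead of printing it, so the
        // console's own encoding does not get in the way.
        System.out.println(rootText(utf16WithBom).equals("\u00E9\u20AC"));
        System.out.println(rootText(utf16leDeclared).equals("\u00E9\u20AC"));
    }
}
```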

Remember the following rules from the XML spec:

- If an encoding declaration is present, it must be consistent with the byte
pattern found by autodetection
- UTF-16 encoding requires the UTF-16 BOM
- If using UTF-16BE or UTF-16LE (big-endian/little-endian), you cannot use a
BOM, and the encoding must be named in the encoding declaration (strictly,
the no-BOM part comes from the Unicode definition of those labels rather
than from the XML spec itself)
- If no encoding declaration is present in the XML document, the only two
valid encodings are UTF-8 and UTF-16 (unless an external protocol such as
HTTP supplies the encoding)
- If neither a BOM nor an encoding declaration is present, the encoding must
be UTF-8
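As a rough illustration of how these rules combine, here is a hypothetical and deliberately simplified Java helper. It assumes the caller has already detected a UTF-8 or UTF-16 BOM and extracted the declared encoding; it is not how Xerces is actually structured, and it skips UCS-4, EBCDIC and checking a declaration against the detected byte pattern.

```java
import java.util.Locale;

public class EncodingRules {
    /**
     * Hypothetical helper applying the rules above.
     * bom:      "UTF-8", "UTF-16BE" or "UTF-16LE" if the file starts with that
     *           byte order mark (EF BB BF, FE FF, FF FE), otherwise null.
     * declared: value of the encoding declaration, or null if there is none.
     * Returns the encoding to use, or throws if the rules are broken.
     */
    static String resolve(String bom, String declared) {
        boolean utf16Bom = "UTF-16BE".equals(bom) || "UTF-16LE".equals(bom);
        if (declared == null) {
            // No declaration: only UTF-8 or UTF-16 are allowed,
            // and without a UTF-16 BOM it has to be UTF-8.
            return utf16Bom ? "UTF-16" : "UTF-8";
        }
        String decl = declared.toUpperCase(Locale.ROOT);
        if (decl.equals("UTF-16")) {
            // UTF-16 requires a UTF-16 BOM.
            if (!utf16Bom) {
                throw new IllegalStateException("UTF-16 requires a UTF-16 byte order mark");
            }
            return "UTF-16";
        }
        if (bom != null) {
            // Any other declaration must agree with the BOM; a UTF-16BE or
            // UTF-16LE declaration cannot be combined with a BOM at all.
            if (bom.equals("UTF-8") && decl.equals("UTF-8")) {
                return "UTF-8";
            }
            throw new IllegalStateException(
                    "declaration " + declared + " conflicts with a " + bom + " byte order mark");
        }
        // No BOM: go with the declaration (e.g. UTF-8, UTF-16LE, ISO-8859-1).
        return declared;
    }
}
```

For example, resolve(null, null) gives "UTF-8", resolve("UTF-16LE", null) gives "UTF-16", and resolve(null, "UTF-16") throws because the BOM is missing.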


A good small editor I came across for saving files in most of these formats
is UnicEdit. Check it out.

To be honest, I am not sure whether Xerces implements all of the above rules
(but it should).

I expect that support for a given encoding depends on your JDK version and
also on what your platform supports.

Check out http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html for
the encodings supported by JDK 1.3.
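On newer JDKs you can also ask the running JVM directly. The sketch below uses java.nio.charset, which was added in J2SE 1.4, so it will not run on the JDK 1.3 that page documents (there, about the best you can do is try new String(bytes, name) and catch UnsupportedEncodingException). The class name and the list of names probed are just examples; which extended charsets are present depends on the JDK vendor and version.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

public class CharsetSupport {
    // True if the running JVM can encode and decode the named charset.
    // (Charset.isSupported throws IllegalCharsetNameException for a
    // syntactically illegal name, e.g. one containing spaces.)
    static boolean supported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "UTF-16", "ISO-8859-1", "Shift_JIS", "EUC-KR", "x-no-such-charset"};
        for (String name : names) {
            System.out.println(name + " supported: " + supported(name));
        }
        // Everything this particular JVM offers, keyed by canonical name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available: " + all.keySet());
    }
}
```

Every Java platform must support US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16; anything beyond that is worth checking this way before relying on it.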

Also check out http://www.iana.org/assignments/character-sets for the correct
(registered) names to use in the encoding declaration.
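Java accepts many aliases, including its own historic names such as "UTF8", but those are not what you want in an encoding declaration. The Charset javadoc requires that, for a charset listed in the IANA registry, the canonical name is the registry's (MIME-preferred) name, so a small sketch like this one (class and method names are illustrative; needs Java 1.4+) can map an alias back to the name to put in the declaration.

```java
import java.nio.charset.Charset;

public class CharsetNames {
    // Map any name or alias the JVM knows to its canonical name.
    // Lookup is case-insensitive; unknown names throw UnsupportedCharsetException.
    static String canonical(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"utf8", "UTF8", "latin1", "utf-16le"}) {
            System.out.println(n + " -> " + canonical(n));
        }
        System.out.println("Aliases of UTF-8: " + Charset.forName("UTF-8").aliases());
    }
}
```

So "utf8" and "UTF8" both map to "UTF-8", and "latin1" maps to "ISO-8859-1", which is the form to use in <?xml version="1.0" encoding="..."?>.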


hope this helps

Cheers

Don



-----Original Message-----
From: abhishekhp [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 12, 2001 9:12 PM
To: [EMAIL PROTECTED]
Subject: Rookie Question on encoding


Hi,
A rookie question regarding encoding....
Seen a lot of posts regarding UTF-8/16 encoding. Could someone briefly
explain what exactly is meant by the encoding attribute.
Meaning when we say
<?xml version="1.0" encoding="UTF-8"?>

What does the encoding attribute denote, and what is its relevance to
the Xerces parser?
Are there any popular encoding formats that xerces does not support?

TIA,
abhishek.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
