Re: Unicode Technical Report #22

Mark Davis Thu, 20 Mar 2003 11:57:59 -0800


The only problem would arise if you were trying to read a CharML
file that *itself* was encoded using a character set that your XML parser
didn't know. That's one reason for always encoding the CharML files
themselves in UTF-8 or ASCII. I'll post this to a broader mailing list, since
some others may have similar concerns.
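
To illustrate the point about keeping CharML files in UTF-8 or ASCII, here is a minimal sketch of a pre-parse check. It is not part of UTR #22 or ICU: the names BUILTIN_ENCODINGS, declared_encoding and is_self_contained are hypothetical, and the sketch assumes the first bytes of the file are ASCII-compatible (it ignores UTF-16 and other non-ASCII-compatible encodings).

```python
import re

# Encodings this sketch assumes the XML parser can decode on its own,
# without asking a conversion library. (Assumption for illustration only.)
BUILTIN_ENCODINGS = {"utf-8", "us-ascii", "ascii"}

# Matches an encoding name inside a leading <?xml ... ?> declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(head: bytes) -> str:
    """Return the encoding named in the XML declaration, lowercased.

    Falls back to 'utf-8' when no encoding is declared.
    """
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        head = head[3:]
    match = _DECL.match(head)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def is_self_contained(head: bytes) -> bool:
    """True if the file can be parsed without loading any other converter."""
    return declared_encoding(head) in BUILTIN_ENCODINGS
```

A conversion library could run a check like this on each mapping file before handing it to the parser, and reject files that would need another converter to be loaded first.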

Mark
___
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

"Claude Tardif" <[EMAIL PROTECTED]>
2003.03.19 21:44

To:       Mark Davis/Cupertino/[EMAIL PROTECTED]
cc:       <[EMAIL PROTECTED]>
Subject:  Unicode Technical Report #22

Your document referenced in the title of this message specifies an XML
format for the interchange of mapping data for character encodings.
Conversely, section 4.3.3 of the Extensible Markup Language (XML) 1.0
(Second Edition) specifies an encoding declaration for stating the
character encoding of XML documents. If character encoding definitions
use XML and XML uses character encodings, there is necessarily an
interdependency loop. For example, what if a conversion library such as
ICU parsed character encoding files using an XML parser which itself
used ICU to convert the character encoding of entities? Then, if the XML
file defining the charset encoding for ISO-8859-1 contained the
declaration <?xml encoding='ISO-8859-1'?>, this would cause a loop, as
the character encoding definition could never parse itself.
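
The loop described above can be sketched as a toy dependency model. This is only an illustration under stated assumptions: MAPPING_FILE_ENCODING, BUILTIN, load_converter and DependencyLoop are hypothetical names, not ICU or XML parser APIs, and the table of declared encodings is made up for the example.

```python
class DependencyLoop(Exception):
    """Raised when a converter would be needed to load its own mapping file."""


# Decoders assumed to be compiled into the parser (assumption for this sketch).
BUILTIN = {"utf-8", "us-ascii"}

# Hypothetical table: converter name -> encoding declared inside its mapping file.
MAPPING_FILE_ENCODING = {
    "iso-8859-1": "iso-8859-1",  # mapping file declares the encoding it defines
    "windows-1252": "utf-8",     # mapping file is stored in UTF-8
}


def load_converter(name, declared=MAPPING_FILE_ENCODING, _loading=None):
    """Return the chain of converters that must be available to load `name`."""
    loading = _loading if _loading is not None else []
    if name in BUILTIN:
        return loading + [name]
    if name in loading:
        raise DependencyLoop(" -> ".join(loading + [name]))
    # Parsing the mapping file for `name` first needs a decoder for the
    # encoding that the file declares for itself.
    return load_converter(declared[name], declared, loading + [name])
```

In this model, load_converter("windows-1252") resolves through the built-in UTF-8 decoder, while load_converter("iso-8859-1") raises DependencyLoop because the mapping file can only be read with the converter it is supposed to define.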

My question is: is there a way for a conversion library and an XML parser
to make use of each other's services without causing such an
interdependency loop and, preferably, without imposing requirements
such as forbidding encoding declarations in character encoding
files?

Marc Tardif