[ http://issues.apache.org/jira/browse/XERCESJ-1041?page=history ]
     
Michael Glavassevich resolved XERCESJ-1041:
-------------------------------------------

    Resolution: Won't Fix

The encoding names that Xerces-J recognizes are restricted to those registered
with IANA [1].

Name: ISO_8859-1:1987                                    [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1

Above are the aliases registered for ISO-8859-1, and Xerces-J recognizes all of
them. Note that ISO8859-1 is not in this list. I believe the XML spec
recommends using IANA names to increase the portability of XML documents
across parser implementations. Supporting unregistered encoding names harms
document portability, and the problem you've run into demonstrates that. Many
other parsers won't have any idea what encoding "ISO8859-1" is, since it isn't
registered, so even if Xerces-J accepted it you would still have an
interoperability problem.

[1] http://www.iana.org/assignments/character-sets
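To illustrate the distinction above, here is a minimal Java sketch (not Xerces-J's actual implementation) that checks a declared encoding name against the ISO-8859-1 names listed in the IANA entry. The class name `IanaLatin1Aliases` is hypothetical. The case-insensitive comparison reflects the rule that IANA charset names are compared without regard to case, which the XML spec also recommends for encoding declarations. `Set.of` requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper, not part of Xerces-J: checks whether an encoding name
 * is one of the IANA-registered names for ISO-8859-1 (MIBenum 4).
 */
public class IanaLatin1Aliases {

    // Registered name plus aliases from the IANA entry, stored lower-cased
    // so lookups can be case-insensitive.
    private static final Set<String> ALIASES = Set.of(
            "iso_8859-1:1987",
            "iso-ir-100",
            "iso_8859-1",
            "iso-8859-1",
            "latin1",
            "l1",
            "ibm819",
            "cp819",
            "csisolatin1");

    /** Returns true if encodingName matches a registered name, ignoring case. */
    public static boolean isRegisteredAlias(String encodingName) {
        return ALIASES.contains(encodingName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "ISO8859-1" (no dash) is the spelling from this issue; it is not registered.
        for (String name : new String[] {"ISO-8859-1", "latin1", "ISO8859-1"}) {
            System.out.println(name + " -> "
                    + (isRegisteredAlias(name) ? "registered" : "not registered"));
        }
    }
}
```

Running `main` prints `ISO-8859-1 -> registered`, `latin1 -> registered` and `ISO8859-1 -> not registered`. The unregistered spelling fails the lookup even though it differs from the preferred MIME name by a single dash.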

> Xerces C++ defines an encoding-string that Xerces/Java refuses to parse
> -----------------------------------------------------------------------
>
>          Key: XERCESJ-1041
>          URL: http://issues.apache.org/jira/browse/XERCESJ-1041
>      Project: Xerces2-J
>         Type: Bug
>     Versions: 2.4.0
>  Environment: XercesC-2.3, XalanJ 2.4, Solaris 6
>     Reporter: Dominik Stadler

>
> We are using Xerces C++ to create XML-Messages that are later parsed by 
> Xerces/Java.
> XercesC provides a define, XMLUni::fgISO88591EncodingString, for setting the
> encoding; with it, the XML message contains the string "ISO8859-1" as the
> encoding.
> When we later use Xerces/Java to parse this file, we get the following error:
> [Fatal Error] :1:43: Invalid encoding name "ISO8859-1".
> It seems that Xerces/Java only knows the encoding "ISO-8859-1" (with a dash),
> but not "ISO8859-1" (without the dash).
> The XML specification states that "ISO-8859-1" (with a dash) SHOULD be used;
> see http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding
> So in my opinion, either Xerces C++ should no longer provide that define,
> or Xerces/Java should be enhanced to accept that encoding string. Otherwise
> XercesC and XercesJ differ in this respect, whereas until now we thought
> their parsing behavior was the same.
> I have already reported this for XercesC at
> http://issues.apache.org/jira/browse/XERCESC-1336

-- 
This message is automatically generated by JIRA.

