[ http://issues.apache.org/jira/browse/XERCESJ-1041?page=history ]
Michael Glavassevich resolved XERCESJ-1041:
-------------------------------------------
Resolution: Won't Fix
The encoding names which Xerces-J recognizes are restricted to those registered
with IANA [1].
Name: ISO_8859-1:1987 [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1
Above are the aliases registered for ISO-8859-1, and Xerces-J recognizes all of
them. Note that ISO8859-1 is not in this list. I believe the XML spec
recommends using IANA names to increase the portability of XML documents
across parser implementations. Supporting unregistered encoding names harms
document portability, as the problem you've run into demonstrates. Many other
parsers won't recognize "ISO8859-1" either, since it isn't registered, so you
would still have an interoperability problem.
[1] http://www.iana.org/assignments/character-sets
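As a rough illustration of the check described above, here is a minimal Java
sketch that matches an encoding label case-insensitively (IANA charset names are
case-insensitive) against the registered name and aliases for ISO-8859-1 listed
above. The class IanaLatin1 and its method isRegisteredAlias are hypothetical
names for illustration only; this is not how Xerces-J implements its encoding
lookup. It assumes Java 9+ for Set.of.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of the Xerces-J API): checks an encoding
 * label against the IANA-registered names for ISO-8859-1.
 */
public final class IanaLatin1 {

    // Registered name plus aliases from the IANA character-sets registry,
    // stored upper-cased because IANA charset names are case-insensitive.
    private static final Set<String> NAMES = Set.of(
            "ISO_8859-1:1987", "ISO-IR-100", "ISO_8859-1", "ISO-8859-1",
            "LATIN1", "L1", "IBM819", "CP819", "CSISOLATIN1");

    private IanaLatin1() {}

    /** Returns true if label is a registered name or alias for ISO-8859-1. */
    public static boolean isRegisteredAlias(String label) {
        return label != null && NAMES.contains(label.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "ISO8859-1" (no dash after "ISO") is not a registered alias.
        for (String label : new String[] {"ISO-8859-1", "latin1", "ISO8859-1"}) {
            System.out.println(label + " -> " + isRegisteredAlias(label));
        }
    }
}
```

Running main prints true for "ISO-8859-1" and "latin1" and false for
"ISO8859-1", which mirrors why the label written by the reporter's producer is
rejected.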
> Xerces C++ defines an encoding-string that Xerces/Java refuses to parse
> -----------------------------------------------------------------------
>
> Key: XERCESJ-1041
> URL: http://issues.apache.org/jira/browse/XERCESJ-1041
> Project: Xerces2-J
> Type: Bug
> Versions: 2.4.0
> Environment: XercesC-2.3, XalanJ 2.4, Solaris 6
> Reporter: Dominik Stadler
>
> We are using Xerces C++ to create XML-Messages that are later parsed by
> Xerces/Java.
> XercesC provides a define, XMLUni::fgISO88591EncodingString, for setting the
> encoding; when it is used, the XML-Message contains the string "ISO8859-1" as
> its encoding.
> When we later use Xerces/Java to parse this file, we get the following error:
> [Fatal Error] :1:43: Invalid encoding name "ISO8859-1".
> It seems that Xerces/Java only knows the encoding "ISO-8859-1" (with a dash),
> but not "ISO8859-1" (without dash).
> The XML-Specification states that "ISO-8859-1" (with a dash) SHOULD be used
> (see http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding).
> So in my opinion either Xerces C++ should no longer provide that define,
> or Xerces/Java should be enhanced to accept that encoding-string. Otherwise
> XercesC and XercesJ differ on this point, whereas until now we thought they
> would behave identically when parsing.
> I have already reported this for XercesC at
> http://issues.apache.org/jira/browse/XERCESC-1336.
--
This message is automatically generated by JIRA.