Hi Sasa,

The parser is working as expected. A bug fix in the ASCIIReader now rejects any bytes that are not valid US-ASCII. This is in accordance with the XML Recommendation (http://www.w3.org/TR/REC-xml#charencoding):

"It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains octet sequences that are not legal in that encoding."
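
To illustrate the rule, here is a minimal sketch using the JDK's java.nio.charset API. It is not the Xerces ASCIIReader code, and the class and method names (StrictDecodeDemo, decodeStrict, isLegal) are made up for this example. It checks whether a byte sequence is legal in a given encoding by decoding it strictly. The sample bytes are an XML comment containing byte 0xE4 (228).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Xerces.
class StrictDecodeDemo {

    // Decode strictly: illegal byte sequences raise an exception
    // instead of being silently replaced.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if every byte sequence in the input is legal in the given charset.
    static boolean isLegal(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "<!-- ä -->" stored as Latin-1, so ä is the single byte 0xE4 (228).
        byte[] comment = "<!-- \u00E4 -->".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(isLegal(comment, StandardCharsets.US_ASCII));   // prints "false"
        System.out.println(isLegal(comment, StandardCharsets.ISO_8859_1)); // prints "true"
    }
}
```

The same bytes are rejected when labeled US-ASCII and accepted when labeled ISO-8859-1, which is the distinction the parser now enforces.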

Since your document is labeled US-ASCII but contains non-ASCII bytes, it isn't well-formed. US-ASCII covers only the code points 0-127; any byte outside that range is not a member of US-ASCII. The character you intended to include in your document (ä, byte 228 in your stack trace) is part of Latin-1. If you want to correct your document, one way is to change the encoding declaration to ISO-8859-1.

On Wed, 20 Aug 2003, Sasa Bojanic wrote:

> Hi,
>
> I think that that there is an encoding related bug in Xerces2.5.
> When using DOM parser, and trying to parse a document that contains
> characters that do not belong to the character set that correspond to the
> specified document encoding (e.g. the character ä is contained in the
> document which encoding is specified as "us-ascii"), the parser is crashing.
>
> Here is the code snippet:
>
> try {
>     DOMParser parser = new DOMParser();
>     parser.parse(toParse);
> } catch (Exception ex) {
>     ex.printStackTrace();
> }
>
> * "toParse" is the path to the following document:
>
> <?xml version="1.0" encoding="us-ascii"?>
> <Package Id="pkg1">
>     <!-- ä -->
>     <PackageHeader>
>         <XPDLVersion>1.0</XPDLVersion>
>         <Vendor>Together</Vendor>
>         <Created>2003-08-20 10:00:49</Created>
>     </PackageHeader>
> </Package>
>
> The parser crashes because of ä character, and I get the following stack
> trace:
> java.io.IOException: Byte "228" is not a member of the (7-bit) ASCII character set.
>     at org.apache.xerces.impl.io.ASCIIReader.read(Unknown Source)
>     at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>     at org.apache.xerces.impl.XML11EntityScanner.skipSpaces(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>     at XML.main(XML.java:25)
>
> When I use Xerces2.4, everything goes fine!
>
> Regards,
> Sasa.
>

--
Michael Glavassevich