Andy Clark wrote:
> Elliotte Rusty Harold wrote:
> >> the only thing that could possibly make XMLPULL API not
> >> 100% compatible with XML 1.0 is when PROCESS DOCDECL feature
> >> [...]
> >
> > That's a very big one. A parser should not be allowed to turn off
> > processing of the internal DTD subset at all. And to make not processing
> > it the default?! That's just wrong.
>
> Well, you gotta look at the intended purpose of these types
> of parsers. If I remember correctly, the XPP work was started
> because of SOAP which subsets XML syntax and doesn't allow a
> DOCTYPE line at all.

that means that those implementations were designed to concentrate on
size (like kXML2) or speed (like MXP1), but there can be many other
implementations ...

> > Worse yet, according to http://www.xmlpull.org/impls.shtml neither of
> > the existing implementations even allows you to set that feature to true.
>
> I think Alexsander has code to use Xerces2 as the driver
> for the push API. So, if used that way then it should be
> able to check the DTD just like Xerces. And when I finish
> my API for the CyberNeko tools, the default impl will be
> driven by Xerces so it should have no problem in that
> regard.

exactly - the API is one thing and an implementation is completely
another. as long as each implementation is correctly described, users
can make informed choices.

> > I've also heard it claimed recently that the parsers aren't doing all
> > the name character checking they're supposed to, though I haven't
>
> No wonder they're so fast. ;) This is one of the big
> checks that implementors would love to remove from their
> inner loop. Xerces, being fully compliant, can't do that
> and suffers some performance hits.

in MXP1 i use a lookup table for char values below 0x400 (LOOKUP_MAX)
and range checks in an if statement for the rest. i am putting the
relevant part of the code from MXP1 below and welcome comments about it
(especially if you find anything wrong with the functions!).

> Just about any XML parser can be written to go fast if
> they don't do all of the work. For example, removing
> character checking, avoiding DTD parsing and processing,
> not implementing XML Schema, etc. But, depending on the
> situation, these are all perfectly acceptable choices.

well - i think that MXP1 does all of the XML parsing, and i am slowly
improving it to the level of a full non-validating parser. the only
remaining incompatibilities i know about are DTD parsing and XML 1.0
character set support (i am a bit hesitant about adding the latter as
i like XML 1.1 much more ...).

thanks,

alek

ps.
here is a fragment of MXParser - please comment if you think that i am
missing something important when looking at what is required in
http://www.w3.org/TR/xml11/#sec2.3 (thanks in advance!)

    // chars below LOOKUP_MAX are classified by table lookup,
    // everything else by the range checks further down
    protected static final int LOOKUP_MAX = 0x400;
    protected static final char LOOKUP_MAX_CHAR = (char)LOOKUP_MAX;
    protected static boolean lookupNameStartChar[] = new boolean[ LOOKUP_MAX ];
    protected static boolean lookupNameChar[] = new boolean[ LOOKUP_MAX ];

    private static final void setName(char ch)
    { lookupNameChar[ ch ] = true; }

    // every name start char is also a name char
    private static final void setNameStart(char ch)
    { lookupNameStartChar[ ch ] = true; setName(ch); }

    static {
        // chars allowed at the start of a name
        setNameStart(':');
        for (char ch = 'A'; ch <= 'Z'; ++ch) setNameStart(ch);
        setNameStart('_');
        for (char ch = 'a'; ch <= 'z'; ++ch) setNameStart(ch);
        for (char ch = '\u00c0'; ch <= '\u02FF'; ++ch) setNameStart(ch);
        for (char ch = '\u0370'; ch <= '\u037d'; ++ch) setNameStart(ch);
        for (char ch = '\u037f'; ch < '\u0400'; ++ch) setNameStart(ch);

        // chars allowed only after the first char of a name
        setName('-');
        setName('.');
        for (char ch = '0'; ch <= '9'; ++ch) setName(ch);
        setName('\u00b7');
        for (char ch = '\u0300'; ch <= '\u036f'; ++ch) setName(ch);
    }

    private final static boolean isNameStartChar(char ch) {
        return (ch < LOOKUP_MAX_CHAR && lookupNameStartChar[ ch ])
            || (ch >= LOOKUP_MAX_CHAR && ch <= '\u2027')
            || (ch >= '\u202A' && ch <= '\u218F')
            || (ch >= '\u2800' && ch <= '\uFFEF')
            ;
    }

    // above LOOKUP_MAX the accepted ranges are the same as for name start chars
    private final static boolean isNameChar(char ch) {
        return (ch < LOOKUP_MAX_CHAR && lookupNameChar[ ch ])
            || (ch >= LOOKUP_MAX_CHAR && ch <= '\u2027')
            || (ch >= '\u202A' && ch <= '\u218F')
            || (ch >= '\u2800' && ch <= '\uFFEF')
            ;
    }

    // XML whitespace: space, line feed, carriage return, tab
    protected boolean isS(char ch) {
        return (ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t');
    }
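
For readers who want to try the lookup-table-plus-range-check idea in isolation, here is a minimal, self-contained sketch of how the two predicates from the fragment above could be combined to check a whole name. The class name `NameCheckSketch`, the helpers `markStart`, `mark` and `inUpperRanges`, and the method `isName` are hypothetical and not part of MXP1. The character ranges are copied from the fragment as posted; they have not been checked against any particular revision of the XML 1.1 specification.

```java
public class NameCheckSketch {
    // Hypothetical names throughout; the table size mirrors LOOKUP_MAX in the fragment.
    static final int LOOKUP_MAX = 0x400;
    static final boolean[] NAME_START = new boolean[LOOKUP_MAX];
    static final boolean[] NAME = new boolean[LOOKUP_MAX];

    // Mark an inclusive range as valid both at the start of a name and inside it.
    static void markStart(int from, int to) {
        for (int c = from; c <= to; c++) { NAME_START[c] = true; NAME[c] = true; }
    }

    // Mark an inclusive range as valid only after the first character.
    static void mark(int from, int to) {
        for (int c = from; c <= to; c++) NAME[c] = true;
    }

    static {
        // Same ranges as the posted fragment, for chars below LOOKUP_MAX.
        markStart(':', ':'); markStart('A', 'Z'); markStart('_', '_'); markStart('a', 'z');
        markStart(0xC0, 0x2FF); markStart(0x370, 0x37D); markStart(0x37F, 0x3FF);
        mark('-', '-'); mark('.', '.'); mark('0', '9'); mark(0xB7, 0xB7); mark(0x300, 0x36F);
    }

    // Range checks for chars at or above LOOKUP_MAX, as in the posted fragment.
    static boolean inUpperRanges(char ch) {
        return (ch >= LOOKUP_MAX && ch <= 0x2027)
            || (ch >= 0x202A && ch <= 0x218F)
            || (ch >= 0x2800 && ch <= 0xFFEF);
    }

    static boolean isNameStartChar(char ch) {
        return ch < LOOKUP_MAX ? NAME_START[ch] : inUpperRanges(ch);
    }

    static boolean isNameChar(char ch) {
        return ch < LOOKUP_MAX ? NAME[ch] : inUpperRanges(ch);
    }

    // A name is one name start char followed by zero or more name chars.
    static boolean isName(String s) {
        if (s.isEmpty() || !isNameStartChar(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isName("soap:Envelope")); // prints true
        System.out.println(isName("-bad"));          // prints false
    }
}
```

The table covers the dense, irregular low ranges with a single array read, while the sparse high ranges need only three comparisons, so the common ASCII case stays cheap in the parser's inner loop.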
