Hi folks,

For me, one of the neatest things about working on Xerces is the opportunity to learn about the plethora of products for which Xerces is a base technology. Because Xerces sits at pretty much the lowest level of XML processing, a Xerces developer gets to find out about the needs of all kinds of different products that need to interact with XML.

As of J2SE 1.4, one type of product that needs to understand XML is a JVM. In fact, since SAX, DOM and JAXP are now core specifications in J2SE/EE, any implementation of these specifications needs to have an XML parser right at its core. And Xerces is--at least for some JDK implementors--the parser of choice! We were already shipped in IBM's JDK 1.3.0; we'll be there in 1.4 as well. And that, in itself, seems to me to be fairly neat.

But with this popularity comes the multitude of needs I mentioned before. For instance, certain IBM JDKs (and I'm betting IBM won't be the only implementor to offer choices like this) are "reusable". For more information on IBM's version of this kind of JDK, you could look here (if you don't mind PDF!):

http://www-1.ibm.com/servers/eserver/zseries/software/java/pdf/jtc0a100.pdf

As a brief summary, what this means is that the same JVM can be used by successive applications. Basically, between application sessions, the JVM gets reinitialized or reset. But being reset doesn't affect classes that lie at the heart of this kind of JVM. And XML parsing has moved so far down in the application stack that the XML parser doesn't get reset between application sessions.

So, when Xerces is used in this kind of JVM, values that are static in Xerces will carry forward from one application into another. Therefore, if an application is able to modify any of our static values in such a way that Xerces's behaviour will be altered, we have a problem--the next application might not work and the JVM is effectively made non-reusable (read: broken). Since it's not possible for a user to know how an application she wants to use works internally, we need to make sure there aren't any ways for this kind of problem to arise because of some interaction that an application has with Xerces.

In practice, it doesn't look like meeting this requirement will cause much disruption.
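
To make the hazard concrete, here's a minimal sketch of how a public static array leaks state from one application into the next, and how keeping it private and handing out clones avoids that. The class names (StaticLeakSketch, LeakyParser, SafeParser) and the feature strings are made up for illustration; they are not Xerces classes, and the two "applications" are simply simulated by successive statements in main.

```java
import java.util.Arrays;

public class StaticLeakSketch {

    /** Hypothetical parser-like class that exposes a static table directly. */
    static class LeakyParser {
        // "final" only fixes the reference; the array contents stay writable.
        public static final String[] RECOGNIZED = { "validation", "namespaces" };

        static boolean recognizes(String feature) {
            return Arrays.asList(RECOGNIZED).contains(feature);
        }
    }

    /** Hypothetical "statically immutable" variant of the same class. */
    static class SafeParser {
        // Private, so no code outside this class can touch the real table.
        private static final String[] RECOGNIZED = { "validation", "namespaces" };

        // Callers receive a copy; changing it has no effect on the original.
        static String[] getRecognized() {
            return RECOGNIZED.clone();
        }

        static boolean recognizes(String feature) {
            return Arrays.asList(RECOGNIZED).contains(feature);
        }
    }

    public static void main(String[] args) {
        // "Application A" overwrites an entry in the exposed table.
        LeakyParser.RECOGNIZED[0] = "bogus";
        // "Application B" runs later in the same JVM and sees the damage.
        System.out.println(LeakyParser.recognizes("validation")); // prints "false"

        // The same tampering attempt only modifies a throwaway copy.
        SafeParser.getRecognized()[0] = "bogus";
        System.out.println(SafeParser.recognizes("validation")); // prints "true"
    }
}
```

The cost of the safe variant is one array copy per getter call, which only matters if the getter sits on a hot path.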

We'll have to make many static variables throughout our code private--or at least package-protected. Sometimes, we'll have to change accessor methods to return clones of static objects instead of the actual objects themselves. But applications which don't use Xerces internals--and applications shouldn't be using Xerces internals because they could change at any moment for all kinds of other reasons--shouldn't be affected.

So I'll be checking in changes over the next couple of weeks to make Xerces "statically immutable", as the terminology goes. As I do this, I'll try to:

(1) not affect any externally visible, and especially any externally useful, behaviour;
(2) not impact Xerces's performance;
(3) keep Xerces as extensible as possible.

Sandy's already been helping me with some of these issues, so we'll have a second mind watching this stuff.

Here's a partial list of things that will need to be fixed. If anyone thinks one of these changes might break them, this would be a great time to speak up!

CoreDocumentImpl#kidOK: make private final.
Many classes: make RECOGNIZED_FEATURES and RECOGNIZED_PROPERTIES private final, and return clones of these objects in the appropriate getter methods. We only use these when building configuration pipelines internally at the moment, so this shouldn't be a measurable performance problem.
XSAttributeChecker: make ATTIDX_COUNT private; make a few static members private final and rework some others so they're not exposed to the outside world.
XMLChar: make CHARS byte array private.
Version: made fVersion final.
Base64: made base64Alphabet and lookUpBase64Alphabet final.
Hexbin: made hexNumberTable and lookUpHexAlphabet final.
UCSReader: made UCS(2|4)(B|L)E final.
ExceptionMessages, ImplementationMessages, DatatypeMessages: we can probably live without these files entirely since they aren't ref'd anywhere.
XPath$XPathScanner: made fASCIICharMap final.
XPath$Tokens: made fgTokenNames private instead of public.
ParserForXMLSchema: changed range and ranges to private.
REUtil: changed regexCache to be final; it is never changed, but it never hurts to be sure...
RegularExpression: changed declaration of wordchar so that it is local to the method in which it is initialized and used.
Token:
    changed blockNames to private from package-protected
    changed categories and categories2 from package-protected to private final
    changed categoryNames to private
    token_0to9: package-protected
    token_not_0to9: package-protected
    token_not_wordchars: package-protected
    token_wordchars: package-protected
    token_not_wordedge: package-protected
    token_wordedge: package-protected
    token_dot: package-protected
    token_not_spaces: package-protected
    token_spaces: package-protected
    token_empty: package-protected
    token_linebeginning: package-protected
    token_linebeginning2: package-protected
    token_wordbeginning: package-protected
    token_string_beginning: package-protected
    token_lineend: package-protected
    token_wordend: package-protected
    token_stringend: package-protected
    token_stringend2: package-protected
    getCombiningCharacterSequence: protected -> package-protected
    getGraphemePattern: protected -> package-protected
SchemaSymbols: make fSchemaSymbols in class and in the inner class private.
IDValue: made VS private.
XSDComplexTypeTraverser: removed fErrorContent; now it's created each time getErrorContent is called, but that's only during error conditions so this loses little. Changed to make it work with a modified restricted XSComplexTypeDecl.
HTMLSerializer: made _xhtml nonstatic and made XHTMLNamespace final (this should be correctly initialized however).
SchemaGrammar: modify so no static members remain visible.
XSComplexTypeDecl: in order that xsd:anyType can be of this type, we'll have to rearrange this a bit.
XSSimpleTypeDecl: fix so that objects of this kind know whether they're schema primitive types (and are therefore static), in which case they shouldn't be modified.
SchemaDVFactory: this is the hardest one; have to make setFactory make sense in the context of static immutability.

Hope that makes sense. As always, thoughts much appreciated--especially if they touch on how SchemaDVFactory, SchemaGrammar et al. can be modified so that they don't hurt performance, still retain functionality, and yet achieve the objective of being statically immutable.

Cheers!
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]
