-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Murray Altheim
Sent: Wednesday, January 16, 2002 3:37 AM
To: [EMAIL PROTECTED]
Subject: Re: Future of Xindice
In response to Mike Mortensen and Timothy M. Dean, I'll try to reiterate
that I'm not against validation features being available in applications
that use Xindice as a data store. I'm against those features being
embedded in Xindice itself. I suspect the confusion lies in what exactly
"embedded in Xindice" means.
First, let's distinguish structural from content validation.
Structural Validation: validates that the XML markup of a document is
(a) well-formed, and optionally (if a schema is available and the
parser is set in validation mode), that
(b) the markup structure is valid according to the schema constraints.
Content Validation: validates that the element and attribute content
conforms to the constraints expressed in a schema (or perhaps written
into the application itself).
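As a minimal sketch of check (a) on its own, the Java below uses the standard JAXP SAXParserFactory with validation switched off, so it reports only whether the markup is well-formed; no DTD or schema is consulted. The class and method names (WellFormedCheck, isWellFormed) are illustrative and not part of Xindice or Xerces.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the text is well-formed XML. The parser is
    // non-validating, so no DTD or schema constraints are checked.
    static boolean isWellFormed(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);     // structural check (a) only
        factory.setNamespaceAware(true);
        try {
            // DefaultHandler rethrows fatal (well-formedness) errors.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;                 // malformed markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<employee><lastName>Reed</lastName></employee>")); // true
        System.out.println(isWellFormed("<employee><lastName>Reed</employee>"));            // false
    }
}
```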
I certainly understand and agree with your points, and I have had experience
similar to yours myself. Content validation is appropriate and often critical to
database applications. Structural validation is a rather unique need in
XML databases, since a relational or object database can't be corrupted by
incoming data (unless there's something strange in the database design).
My point (which I'm guessing was not expressed very clearly) is that any
Xindice-based application *must* have an XML parser available, and Xindice
is distributed with Xerces 2, which provides support for DTDs and XML
Schema. If you need stronger content validation, Xerces provides that with
its XML Schema support.
At most stages in the process an XML processor is *required* to handle the
XML content moving in and out of Xindice. All one needs to do to provide
stronger validation support during these processes is to establish those
parsers in validation mode, and provide the schemas necessary to validate
the content. As I mentioned in my previous message, you can even "pipeline
validate", which is to validate the content travelling between components
by validating the SAX events themselves. O'Reilly will soon be publishing
a SAX book by David Brownell (initial author of Sun's XML parser) that
describes this type of functionality.
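A minimal sketch of "pipeline validation", assuming the javax.xml.validation API (added to JAXP in Java 5, well after this thread): the parser's SAX events are routed through a ValidatorHandler, which checks them against a W3C XML Schema before passing them on to the next ContentHandler in the chain. The schema and the PipelineValidation class are hypothetical examples; a DefaultHandler stands in for the downstream component.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PipelineValidation {
    // Hypothetical schema: an employee holds firstName followed by lastName.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='employee'><xs:complexType><xs:sequence>"
      + "<xs:element name='firstName' type='xs:string'/>"
      + "<xs:element name='lastName' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // Parser -> ValidatorHandler -> next component. Returns false if the
    // event stream is malformed or violates the schema.
    static boolean passesPipeline(String xml) {
        try {
            ValidatorHandler validator = SCHEMA.newValidatorHandler();
            // With no ErrorHandler set, ValidatorHandler throws on any validity error.
            validator.setContentHandler(new DefaultHandler()); // stand-in for the downstream component

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);                  // ValidatorHandler expects namespace-aware events
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler()); // rethrows well-formedness errors
            reader.setContentHandler(validator);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(passesPipeline(
            "<employee><firstName>Stephen</firstName><lastName>Reed</lastName></employee>")); // true
        System.out.println(passesPipeline(
            "<employee><lastName>Azzameen</lastName></employee>"));                          // false
    }
}
```

Because the validator is just another ContentHandler, it can be dropped between any two SAX components without either of them knowing it is there.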
You don't need to include validation features in Xindice itself because
the packages required to support Xindice already provide those features,
and any application built upon Xindice *by necessity* must parse and
process XML content. All XML content going into Xindice must at minimum
be well-formed XML -- that's structural validation at its most basic. If
further structural or content validation is needed, set the parser
factories to produce validating parsers, and then provide the schemas.
To put these features into Xindice itself would be redundant and
unnecessary. Xerces is already doing it.
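Setting a parser factory to produce validating parsers might look like the sketch below, which validates against a DTD carried in the document's internal subset. One detail worth noting: a validating SAX parser reports validity errors through ErrorHandler.error(), which DefaultHandler silently ignores, so the handler here rethrows them. The DTD and the DtdValidation class are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // Hypothetical DTD in the internal subset: department is required.
    static final String DOCTYPE =
        "<!DOCTYPE employee ["
      + "<!ELEMENT employee (department, firstName, lastName)>"
      + "<!ELEMENT department (#PCDATA)>"
      + "<!ELEMENT firstName (#PCDATA)>"
      + "<!ELEMENT lastName (#PCDATA)>"
      + "]>";

    static boolean isValid(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);   // ask the factory for DTD-validating parsers
        DefaultHandler strict = new DefaultHandler() {
            // Validity errors arrive here; DefaultHandler would ignore them.
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), strict);
            return true;
        } catch (SAXException e) {
            return false;              // malformed or invalid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DOCTYPE
            + "<employee><department>Finance</department>"
            + "<firstName>Stephen</firstName><lastName>Reed</lastName></employee>"));          // true
        System.out.println(isValid(DOCTYPE
            + "<employee><firstName>Anton</firstName><lastName>Azzameen</lastName></employee>")); // false
    }
}
```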
I now more clearly understand where you're going. However, I still have
problems with the central assertion.
Before addressing the central issue, I want to clear up another point. It is
possible to break referential integrity in a relational database (even with
only inbound data). Take the example of two tables, Department and Employee:
Department
==========
ID
Name

Employee
========
ID
DepartmentID
FirstName
LastName
Let's sprinkle a little data into our structure for better illustration.
Department
ID    Name
3514  Finance
3515  Research

Employee
ID     DepartmentID  FirstName  LastName
19845  3515          Lorraine   Jacobson
19846  3514          Stephen    Reed
19847  3515          Alexander  Morris
19848  3515          Phillip    Gutierrez
Without the appropriate constraints (a foreign key on DepartmentID, plus NOT
NULL), it would be possible to insert new records into the Employee table with a
missing or invalid reference to the Department table, as shown here with the
new employee Anton Azzameen:
Employee
ID     DepartmentID  FirstName  LastName
19845  3515          Lorraine   Jacobson
19846  3514          Stephen    Reed
19847  3515          Alexander  Morris
19848  3515          Phillip    Gutierrez
19849  NULL          Anton      Azzameen
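To make the missing constraint concrete, here is a sketch in plain Java (no database involved) of the check a constraint on Employee.DepartmentID would perform. Note that in standard SQL a FOREIGN KEY constraint by itself accepts NULL, so rejecting the Anton Azzameen row also requires DepartmentID to be declared NOT NULL. The Employee class and the orphans method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ForeignKeyCheck {
    // One row of the Employee table from the example; departmentId may be null.
    static final class Employee {
        final int id;
        final Integer departmentId;
        final String firstName, lastName;

        Employee(int id, Integer departmentId, String firstName, String lastName) {
            this.id = id;
            this.departmentId = departmentId;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Returns the IDs of employees whose DepartmentID is NULL or matches no
    // Department.ID: the rows that NOT NULL plus a FOREIGN KEY would reject.
    static List<Integer> orphans(Set<Integer> departmentIds, List<Employee> employees) {
        List<Integer> rejected = new ArrayList<>();
        for (Employee e : employees) {
            // Check for null first: Set.of(...) sets throw on contains(null).
            if (e.departmentId == null || !departmentIds.contains(e.departmentId)) {
                rejected.add(e.id);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Set<Integer> departmentIds = Set.of(3514, 3515); // Finance, Research
        List<Employee> employees = List.of(
            new Employee(19845, 3515, "Lorraine", "Jacobson"),
            new Employee(19846, 3514, "Stephen", "Reed"),
            new Employee(19847, 3515, "Alexander", "Morris"),
            new Employee(19848, 3515, "Phillip", "Gutierrez"),
            new Employee(19849, null, "Anton", "Azzameen"));
        System.out.println(orphans(departmentIds, employees)); // [19849]
    }
}
```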
Now let's see the same example in XML:
<organization>
  <employee>
    <department>Research</department>
    <firstName>Lorraine</firstName>
    <lastName>Jacobson</lastName>
  </employee>
  <employee>
    <department>Finance</department>
    <firstName>Stephen</firstName>
    <lastName>Reed</lastName>
  </employee>
  <employee>
    <department>Research</department>
    <firstName>Alexander</firstName>
    <lastName>Morris</lastName>
  </employee>
  <employee>
    <department>Research</department>
    <firstName>Phillip</firstName>
    <lastName>Gutierrez</lastName>
  </employee>
  <employee>
    <firstName>Anton</firstName>
    <lastName>Azzameen</lastName>
  </employee>
</organization>
The XML version of the example is well-formed, but it would be invalid against a
schema that makes department a required element of employee. The relational
version is likewise permitted, unless DepartmentID is constrained (NOT NULL plus
a foreign key).
Consequently, it should now be easier to see that structural validation is
*not* unique to the XML world and is directly applicable elsewhere (e.g., in
the relational model). However, all of this is tangential to the central issue
of where validation should take place.
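To illustrate the point that the XML example is well-formed but invalid, the sketch below assumes a W3C XML Schema (written for this illustration, not taken from the thread) in which department is a required child of employee, and validates with the javax.xml.validation API. An employee with no department, like Anton Azzameen, makes validation fail.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class OrganizationSchemaCheck {
    // Hypothetical schema for the example. department has the default
    // minOccurs of 1, so every employee must have one.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='organization'><xs:complexType><xs:sequence>"
      + "<xs:element name='employee' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='department' type='xs:string'/>"
      + "<xs:element name='firstName' type='xs:string'/>"
      + "<xs:element name='lastName' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // Returns null if the document is valid, otherwise the first error message.
    static String firstError(String xml) {
        Validator validator = SCHEMA.newValidator(); // throws on the first error by default
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<organization><employee><department>Finance</department>"
                     + "<firstName>Stephen</firstName><lastName>Reed</lastName></employee></organization>";
        String missingDept = "<organization><employee><firstName>Anton</firstName>"
                     + "<lastName>Azzameen</lastName></employee></organization>";
        System.out.println(firstError(valid) == null); // true
        System.out.println(firstError(missingDept));    // the parser's message about the missing department
    }
}
```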
If I have correctly understood your argument, you are saying that because the XML
content must pass through an XML parser (which can be set to validate as well
as parse), there is no need to embed this functionality in Xindice. Basically,
since XML parsing (and optionally validating) is required and is presently
available outside of Xindice, why go to the effort to embed it within Xindice?
The answer is the same as before. If Xindice is to become a "dumb"
datastore (meaning that it relies on outside applications to validate the
data it stores), then what is to prevent an application that shares Xindice
with other applications from choosing a different parser (one whose validation
capabilities differ from those of the Xerces parser used by the other
applications), or even from choosing not to validate at all?
Again, it really boils down to whether or not all applications using Xindice
can rely on the datastore to contain only valid data.
If there is an advantage to keeping the validating parser outside of Xindice,
then fine. However, once the parser and the validation mechanism are chosen for
a collection, their use must be bound to the collection so that Xindice does not
break its contract with the applications that use it as their data store. This
tight coupling of parser and mechanism (DTD, W3C Schema, RELAX, etc.) to the
collection certainly seems to imply that validation should be embedded within
Xindice.
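One way to picture that binding, as a sketch only: a hypothetical ValidatingCollection class (not part of the Xindice API) compiles a schema once, when the collection is created, and validates every insert against it, so no client can store a document that bypasses the collection's chosen mechanism. It assumes W3C XML Schema through javax.xml.validation; other schema languages would follow the same shape only if the installed SchemaFactory supports them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical class, NOT part of the Xindice API: the schema is bound to
// the collection when it is created, and every insert is checked against it.
class ValidatingCollection {
    // Example schema for illustration: department is required.
    static final String EMPLOYEE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='employee'><xs:complexType><xs:sequence>"
      + "<xs:element name='department' type='xs:string'/>"
      + "<xs:element name='firstName' type='xs:string'/>"
      + "<xs:element name='lastName' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private final Schema schema; // a compiled Schema is thread-safe and reusable
    private final Map<String, String> documents = new HashMap<>();

    ValidatingCollection(String xsd) {
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // Stores the document only if it is valid against this collection's schema,
    // regardless of which parser (if any) the calling application used.
    boolean insert(String key, String xml) {
        Validator validator = schema.newValidator(); // Validators are not thread-safe: one per call
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            return false;                            // rejected: malformed or invalid
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        documents.put(key, xml);
        return true;
    }

    int size() {
        return documents.size();
    }

    public static void main(String[] args) {
        ValidatingCollection staff = new ValidatingCollection(EMPLOYEE_XSD);
        System.out.println(staff.insert("19846",
            "<employee><department>Finance</department>"
          + "<firstName>Stephen</firstName><lastName>Reed</lastName></employee>"));          // true
        System.out.println(staff.insert("19849",
            "<employee><firstName>Anton</firstName><lastName>Azzameen</lastName></employee>")); // false
        System.out.println(staff.size()); // 1
    }
}
```

Keeping the compiled Schema inside the collection, rather than in each client, is what puts the contract with the data instead of with whichever application happens to be writing.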
MGM