RE: Future of Xindice

Mike Mortensen 16 Jan 2002 16:13:23 -0000

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Murray Altheim
Sent: Wednesday, January 16, 2002 3:37 AM
To: [EMAIL PROTECTED]
Subject: Re: Future of Xindice


In response to Mike Mortensen and Timothy M. Dean, I'll try to reiterate
that I'm not against validation features being available in applications
that use Xindice as a data store. I'm against those features being
embedded in Xindice itself. I guess where there seems to be some confusion
is what "embedded in Xindice" means exactly.

First, let's differentiate structural vs. content validation.

   Structural Validation:  validates that the XML markup of a document is
     (a) well-formed, and optionally (if a schema is available and the 
      parser is set in validation mode), that 
     (b) the markup structure is valid according to the schema constraints.

   Content Validation: validates that the element and attribute content
     conforms to the constraints expressed in a schema (or perhaps written
     into the application itself).

I certainly understand and agree with your points, and I have had experience
similar to yours. Content validation is appropriate and often critical to
database applications. Structural validation is a rather unique need in
XML databases, since a relational or object database can't be corrupted by
incoming data (unless there's something strange in the database design).
My point (which I'm guessing was not expressed very clearly) is that any
Xindice-based application *must* have an XML parser available, and Xindice
is distributed with Xerces 2, which provides support for DTDs and XML
Schema. If you need stronger content validation, Xerces provides that with
its XML Schema support.

At most stages in the process an XML processor is *required* to handle the
XML content moving in and out of Xindice. All one needs to do to provide
stronger validation support during these processes is to establish those
parsers in validation mode, and provide the schemas necessary to validate
the content. As I mentioned in my previous message, you can even "pipeline
validate", which is to validate the content travelling between components
by validating the SAX events themselves. O'Reilly will soon be publishing
a SAX book by David Brownell (initial author of Sun's XML parser) that
describes this type of functionality.
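
As a rough illustration of the "pipeline validate" idea, here is a minimal
sketch in Python's standard xml.sax module (Xindice itself is Java, so this
only shows the shape of the technique). The filter sits between the parser
and a downstream component and checks the SAX events as they pass. The rule
it enforces (every employee has a department), the class names
RequireDepartment and EmployeeCounter, and the helper run_pipeline are all
assumptions made up for the example, not anything from Xindice or Xerces.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class RequireDepartment(XMLFilterBase):
    """SAX filter: forwards every event, rejects an <employee> lacking a <department>."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._has_department = False

    def startElement(self, name, attrs):
        if name == "employee":
            self._has_department = False   # reset for each new employee
        elif name == "department":
            self._has_department = True
        super().startElement(name, attrs)  # pass the event downstream

    def endElement(self, name):
        if name == "employee" and not self._has_department:
            raise xml.sax.SAXException("employee without department")
        super().endElement(name)


class EmployeeCounter(ContentHandler):
    """Hypothetical downstream component: counts the employees it receives."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "employee":
            self.count += 1


def run_pipeline(xml_text, downstream):
    """Parse xml_text and feed the events through the filter to downstream."""
    checker = RequireDepartment(xml.sax.make_parser())
    checker.setContentHandler(downstream)
    checker.parse(io.BytesIO(xml_text.encode("utf-8")))
```

The downstream handler never needs to know that the check exists; it simply
stops receiving events when the filter raises.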

You don't need to include validation features in Xindice itself because
the packages required to support Xindice already provide those features,
and any application built upon Xindice *by necessity* must parse and
process XML content. All XML content going into Xindice must at minimum
be well-formed XML -- that's structural validation at its most basic. If
further structural or content validation is needed, set the parser
factories to produce validating parsers, and then provide the schemas.
To put these features into Xindice itself would be redundant and
unnecessary. Xerces is already doing it.


I now more clearly understand where you're going.  However, I still have 
problems with the central assertion.

Before addressing the central issue, I wanted to clear up another point.  It is 
possible to break the referential integrity in a relational database (even with 
only in-bound data).  Take the example of two tables, Department and Employee:

Department
=========
ID
Name

Employee
=========
ID
DepartmentID
FirstName
LastName


Let's sprinkle a little data into our structure for better illustration.

Department
3514 Finance
3515 Research

Employee
19845   3515    Lorraine        Jacobson
19846   3514    Stephen         Reed
19847   3515    Alexander       Morris
19848   3515    Phillip         Gutierrez


Without the appropriate foreign key constraints, it would be possible to insert 
new records into the Employee table with an invalid reference to the Department 
table (as shown here with the new employee Anton Azzameen):

Employee
19845   3515    Lorraine        Jacobson
19846   3514    Stephen         Reed
19847   3515    Alexander       Morris
19848   3515    Phillip         Gutierrez
19849   NULL    Anton           Azzameen
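
The effect of the constraint can be tried out with a small sketch. This one
uses Python's sqlite3 module with an in-memory database, chosen only for
convenience; the helper names make_db and try_insert are invented for the
example. One caveat worth stating: in standard SQL a foreign key on its own
still admits NULL, so the constrained variant below also declares the column
NOT NULL. The foreign key is what rejects a department ID that does not exist
(9999 here, an ID made up for the demonstration).

```python
import sqlite3


def make_db(constrained):
    """Build the Department/Employee tables, with or without constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE Department (ID INTEGER PRIMARY KEY, Name TEXT)")
    if constrained:
        dept_col = "DepartmentID INTEGER NOT NULL REFERENCES Department(ID)"
    else:
        dept_col = "DepartmentID INTEGER"
    conn.execute(
        "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, "
        + dept_col
        + ", FirstName TEXT, LastName TEXT)"
    )
    conn.executemany(
        "INSERT INTO Department VALUES (?, ?)",
        [(3514, "Finance"), (3515, "Research")],
    )
    conn.commit()
    return conn


def try_insert(conn, row):
    """Return True if the Employee row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

With make_db(False), the row (19849, None, "Anton", "Azzameen") is accepted;
with make_db(True), it is rejected, as is any row pointing at a nonexistent
department.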

Now let's see this same example in XML:

<organization>
        <employee>
                <department>Research</department>
                <firstName>Lorraine</firstName>
                <lastName>Jacobson</lastName>
        </employee>
        <employee>
                <department> Finance </department>
                <firstName> Stephen </firstName>
                <lastName> Reed </lastName>
        </employee>
        <employee>
                <department>Research</department>
                <firstName> Alexander </firstName>
                <lastName> Morris </lastName>
        </employee>
        <employee>
                <department>Research</department>
                <firstName> Phillip </firstName>
                <lastName> Gutierrez </lastName>
        </employee>
        <employee>
                <firstName> Anton </firstName>
                <lastName>Azzameen </lastName>
        </employee>
</organization>

The XML version of the example is well-formed but would be invalid (where the 
department is a required element of employee).  The relational version is 
likewise permissible (except where there exists a foreign key constraint).
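
The gap between "well-formed" and "valid" can be sketched in a few lines.
Python's standard library has no validating parser, so the hand-written check
below merely stands in for what a DTD or schema declaring department as
required would enforce; the function name employees_missing_department is an
assumption for the example.

```python
import xml.etree.ElementTree as ET


def employees_missing_department(xml_text):
    """Return the names of employees that have no <department> child.

    ET.fromstring raises ParseError if the text is not well-formed, so
    reaching the loop at all means the well-formedness test has passed.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for emp in root.iter("employee"):
        if emp.find("department") is None:
            first = (emp.findtext("firstName") or "").strip()
            last = (emp.findtext("lastName") or "").strip()
            missing.append(first + " " + last)
    return missing
```

Run against the document above, the parse succeeds and the function reports
only Anton Azzameen, which is the sense in which the document is well-formed
but not valid.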

Consequently, it should be more easily seen that structural validation is 
*not* unique to the XML world and is directly applicable 
elsewhere (i.e. the relational model).  However, all of this is tangential to 
the central issue of where validation should take place.


If I have correctly understood your argument, you are saying that because the XML 
content must pass through an XML parser (which can be set to validate as well 
as parse), there is no need to embed this functionality in Xindice.  Basically, 
since XML parsing (and optionally validating) is required and is presently 
available outside of Xindice, why go to the effort to embed it within Xindice?

The answer is the same reason as before.  If Xindice is to become a "dumb" 
datastore (meaning that it relies on outside applications for validation of the 
data it stores), then what is there to prevent an application (which shares 
Xindice with other applications) from choosing a different parser (which varies 
in its ability to validate from that of Xerces used by the other applications) 
or even from choosing not to validate at all?

Again, it really boils down to whether or not all applications using Xindice 
can rely on the datastore to contain only valid data.

If there is an advantage to keeping the validating parser outside of Xindice, 
then fine.  However, once the parser and the validation mechanism are chosen for 
a collection, the use of the selected validating parser and mechanism must be 
bound to the collection so that Xindice does not break its contract with the 
applications that use it as the data store.  This tight coupling of parser and 
mechanism (DTD, W3C Schema, Relax, etc.) certainly seems to imply that it 
should be embedded within Xindice.
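
To make the binding concrete, here is a minimal sketch of what I mean, written
in Python purely for brevity. Nothing in it is Xindice's actual API: the
Collection class, ValidationError, and require_department are hypothetical
names, and the validator is an ordinary function standing in for whichever
parser and mechanism were chosen. The point is only that the check is fixed
when the collection is created and runs on every insert, whichever
application is calling.

```python
import xml.etree.ElementTree as ET


class ValidationError(Exception):
    """Raised when a document fails the collection's bound validator."""


class Collection:
    """Hypothetical store: the validator is chosen once, at creation."""

    def __init__(self, name, validator):
        self.name = name
        self._validator = validator  # clients cannot swap or skip this
        self._documents = {}

    def insert(self, key, xml_text):
        root = ET.fromstring(xml_text)    # well-formedness check
        problem = self._validator(root)   # validity check bound to the collection
        if problem:
            raise ValidationError(problem)
        self._documents[key] = xml_text

    def keys(self):
        return sorted(self._documents)


def require_department(root):
    """Example validator: return an error message, or None if the document is valid."""
    for emp in root.iter("employee"):
        if emp.find("department") is None:
            return "employee without department"
    return None
```

Every application sharing the collection then gets the same guarantee,
because none of them is in a position to choose a weaker check.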

MGM
