XIndice 2.0 [was Re: Data or Documents for Xindice 2.0]

Stefano Mazzocchi 4 Jan 2002 13:53:22 -0000

DISCLAIMER: personal and potentially inflammable opinions inside.

Kimbro Staken wrote:


<skip/>

> This is actually an important question that affects the overall
> development of Xindice into the future. When Tom and I were developing
> dbXML we definitely leaned in the direction of XML as data. This is why we
> don't really care about DTDs and such. Now we need to decide if that is
> the right thing to continue forward in the future or if a more XML
> document oriented perspective is in order.
> 
> The form of Xindice 1.0 is pretty much set, we've put down the ground work
> and presented one potential path. Now this project needs to decide what is
> the right path to move down from here. It certainly isn't a black and
> white situation, but we do need to try to get a clearer picture so that we
> have some guidelines to help with decisions like this.
> 
> This is really a question about how the server is being used today or more
> likely how it would be used if it did X, Y and Z.  What kind of
> applications are people building? What kind do you want to be building?

I see a native XML database as an incredibly great DBMS for
semi-structured data and an incredibly poor DBMS for structured data.

CORBA? No thanks, I need WebDAV.

Joins? No thanks, I need document fragment aggregation.

XML Schemas? No thanks, I need infoset-neutral RelaxNG validation.

If you have structured data, you can't beat the relational model. This
is the result of 50 years of database research: do we *really* believe
we are smarter/wiser/deeper-thinkers than all the people that worked on
the database industry since the 50's?

I personally don't.

Back to the point: didn't you ever have the feeling that LDAP was crap
but you couldn't find a better way to do those things?

Great, you smelled the problem.

Did you ever try to store and quickly retrieve and compose the
fragments of *millions* of documents with a relational solution?

Are you still sane? Lucky you.

But there is more: try going to a Swiss bank to convince them to install
a native XML DB instead of their relational one. OK, let's aim lower:
go to your financial department and convince them to move away from
their Oracle (or even from their Excel files, for &deity;'s sake!) to
a native XML DB.

The entire XML community is plagued with the 'data vs. document' match,
but this is *NOT* the problem: documents are data. Period. The fact that
you use the same syntax for both should make it clear already.

The *real* issue is "fully-structured vs. semi-structured" data.

Or, using more understandable terms: "table-oriented vs. tree-oriented"
data.
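
To make the table-vs-tree contrast concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration (the dict
layout, the row format, the names flatten and rebuild); it is not how
Xindice or any RDBMS stores things. It flattens a small tree into
parent-pointer rows and then has to walk the table recursively to get
the tree back.

```python
def flatten(node, rows, parent_id=None, position=0):
    """Append one (id, parent_id, position, tag, text) row per tree node."""
    node_id = len(rows)  # ids are simply assigned in document order
    rows.append((node_id, parent_id, position, node["tag"], node["text"]))
    for i, child in enumerate(node["children"]):
        flatten(child, rows, node_id, i)
    return rows


def rebuild(rows, node_id=0):
    """Reassemble the subtree at node_id; each level rescans the table."""
    _, _, _, tag, text = next(r for r in rows if r[0] == node_id)
    kids = sorted((r for r in rows if r[1] == node_id), key=lambda r: r[2])
    return {
        "tag": tag,
        "text": text,
        "children": [rebuild(rows, r[0]) for r in kids],
    }


# Hypothetical user profile, stored the tree-oriented way.
profile = {
    "tag": "user", "text": "", "children": [
        {"tag": "name", "text": "Ada", "children": []},
        {"tag": "address", "text": "", "children": [
            {"tag": "city", "text": "Pavia", "children": []},
        ]},
    ],
}

rows = flatten(profile, [])
```

Here rebuild(rows) returns the original profile, but only after one
lookup per level of nesting, which is the cost that grows with deep
documents.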

                                     - o -

I see two big fields where XIndice can make a difference (and this is
the reason why I wanted this project to move under Apache in the first
place!):

 - web services
 - content management systems

Interestingly enough, the two ASF members that pushed for this project to
happen (Sam and myself) push exactly in those directions, Sam for web
services, myself for CMS.

And if you think about it, these are exactly those realms where
table-oriented data fits very badly since almost all data is
tree-oriented (hierarchies of nodes).

IMO, an XML DB is nothing more than a mix between a filesystem++ and
LDAP and should try to replace those two: file systems for deeply nested
node clusters (otherwise called "semi-structured documents") and LDAP
for deeply nested single nodes (for example, user profiles).

Guess what: .NET will work on a native XML db exactly to provide a
storage system for that tree-style data (user profiles, passport data,
user pictures, email documents, etc.)

And guess what again: the most useful example of XIndice in use is as a
repository for Cocoon documents. Note that Cocoon already provides
hard-core technologies for adapting relational data to the XML world,
but users find XIndice much more attractive for their tree-oriented data
while remaining loyal to their RDBMS for table-oriented data (and use
Cocoon to adapt the SQL queries to the XML world).

And note I didn't even touch the issues of legacy data, legacy SQL
knowledge, market inertia, complexity of the XML model, stupidity of the
XMLSchema spec, XML hype, etc, etc.

                                     - o -

This is my feature-list for XIndice 2.0:

 - one big tree with node flavors (following .NET blue/red nodes):
follows the design patterns of file systems with folders, files,
symlinks and such. [great would be the ability to dump the entire thing
as a huge namespaced XML file to allow easy backup and duplication]

 - node-granular and ACL-based authorization and security [great would
be the ability to make nodes 'transparent' for those people who don't
have access to see them]

 - file system-like direct access (WebDAV instead of useless XUpdate!)
[great for editing solutions, since XUpdate requires the editor to get
the document, perform the diff and send the diff, while the same
operation can be performed by the server with one less connection. This
is what CVS does!]

 - internal aggregation of document fragments (the equivalent of file
system symlinks) [content aggregation at the database level will be much
faster than aggregation at the publishing level, very useful for content
that must be included in the same place... should replace the notion of
XML entities]

 - native metadata support (last modified time, author, etc.) [vital
for any useful caching system around the engine!]

 - node-granular event triggers [inverts the control of the database:
when something happens the database does something, useful mostly to
avoid expensive validity lookup for cached resources]
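
Two of the items above (symlink-like aggregation and 'transparent'
nodes) can be sketched in a few lines of Python. This is only an
illustration under my own assumptions: the Node class, the link_to and
readers fields, and the render function are hypothetical names, and
'transparent' is read here as "silently left out of the output". It
does not guard against link cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node in the single big tree (hypothetical structure)."""
    name: str
    text: str = ""
    children: list[Node] = field(default_factory=list)
    link_to: Optional[Node] = None      # symlink-like pointer to a fragment
    readers: Optional[set[str]] = None  # None means anyone may read


def render(node: Node, user: str) -> Optional[str]:
    """Serialize the subtree as `user` sees it; hidden nodes are omitted."""
    if node.readers is not None and user not in node.readers:
        return None  # 'transparent': the caller never sees this node
    if node.link_to is not None:
        # Aggregation happens inside the store: the target is rendered
        # in place of the link node (and gets its own access check).
        return render(node.link_to, user)
    parts = [r for r in (render(c, user) for c in node.children)
             if r is not None]
    return f"<{node.name}>{node.text}{''.join(parts)}</{node.name}>"


# Hypothetical content: a shared footer, a restricted node, and a page.
footer = Node("footer", text="(c) ASF")
salary = Node("salary", text="42", readers={"alice"})
page = Node("page", children=[
    Node("title", text="Home"),
    salary,
    Node("include", link_to=footer),
])
```

Rendering page for "bob" drops the salary node entirely, while "alice"
sees it; both get the footer fragment pulled in through the link.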

In short: I'd like to have a file system able to decompose XML documents
and store each single node as a file, scale to billions of nodes and
perform fast queries with XPath-like syntaxes.

This is my vision.
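
As a rough sketch of the "each node as a file" idea, the Python below
decomposes a document into one filesystem-like path per element and
runs an XPath-like lookup with the standard library's ElementTree. The
path scheme (tag plus 1-based sibling index), the dict used as a store,
and the function names are assumptions made for the example; it says
nothing about how to scale this to billions of nodes.

```python
import xml.etree.ElementTree as ET


def decompose(xml_text: str) -> dict[str, str]:
    """Map a filesystem-like path for every element to its text content."""
    root = ET.fromstring(xml_text)
    store: dict[str, str] = {}

    def walk(elem: ET.Element, path: str) -> None:
        store[path] = (elem.text or "").strip()
        seen: dict[str, int] = {}  # per-tag counter among siblings
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return store


def query(xml_text: str, path: str) -> list[str]:
    """Return the text of elements matching ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)
    return [(e.text or "") for e in root.findall(path)]


# Hypothetical user profile document.
doc = ("<profile><name>Ada</name>"
       "<email>a@x.org</email><email>b@x.org</email></profile>")
store = decompose(doc)
```

With this, store["/profile/email[2]"] addresses a single node the way a
path addresses a file, and query(doc, "./email") returns both addresses.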

Now, with my years-old asbestos underwear on, I'll be ready for your
comments :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------
