Re: XIndice 2.0 [was Re: Data or Documents for Xindice 2.0]

Tom Bradford 4 Jan 2002 15:56:11 -0000

Stefano Mazzocchi wrote:
> I see a native XML database as an incredibly great DBMS for
> semi-structured data and an incredibly poor DBMS for structured data.


I don't think anyone's debating that, though I wouldn't use the label
'incredibly poor' for structured data, especially since the question
of what structured data is can't be answered by relational DBs
either...  I don't consider normalization and joins to be structure so
much as a rigid decomposition of structure.

> Corba? no thanks, I need WebDAV.

As much as all of us hate it, CORBA absolutely has its uses.  We could
never get away with wire-compression if we were using a 'service the
world' WebDAV style approach.  Wire compression has bought us
performance gains, though not enough to justify keeping it exclusively.

> Joins? no thanks, I need document fragment aggregation.

In the context of XML, I think these are the same.

> XMLSchemas? no thanks, I need infoset-neutral RelaxNG validation.

Personally, and I'm just reiterating things I've said in the past, I
hate W3C XML Schemas, and many others do as well.  I don't want to put
ourselves in a position where we're forced to choose any one
validation mechanism to the detriment of our users.  So if we can
continue to push validation to the client application, that's the
track we should take, for a couple of important reasons: (1)
Performance: validation is slow, and bogging down the server to perform
it can only cause problems, and (2) Choice: if we standardize on W3C
Schemas, then we exclude support for other schema specifications.  I
think that's unwise, especially with the major backlash that XML Schemas
has received.
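
As a rough illustration of pushing validation to the client, here is a
minimal Python sketch.  Everything in it is hypothetical: the dict
stands in for a server-side collection, and has_title() stands in for
whatever validator the client prefers (RELAX NG, W3C Schemas, or an
ad hoc rule).  It is not the Xindice API.

```python
import xml.etree.ElementTree as ET


def has_title(root):
    # Hypothetical client-chosen rule; a real client could call any
    # schema validator it likes at this point.
    return root.find("title") is not None


def store(collection, key, xml_text, validator):
    # Parsing checks well-formedness and raises ParseError otherwise.
    root = ET.fromstring(xml_text)
    # Validation runs on the client, with the client's own validator.
    if not validator(root):
        raise ValueError(f"{key} failed client-side validation")
    # The stand-in "server" stores only documents the client accepted.
    collection[key] = xml_text


collection = {}
store(collection, "doc1", "<article><title>Intro</title></article>", has_title)
```

Because the validator is just a parameter, swapping schema languages
changes nothing on the storage side.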

> If you have structured data, you can't beat the relational model. This
> is the result of 50 years of database research: do we *really* believe
> we are smarter/wiser/deeper-thinkers than all the people that worked on
> the database industry since the 50's?

One might argue that the relational database industry hasn't learned
very much in the decades that it's been around.  Not that I'm saying XML
databases are better, but relational databases were created to solve the
problems of the databases of their time.  That time has passed.  There
are still a lot of applications that have the problem that relational
databases are trying to solve, but there are many applications that have
the problem that XML databases are trying to solve.  Further still,
there are apps whose problems no database can adequately solve.

> I see two big fields where XIndice can make a difference (and this is
> the reason why I wanted this project to move under Apache in the first
> place!):
> 
>  - web services
>  - content management systems

Don't forget health care, legal documents, and scientific applications.
These are three areas where Xindice has organically found a home since
its creation.

>  - one big tree with nodes flavor (following .NET blue/red nodes):
> follows the design patterns of file systems with folders, files,
> symlinks and such. [great would be the ability to dump the entire thing
> as a huge namespaced XML file to allow easy backup and duplication]



>  - node-granular and ACL-based authorization and security [great would
> be the ability to make nodes 'transparent' for those people who don't
> have access to see them]
> 
>  - file system-like direct access (WebDAV instead of useless XUpdate!)
> [great for editing solutions since XUpdate requires the editor to get
> the document, perform the diff and send the diff, while the same
> operation can be performed by the server with one less connection, this
> is what CVS does!]

Whoa!  Stop right there.  XUpdate is far from useless, and your
explanation of how it works, in the context of Xindice, is incorrect.
When you perform an XUpdate query, it's sent to the server, which
performs all of the work.  No document is ever sent to the client; all
that comes back is a summary of how many nodes were touched by the
update.  It actually performs very well, because you can modify every
single document in a collection, taking several different actions, with
a single command.
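
The flow described above can be sketched in a few lines of Python.
This is only a toy model under stated assumptions: COLLECTION and
server_update() are invented names for an in-memory stand-in, not the
Xindice or XUpdate API, and the update is a simple "set text" rather
than a real XUpdate command.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a collection held on the server.
COLLECTION = {
    "a.xml": ET.fromstring("<person><name>Ann</name><status>old</status></person>"),
    "b.xml": ET.fromstring("<person><name>Bob</name><status>old</status></person>"),
}


def server_update(collection, path, new_text):
    """Apply one update to every document; return only a node count."""
    touched = 0
    for root in collection.values():
        # 'path' is evaluated relative to each document's root element.
        for node in root.findall(path):
            node.text = new_text
            touched += 1
    # No document is returned to the caller, just the summary.
    return touched


count = server_update(COLLECTION, "./status", "current")
```

The caller sends one small command and receives one number back; the
documents themselves never cross the wire.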

>  - internal aggregation of document fragments (the equivalent of file
> system symlinks) [content aggregation at the database level will be much
> faster than aggregation at the publishing level, very useful for content
> that must be included in the same place... should replace the notion of
> XML entities]

We have this functionality in a very experimental form.  It's called
AutoLinking.  It's been around for a while, but it's going away at some
point, to be replaced by XQuery.  The problem with it is that you have
to modify the structure of your XML content, so it can't be treated as
data.  XQuery will allow this aggregation using the data in the
documents rather than instructions within the document.  Beyond that,
there's nothing stopping somebody from using XLink; it's just not a task
that the server will perform because of the passive nature of XLinks.
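
To illustrate what aggregating "using the data in the documents" could
look like, here is a small Python sketch.  It is not XQuery and not
AutoLinking; the documents, element names, and the aggregate() helper
are all made up for the example.  The stored documents carry only
ordinary data values, and the combined result is built at query time.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stored documents; names are invented for illustration.
article = ET.fromstring('<article><title>Intro</title><author ref="tb"/></article>')
authors = ET.fromstring(
    '<authors>'
    '<author id="tb"><name>Tom</name></author>'
    '<author id="sm"><name>Stefano</name></author>'
    '</authors>'
)


def aggregate(article, authors):
    """Build a new result by matching a value in one document to another."""
    result = ET.Element("result")
    result.append(copy.deepcopy(article.find("title")))
    ref = article.find("author").get("ref")
    for author in authors.findall("author"):
        # The "join" condition is plain data equality.
        if author.get("id") == ref:
            result.append(copy.deepcopy(author))
    return result


combined = aggregate(article, authors)
print(ET.tostring(combined, encoding="unicode"))
```

The stored documents are left untouched; the aggregation lives
entirely in the query, which is also why it resembles a join.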

>  - native metadata support (last modified time, author, etc..) [vital
> for any useful caching system around the engine!]

Some of this is already available, though there's currently no way to
expose it.

>  - node-granular event triggers [inverts the control of the database:
> when something happens the database does something, useful mostly to
> avoid expensive validity lookup for cached resources]

We talked about this early on in developing the product, but decided to
put it on a back burner for a while... probably for the same reason we
decided to shelve any specification validation system.

> In short: I'd like to have a file system able to decompose XML documents
> and store each single node as a file, scale to billions of nodes and
> perform fast queries with XPath-like syntaxes.

This is not too far from where we are at the moment.  Nodes are
individually addressable, but we cluster them into Documents for
atomicity, much like an object database will cluster objects together in
a way that ensures optimal I/O performance.
 
> This is my vision.

Now if this can work within the framework of my vision then nobody'll
get hurt. :-)

> Now, with my years-old asbesto underwear on, I'll be ready for your
> comments :)

-- 
Tom Bradford - http://www.tbradford.org
Developer - Apache Xindice (formerly dbXML)
