We built a tool at one time for a specific application which would build B-tree indices onto an XML document, giving read-only access to what became many tables like a relational database. It tied together disparate legacy application systems. The addition of an SQL compiler and query engine would have made it into a functional read-only SQL RDBMS of rather limited utility.

The advantage of using XML transfers is that the entire DBMS can be mapped into one easily transmitted document which has a standard format and can be either reconstituted in full or cherry picked for data subsets by user implemented applications. If it is strictly RDBMS->RDBMS it makes more sense to encode the data as SQL than go to all the trouble of using XML and eventually generating SQL from it.
JS

A. Pagaltzis wrote:
* Steve O'Hara <[EMAIL PROTECTED]> [2006-05-07 11:20]:

This is the right approach, when I worked in the SGML world
with a component versioning system, we called it the
"non-linear" design.

By going down this road, your table schema is static and can
cope with any type of DTD without change.


That depends.

If you want to write a generic XML store, sure, this approach is
really the only way to implement such a thing on top of a
relational database.

However, it’s not really very relational, is it? You end up with
a database that you can’t reasonably query with JOINs and
aggregate functions. And in most cases I’ve seen, when people say
they want to dump XML documents into their database, they don’t
actually want to store an XML infoset in a table. Usually they
either only have one particular XML document structure their code
needs to cope with, ie the XML is just an exchange format (souped
up CSV), and need to scatter this data into an existing schema,
or they just store XML documents wholesale in a TEXT column.

If you really do want to store an XML infoset in a table, then
the outlined approach is fine, though you’re using the database
as a very flat store, running lots of very simple, dynamically
generated queries. The SQL frontend is mostly dead weight and you
might be better off just using some storage engine with a pure
function call API then. (BerkeleyDB’s B-tree API comes to mind,
though I haven’t actually used it.)


The next thing your tool needs to do, is to determine the
parent-child relationships between all the rows and express
this using primary key linking columns.


Or some other mechanism. The self-referential FK approach is only
one of many ways to represent trees in SQL, and wins mainly when
the bulk of your queries are INSERTs; in other scenarios, other
options will likely prevail.


As you can imagine, rebuilding the relationships isn't a simple
query - lots of self correlation etc.


Yeah, that’s the problem when retrieving hierarchical data
modelled using self-referrential FKs.

Regards,

Reply via email to