Elliotte Harold wrote:
Kenneth Downs wrote:
Select title
,SUBSTRING(text ...insert regexp here...)
from chapters
where book_name = 'XML in a Nutshell'
Regexps can't do that though. Regular expressions are an insufficiently
powerful tool for processing XML. Trying to do that is just a world of
pain.
????
The example shows a query of a table, not XML. The purpose is to
demonstrate with a quick snippet that all examples of a supposedly
indispensable need for the "XML Database" stem from an ignorance of the
abilities of other tools.
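
For illustration only, here is a rough sketch of how a query shaped like the one quoted above might run, assuming SQLite through Python's sqlite3 module. The helper regex_extract is a hypothetical user-defined function registered just for this example (SQLite has no built-in regexp substring), and the table contents and the pattern are invented. The regexp elided in the original query is not reconstructed here.

```python
import re
import sqlite3

def regex_extract(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

conn = sqlite3.connect(":memory:")
# Register the hypothetical helper so SQL statements can call it by name.
conn.create_function("regex_extract", 2, regex_extract)

conn.execute("CREATE TABLE chapters (book_name TEXT, title TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real chapter text.
        ("XML in a Nutshell", "Sample chapter 1", "Opening sentence one. More text."),
        ("XML in a Nutshell", "Sample chapter 2", "Opening sentence two. More text."),
        ("Some Other Book", "Sample chapter 1", "Unrelated text."),
    ],
)

# Illustrative pattern: everything up to and including the first period.
rows = conn.execute(
    "SELECT title, regex_extract(?, text) FROM chapters "
    "WHERE book_name = ? ORDER BY rowid",
    (r"^[^.]*\.", "XML in a Nutshell"),
).fetchall()

for title, first_sentence in rows:
    print(title, "->", first_sentence)
```

This only pulls a flat substring out of a text column. It does not parse markup, which is the separate point raised in the reply about regular expressions and XML.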
Say that you prefer XML, say that you like it, say that you are used to
using it, but don't say that it is a fundamental requirement of the data
itself because it just ain't so.
Rusty, you appear to be arguing from ignorance, very unusual coming
from you.
Funny how you confuse different experiences with ignorance. Have you
ever worked in publishing? Or in library science? Or on anything that
operates at web scale like Yahoo or Google? There are many use cases
where a couple of months of hard labor will rapidly disabuse anyone of
the belief that relational databases are the one true solution to all
problems. Your career just happens not to have taken you down those
paths yet.
My observation on your arguments stems from your repeatedly ignoring
obvious examples of where tables do just fine to store data, and the
claim that 80% of the world's apps need an XML database.
If you have gotten used to using XML for text, then say so. If you like
it, then say so. Don't say it is the only tool available because it is
not. It has many very serious drawbacks, verbosity being the very
first, not to mention the confounding of structure and implementation,
encouraging the illusion of "structureless" data, and so on.
The true difference between us in this argument is that I understand
that I have a prejudice for relational over hierarchical, based on my
knowledge and use of both, and based on judgment calls as to how to
get through the day. I daresay however that you are promoting a
religious favoring of XML w/o a working knowledge of the alternatives.
Ken, you know me. Do you really think I don't know the relational
model or what it's good for? I use relational databases all the time,
and I'm using them now. However unlike you I've hit their limits.
While I'm sure many people can profitably spend their life doing
nothing but relational databases, I happen to be working on
applications where neither the relational model nor the actual SQL
databases out there can come close to managing my data. I've never
said that all applications should use XML databases or other
non-relational systems. You keep trying to put those words into my
mouth. I do say that some applications, especially in publishing and
web publishing, do not fit the relational model well and can be better
served by XML databases.
I do know you, and that is why I was struck by your pro-XML stance for
"80% of applications", in which you must either be ignorant of what most
applications really need, or what modern RDBMS's can do, or both.
Forget about EF Codd and the relational model for a moment, let's just
look at the real products that have come along, the table-based servers
we call RDBMS's. These have all solved the very basic issues of data
storage. Most of their power comes from so-called "ACID" compliance,
the ability to allow multiple simultaneous users to access a data store
with assurances of predictable behavior. Your XML databases must solve
these same issues.
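
To make one piece of "ACID" concrete, here is a minimal sketch of atomicity, assuming SQLite via Python's sqlite3 module. The accounts table, the names, and the amounts are invented for illustration, and the example uses a single connection, so it says nothing about isolation between simultaneous users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the final balances show no trace of that credit, which is the predictable behavior the paragraph above refers to.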
What about security? The modern RDBMS defines security on all objects.
Your XML databases will have to provide the ability to define security
on the complete tree. (By the way, I'm sure they'll get there, just keep
reading).
But there is one aspect of the relational model where XML, as a format,
takes a huge leap backward. Codd realized the incredible productivity
gains that could be had if a programmer could access data by name and
not worry about its internal storage structure. He separated the
implementation from the interface. XML, as a format (file, data,
whatever), confounds these two. It is a verbose format for hierarchical
data. There are better formats for nearly all uses.
Here's the clincher. Let's say the XML database grows up and has all of
these things. On this day the only thing it will have in common with
XML is a hierarchical model, the XML format itself will be the first to
go. The ability to accept XQuery statements will be a historical
footnote, and people will end up hating XQuery as much as they hate SQL
(everybody's least favorite part of the RDBMS world). These databases
will end up supporting output formats such as YAML, JSON, and others, and
probably inputs as well. There is just not a lot in the XML format that
really makes up data storage.
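
The distinction between the hierarchical model and its serialization can be sketched as follows, assuming Python's standard json and xml.etree.ElementTree modules. The sample book, the element names (book, chapter, para), and the function names are all invented for the illustration.

```python
import json
import xml.etree.ElementTree as ET

# One in-memory hierarchy; nothing about it is specific to XML or JSON.
book = {
    "title": "Sample Book",
    "chapters": [
        {"title": "Chapter One", "paragraphs": ["First paragraph.", "Second paragraph."]},
        {"title": "Chapter Two", "paragraphs": ["Only paragraph."]},
    ],
}

def to_json(book):
    """Serialize the hierarchy as JSON text."""
    return json.dumps(book, indent=2)

def to_xml(book):
    """Serialize the same hierarchy as XML text (element names are made up)."""
    root = ET.Element("book", title=book["title"])
    for chapter in book["chapters"]:
        ch = ET.SubElement(root, "chapter", title=chapter["title"])
        for text in chapter["paragraphs"]:
            ET.SubElement(ch, "para").text = text
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text):
    """Rebuild the hierarchy from the XML produced by to_xml."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.get("title"),
        "chapters": [
            {
                "title": ch.get("title"),
                "paragraphs": [p.text for p in ch.findall("para")],
            }
            for ch in root.findall("chapter")
        ],
    }

json_text = to_json(book)
xml_text = to_xml(book)
print(json_text)
print(xml_text)
```

Both outputs round-trip to the same structure for this simple case. The sketch does not cover mixed content (text interleaved with inline markup), where the two formats are not equally convenient.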
We can thank XML for making us conscious of the ubiquitous need for
hierarchical data. I use it all of the time. Personally I store my
database definitions in YAML, a hierarchical data format that is human
readable/writable (unlike XML) as well as machine readable/writable.
My programs return hierarchical data from AJAX requests as JSON, because
that's what the browser works best with, and all of my PHP programs
handle all data universally as associative arrays, which are just
hierarchical data in yet another disguise. I love hierarchies, but have
no use for a format that is not human readable/writable, which is
incredibly verbose, and which
So when I say you are arguing from ignorance, I am saying that you are
generalizing your own experience with heavy-duty text management, and
since you have never mentioned any of the topics above, you may not have
the entire picture.
Now, to your point about my own limited experience, I picked a path some
years ago that has made me an expert in some areas and ignorant in
others. But I don't go claiming that "80% of the world's applications
cannot use RDBMS". In fact, the examples you raise are all examples of
text management. This is a new area that the RDBMS was never intended
to solve. Many people have found it easily possible to extend the RDBMS
in a few areas, but others (such as you) are saying we need to start
over. But it is amusing that the look-again crowd has started over with
hierarchical data. In the end it won't be the format that is used, but
the basic abilities to manage and store text. I submit that the clear
solution has yet to emerge from that pursuit.
You simply cannot defend a file format as a foundation for frameworks
and databases. The best you can do is defend the model, such as the
hierarchical model.
XML is not a file format. We've been down this road before. A native
XML database is no more based on a file format than MySQL is based on
tab-delimited text.
But you are not saying what it is based upon. My statements above about
ACID compliance, security, and separation of implementation from
interface provide a basis for a database. The structure of the data is
given by tables. This makes a complete system.
If you cannot provide the basis for the entire picture of data
management, we are left with what the XML books tell me: how to format
the file.
Going further, you cannot defend a file format as a foundation for
anything based on how it handles large text (or binary) fields.
There are three issues here:
-> Data model, hierarchical vs. relational.
-> File format, XML vs. YAML or JSON or any other format you like.
-> Handling of large text (and binary) columns.
Finally, if we can all admit that XML is just a file format, then the
entire framework crumbles as soon as somebody comes up with a better
one, because let's admit it, XML is just about the worst you're going
to find.
Troll. Troll. Troll.
???? Geez Rusty, come on. My conclusion is worded harshly yes, but do
you really label as a troll a description of the larger issues of
formats, data models, and everything else that makes up the larger picture?
In conclusion, the examples you provide appear to give advantage to
XML because tools exist to handle data that has been buried in opaque
formats and poorly defined structures. If the data had been
structured properly in the first place and put into formats that were
not so opaque, using (pardon me for saying) a *real* database,
designed on solid principles, the examples you give become child's play.
LOL. Seriously, try storing a book or an encyclopedia in a relational
database with anything approximating 1NF, not even 2NF. Then try and
make it perform adequately.
Not all data fits neatly into tables.
Actually most data does not, not at first glance. But since a table is
simply a mapping of properties to entities, it turns out that most data
does when you look at it closely. It takes about the same effort as
deciding upon a set of tags, since it is of course exactly the same process.
The crucial question is, does your book have structure? Can you make up
tags as you go or are you limited to a pre-defined set, such as Docbook?
Once you commit to a specific set of tags, you have committed to a
structure, and you may as well use tables as anything else. Methinks
however that at this point it comes down to what you are comfortable
with. If you want to use XML, go for it; if you want to use tables, go
for it. Just don't confuse the structure of the data with a fundamental
need for either system.
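
As a small sketch of the "properties mapped to entities" idea applied to a book, assuming SQLite via Python's sqlite3 module: the schema, the column names, and the sample rows below are invented for illustration. The sketch shows only that a fixed chapter/paragraph structure can be stored in tables and read back in order. It makes no claim about performance at scale and does not handle inline markup within a paragraph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chapters (
    chapter_id INTEGER PRIMARY KEY,
    book_name  TEXT NOT NULL,
    position   INTEGER NOT NULL,   -- order of the chapter within the book
    title      TEXT NOT NULL
);
CREATE TABLE paragraphs (
    chapter_id INTEGER NOT NULL REFERENCES chapters(chapter_id),
    position   INTEGER NOT NULL,   -- order of the paragraph within the chapter
    body       TEXT NOT NULL,
    PRIMARY KEY (chapter_id, position)
);
""")

conn.executemany("INSERT INTO chapters VALUES (?, ?, ?, ?)", [
    (1, "Sample Book", 1, "Chapter One"),
    (2, "Sample Book", 2, "Chapter Two"),
])
conn.executemany("INSERT INTO paragraphs VALUES (?, ?, ?)", [
    (1, 1, "First paragraph."),
    (1, 2, "Second paragraph."),
    (2, 1, "Only paragraph."),
])

def load_book(conn, book_name):
    """Rebuild {chapter title: [paragraphs]} in document order.

    Assumes chapter titles are unique within a book (true for this sample).
    """
    rows = conn.execute("""
        SELECT c.title, p.body
        FROM chapters c
        JOIN paragraphs p ON p.chapter_id = c.chapter_id
        WHERE c.book_name = ?
        ORDER BY c.position, p.position
    """, (book_name,))
    book = {}
    for title, body in rows:
        book.setdefault(title, []).append(body)
    return book

print(load_book(conn, "Sample Book"))
```

The two position columns play the role that document order plays in a tagged file, and the table names correspond to what would otherwise be a fixed set of tags.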
--
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com www.andromeda-project.org
631-689-7200 Fax: 631-689-0527
cell: 631-379-0010