Re: XML vs. Table DBs [was: Re: [nyphp-talk] Many pages: one script]

Kenneth Downs Wed, 29 Aug 2007 04:53:37 -0700

Elliotte Harold wrote:

Kenneth Downs wrote:
Select title
          ,SUBSTRING(text ...insert regexp here...)
  from chapters
 where book_name = 'XML in a Nutshell'
Regexps can't do that though. Regular expressions are an insufficiently powerful tool for processing XML. Trying to do that is just a world of pain.


????

The example shows a query of a table, not XML. The purpose is to demonstrate with a quick snippet that all examples of a supposed indispensable need for the "XML Database" stem from an ignorance of the abilities of other tools.
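To make the snippet concrete, here is a minimal sketch of that kind of query, run from Python against an in-memory SQLite table. Everything in it is an assumption for illustration: the sample rows, the pattern, and the helper `regexp_substr`, which is registered by hand because SQLite has no built-in regexp substring function. It does not fill in the regexp left open in the query above.

```python
import re
import sqlite3


def regexp_substr(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL as a two-argument function.
conn.create_function("regexp_substr", 2, regexp_substr)

conn.execute("CREATE TABLE chapters (book_name TEXT, title TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?)",
    [
        # Made-up sample rows, not real book content.
        ("XML in a Nutshell", "Chapter One", "Intro text. See section 2.3 for details."),
        ("XML in a Nutshell", "Chapter Two", "No cross references here."),
        ("Another Book", "Chapter One", "See section 9.9."),
    ],
)

# Same shape as the query above: one row per chapter of one book,
# with a regexp-extracted fragment of the chapter text.
rows = conn.execute(
    "SELECT title, regexp_substr(?, text) "
    "FROM chapters WHERE book_name = ? ORDER BY title",
    (r"section \d+\.\d+", "XML in a Nutshell"),
).fetchall()

for title, fragment in rows:
    print(title, fragment)
```

Note that the regexp here runs over plain text held in a column; it makes no attempt to parse markup.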

Say that you prefer XML, say that you like it, say that you are used to using it, but don't say that it is a fundamental requirement of the data itself because it just ain't so.

Rusty, you appear to be arguing from ignorance, very unusual coming from you.
Funny how you confuse different experiences with ignorance. Have you ever worked in publishing? Or in library science? Or on anything that operates at web scale like Yahoo or Google? There are many use cases where a couple of months of hard labor will rapidly disabuse anyone of the belief that relational databases are the one true solution to all problems. Your career just happens not to have taken you down those paths yet.

My observation on your arguments stems from your repeatedly ignoring obvious examples of where tables do just fine to store data, and the claim that 80% of the world's apps need an XML database.

If you have gotten used to using XML for text, then say so. If you like it, then say so. Don't say it is the only tool available because it is not. It has many very serious drawbacks, verbosity being the very first, not to mention the confounding of structure and implementation, encouraging the illusion of "structureless" data, and so on.

The true difference between us in this argument is that I understand that I have a prejudice for relational over hierarchical, based on my knowledge and use of both, and based on judgment calls as to how to get through the day. I daresay however that you are promoting a religious favoring of XML w/o a working knowledge of the alternatives.
Ken, you know me. Do you really think I don't know the relational model or what it's good for? I use relational databases all the time, and I'm using them now. However unlike you I've hit their limits. While I'm sure many people can profitably spend their life doing nothing but relational databases, I happen to be working on applications where neither the relational model nor the actual SQL databases out there can come close to managing my data. I've never said that all applications should use XML databases or other non-relational systems. You keep trying to put those words into my mouth. I do say that some applications, especially in publishing and web publishing, do not fit the relational model well and can be better served by XML databases.

I do know you, and that is why I was struck by your pro-XML stance for "80% of applications", in which you must either be ignorant of what most applications really need, or what modern RDBMS's can do, or both. Forget about EF Codd and the relational model for a moment, let's just look at the real products that have come along, the table-based servers we call RDBMS's. These have all solved the very basic issues of data storage. Most of their power comes from so-called "ACID" compliance, the ability to allow multiple simultaneous users to access a data store with assurances of predictable behavior. Your XML databases must solve these same issues.

What about security? The modern RDBMS defines security on all objects. Your XML databases will have to provide the ability to define security on the complete tree. (By the way, I'm sure they'll get there, just keep reading).

But there is one aspect of the relational model where XML, as a format, takes a huge leap backward. Codd realized the incredible productivity gains that could be had if a programmer could access data by name and not worry about its internal storage structure. He separated the implementation from the interface. XML, as a format (file, data, whatever), confounds these two. It is a verbose format for hierarchical data. There are better formats for nearly all uses.

Here's the clincher. Let's say the XML database grows up and has all of these things. On this day the only thing it will have in common with XML is a hierarchical model; the XML format itself will be the first to go. The ability to accept XQuery statements will be a historical footnote, and people will end up hating XQuery as much as they hate SQL (everybody's least favorite part of the RDBMS world). These databases will end up supporting output formats such as YAML, JSON, and others, and probably inputs as well. There is just not a lot in the XML format that really makes up data storage.

We can thank XML for making us conscious of the ubiquitous need for hierarchical data. I use it all of the time. Personally I store my database definitions in YAML, a hierarchical data format that is human readable/writable (unlike XML) as well as machine readable/writable. My programs return hierarchical data from AJAX requests as JSON, because that's what the browser works best with, and all of my PHP programs handle all data universally as associative arrays, which are just hierarchical data in yet another disguise. I love hierarchies, but have no use for a format that is not human readable/writable, which is incredibly verbose, and which

So when I say you are arguing from ignorance, I am saying that you are generalizing your own experience with heavy-duty text management, and since you have never mentioned any of the topics above, you may not have the entire picture.

Now, to your point about my own limited experience, I picked a path some years ago that has made me an expert in some areas and ignorant in others. But I don't go claiming that "80% of the world's applications cannot use RDBMS". In fact, the examples you raise are all examples of text management. This is a new area that the RDBMS was never intended to solve. Many people have found it easily possible to extend the RDBMS in a few areas, but others (such as you) are saying we need to start over. But it is amusing that the look-again crowd has started over with hierarchical data. In the end it won't be the format that is used, but the basic abilities to manage and store text. I submit that the clear solution has yet to emerge from that pursuit.

You simply cannot defend a file format as a foundation for frameworks and databases. The best you can do is defend the model, such as the hierarchical model.
XML is not a file format. We've been down this road before. A native XML database is no more based on a file format than MySQL is based on tab delimited text.

But you are not saying what it is based upon. My statements above about ACID compliance, security, and separation of implementation from interface provide a basis for a database. The structure of the data is given by tables. This makes a complete system.

If you cannot provide the basis for the entire picture of data management, we are left with what the XML books tell me: how to format the file.

Going further, you cannot defend a file format as a foundation for anything based on how it handles large text (or binary) fields. There are three issues here:
-> Data model, hierarchical vs. relational.
-> File format, XML vs YAML or JSON or any other format you like
-> Handling of large text (and binary) columns.
Finally, if we can all admit that XML is just a file format, then the entire framework crumbles as soon as somebody comes up with a better one, because let's admit it, XML is just about the worst you're going to find.
Troll. Troll. Troll.

???? Geez Rusty, come on. My conclusion is worded harshly yes, but do you really label as a troll a description of the larger issues of formats, data models, and everything else that makes up the larger picture?

In conclusion, the examples you provide appear to give advantage to XML because tools exist to handle data that has been buried in opaque formats and poorly defined structures. If the data had been structured properly in the first place and put into formats that were not so opaque, using (pardon me for saying) a *real* database, designed on solid principles, the examples you give become child's play.
LOL. Seriously, try storing a book or an encyclopedia in a relational database with anything approximating 1NF, not even 2NF. Then try and make it perform adequately.
Not all data fits neatly into tables.

Actually most data does not, not at first glance. But since a table is simply a mapping of properties to entities, it turns out that most data does when you look at it closely. It takes about the same effort as deciding upon a set of tags, since it is of course exactly the same process.

The crucial question is, does your book have structure? Can you make up tags as you go or are you limited to a pre-defined set, such as Docbook? Once you commit to a specific set of tags, you have committed to a structure, and you may as well use tables as anything else. Methinks however that at this point it comes down to what you are comfortable with. If you want to use XML, go for it, if you want to use tables, go for it, just don't confuse the structure of the data with a fundamental need for either system.
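As a rough sketch of what "committing to a structure" can look like on the table side, the snippet below flattens a small tagged hierarchy into rows with a parent pointer (an adjacency list). The tag names, the sample book, and the `flatten` helper are all assumptions made up for illustration, not Docbook and not any particular product's schema, and the sketch says nothing either way about how such a table performs at scale.

```python
from itertools import count

# A made-up document tree using a fixed, pre-defined set of tags.
book = {
    "tag": "book",
    "title": "Example Book",
    "children": [
        {
            "tag": "chapter",
            "title": "One",
            "children": [
                {"tag": "section", "title": "1.1", "children": []},
                {"tag": "section", "title": "1.2", "children": []},
            ],
        },
        {"tag": "chapter", "title": "Two", "children": []},
    ],
}


def flatten(node, ids=None, parent_id=None, position=0):
    """Turn a nested node into rows of (id, parent_id, position, tag, title)."""
    if ids is None:
        ids = count(1)  # simple id generator shared across the recursion
    node_id = next(ids)
    rows = [(node_id, parent_id, position, node["tag"], node["title"])]
    for pos, child in enumerate(node["children"]):
        rows.extend(flatten(child, ids, node_id, pos))
    return rows


rows = flatten(book)
for row in rows:
    print(row)
```

Each row is one entity with its properties; the hierarchy survives as the `parent_id` and `position` columns rather than as nesting.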


--
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com    www.andromeda-project.org
631-689-7200   Fax: 631-689-0527
cell: 631-379-0010

_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

NYPHPCon 2006 Presentations Online
http://www.nyphpcon.com

Show Your Participation in New York PHP
http://www.nyphp.org/show_participation.php
