Elliotte Harold wrote:
Kenneth Downs wrote:

Select title
          ,SUBSTRING(text ...insert regexp here...)
  from chapters
 where book_name = 'XML in a Nutshell'


Regexps can't do that though. Regular expressions are an insufficiently powerful tool for processing XML. Trying to do that is just a world of pain.

????

The example shows a query of a table, not XML. The purpose is to demonstrate with a quick snippet that all examples of a supposed indispensable need for the "XML Database" stem from an ignorance of the abilities of other tools.
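The snippet above leaves the regular expression itself elided. As a rough, runnable sketch of the same idea (not the original query), the following uses Python's sqlite3 module with an in-memory table. The regex_extract helper, the chapter_no column, the sample rows, and the first-sentence pattern are all assumptions made up for illustration; SQLite has no built-in regex SUBSTRING, so the helper is registered from Python. Note that it pulls text out of a plain column, not out of XML markup.

```python
import re
import sqlite3


def regex_extract(pattern, text):
    """Hypothetical helper: first capture group of pattern in text, or None."""
    if text is None:
        return None
    match = re.search(pattern, text)
    return match.group(1) if match else None


def first_sentences(conn, book_name):
    """Return (title, first sentence) for every chapter of one book."""
    # Expose the Python helper to SQL under the name regex_extract.
    conn.create_function("regex_extract", 2, regex_extract)
    rows = conn.execute(
        "SELECT title, regex_extract(?, text) "
        "FROM chapters WHERE book_name = ? ORDER BY chapter_no",
        (r"^(.*?[.!?])", book_name),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chapters "
    "(book_name TEXT, chapter_no INTEGER, title TEXT, text TEXT)"
)
# Made-up sample rows; these are not real chapter titles or text.
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?, ?)",
    [
        ("XML in a Nutshell", 1, "Chapter one", "XML is a markup language. It has tags."),
        ("XML in a Nutshell", 2, "Chapter two", "Second chapter starts here! And goes on."),
        ("Some Other Book", 1, "Intro", "Not selected. At all."),
    ],
)
conn.commit()

result = first_sentences(conn, "XML in a Nutshell")
for title, sentence in result:
    print(title, "->", sentence)
```

A server with native regex support in SUBSTRING would not need the registered helper; the shape of the query stays the same.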

Say that you prefer XML, say that you like it, say that you are used to using it, but don't say that it is a fundamental requirement of the data itself because it just ain't so.


Rusty, you appear to be arguing from ignorance, very unusual coming from you.

Funny how you confuse different experiences with ignorance. Have you ever worked in publishing? Or in library science? Or on anything that operates at web scale like Yahoo or Google? There are many use cases where a couple of months of hard labor will rapidly disabuse anyone of the belief that relational databases are the one true solution to all problems. Your career just happens not to have taken you down those paths yet.

My observation on your arguments stems from your repeatedly ignoring obvious examples of where tables do just fine to store data, and the claim that 80% of the world's apps need an XML database.

If you have gotten used to using XML for text, then say so. If you like it, then say so. Don't say it is the only tool available because it is not. It has many very serious drawbacks, verbosity being the very first, not to mention the confounding of structure and implementation, encouraging the illusion of "structureless" data, and so on.


The true difference between us in this argument is that I understand that I have a prejudice for relational over hierarchical, based on my knowledge and use of both, and based on judgment calls as to how to get through the day. I daresay however that you are promoting a religious favoring of XML w/o a working knowledge of the alternatives.

Ken, you know me. Do you really think I don't know the relational model or what it's good for? I use relational databases all the time, and I'm using them now. However, unlike you, I've hit their limits. While I'm sure many people can profitably spend their life doing nothing but relational databases, I happen to be working on applications where neither the relational model nor the actual SQL databases out there can come close to managing my data. I've never said that all applications should use XML databases or other non-relational systems; you keep trying to put those words into my mouth. I do say that some applications, especially in publishing and web publishing, do not fit the relational model well and can be better served by XML databases.

I do know you, and that is why I was struck by your pro-XML stance for "80% of applications", in which you must either be ignorant of what most applications really need, or what modern RDBMS's can do, or both. Forget about EF Codd and the relational model for a moment, let's just look at the real products that have come along, the table-based servers we call RDBMS's. These have all solved the very basic issues of data storage. Most of their power comes from so-called "ACID" compliance, the ability to allow multiple simultaneous users to access a data store with assurances of predictable behavior. Your XML databases must solve these same issues.
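To make the ACID point concrete, here is a minimal sketch of just one of those guarantees, atomicity, using Python's sqlite3 module. The accounts table, the names, and the transfer function are hypothetical and exist only for this illustration; a multi-user server would add isolation and durability concerns that a single in-memory connection does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the rule the database enforces for us.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)  # overdraws alice, rolled back
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
print(failed, after_failed, succeeded, after_ok)
```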

What about security? The modern RDBMS defines security on all objects. Your XML databases will have to provide the ability to define security on the complete tree. (By the way, I'm sure they'll get there, just keep reading).

But there is one aspect of the relational model where XML, as a format, takes a huge leap backward. Codd realized the incredible productivity gains that could be had if a programmer could access data by name and not worry about its internal storage structure. He separated the implementation from the interface. XML, as a format (file, data, whatever), confounds these two. It is a verbose format for hierarchical data. There are better formats for nearly all uses.

Here's the clincher. Let's say the XML database grows up and has all of these things. On this day the only thing it will have in common with XML is a hierarchical model; the XML format itself will be the first to go. The ability to accept XQuery statements will be a historical footnote, and people will end up hating XQuery as much as they hate SQL (everybody's least favorite part of the RDBMS world). These databases will end up supporting output formats such as YAML, JSON, and others, and probably inputs as well. There is just not a lot in the XML format that really makes up data storage.

We can thank XML for making us conscious of the ubiquitous need for hierarchical data. I use it all of the time. Personally I store my database definitions in YAML, a hierarchical data format that is human readable/writable (unlike XML) as well as machine readable/writable. My programs return hierarchical data from AJAX requests as JSON, because that's what the browser works best with, and all of my PHP programs handle all data universally as associative arrays, which are just hierarchical data in yet another disguise. I love hierarchies, but have no use for a format that is not human readable/writable, which is incredibly verbose, and which

So when I say you are arguing from ignorance, I am saying that you are generalizing your own experience with heavy-duty text management, and since you have never mentioned any of the topics above, you may not have the entire picture.

Now, to your point about my own limited experience, I picked a path some years ago that has made me an expert in some areas and ignorant in others. But I don't go claiming that "80% of the world's applications cannot use RDBMS". In fact, the examples you raise are all examples of text management. This is a new area that the RDBMS was never intended to solve. Many people have found it easily possible to extend the RDBMS in a few areas, but others (such as you) are saying we need to start over. But it is amusing that the look-again crowd has started over with hierarchical data. In the end it won't be the format that is used, but the basic abilities to manage and store text. I submit that the clear solution has yet to emerge from that pursuit.

You simply cannot defend a file format as a foundation for frameworks and databases. The best you can do is defend the model, such as the hierarchical model.

XML is not a file format. We've been down this road before. A native XML database is no more based on a file format than MySQL is based on tab-delimited text.

But you are not saying what it is based upon. My statements above about ACID compliance, security, and separation of implementation from interface provide a basis for a database. The structure of the data is given by tables. This makes a complete system.

If you cannot provide the basis for the entire picture of data management, we are left with what the XML books tell me: how to format the file.


Going further, you cannot defend a file format as a foundation for anything based on how it handles large text (or binary) fields. There are three issues here:

-> Data model, hierarchical vs. relational.
-> File format, XML vs YAML or JSON or any other format you like
-> Handling of large text (and binary) columns.

Finally, if we can all admit that XML is just a file format, then the entire framework crumbles as soon as somebody comes up with a better one, because let's admit it, XML is just about the worst you're going to find.

Troll. Troll. Troll.

???? Geez Rusty, come on. My conclusion is worded harshly yes, but do you really label as a troll a description of the larger issues of formats, data models, and everything else that makes up the larger picture?



In conclusion, the examples you provide appear to give advantage to XML because tools exist to handle data that has been buried in opaque formats and poorly defined structures. If the data had been structured properly in the first place and put into formats that were not so opaque, using (pardon me for saying) a *real* database, designed on solid principles, the examples you give become child's play.

LOL. Seriously, try storing a book or an encyclopedia in a relational database with anything approximating 1NF, not even 2NF. Then try and make it perform adequately.

Not all data fits neatly into tables.


Actually most data does not, not at first glance. But since a table is simply a mapping of properties to entities, it turns out that most data does when you look at it closely. It takes about the same effort as deciding upon a set of tags, since it is of course exactly the same process.

The crucial question is, does your book have structure? Can you make up tags as you go or are you limited to a pre-defined set, such as Docbook? Once you commit to a specific set of tags, you have committed to a structure, and you may as well use tables as anything else. Methinks however that at this point it comes down to what you are comfortable with. If you want to use XML, go for it, if you want to use tables, go for it, just don't confuse the structure of the data with a fundamental need for either system.
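For illustration only, here is a minimal sketch of what committing a fixed tag set to a table might look like, again using Python's sqlite3 module. The nodes table, its columns, the tag names, and the sample content are all hypothetical, and the sketch says nothing about how such a layout would perform on a real book or encyclopedia.

```python
import sqlite3


def build_tree(conn, root_id):
    """Rebuild the nested structure from the flat parent/child rows."""
    rows = conn.execute(
        "SELECT id, parent_id, tag, content FROM nodes "
        "ORDER BY parent_id, position"
    ).fetchall()
    nodes = {
        node_id: {"tag": tag, "content": content, "children": []}
        for node_id, _parent_id, tag, content in rows
    }
    # Rows arrive sorted by parent and position, so children stay in order.
    for node_id, parent_id, _tag, _content in rows:
        if parent_id is not None:
            nodes[parent_id]["children"].append(nodes[node_id])
    return nodes[root_id]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES nodes(id), "
    "position INTEGER NOT NULL, "
    "tag TEXT NOT NULL CHECK (tag IN ('book', 'chapter', 'para')), "
    "content TEXT)"
)
# Made-up sample document: one book, two chapters, two paragraphs.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 0, "book", "Example Book"),
        (2, 1, 0, "chapter", "First chapter"),
        (3, 1, 1, "chapter", "Second chapter"),
        (4, 2, 0, "para", "Some text."),
        (5, 2, 1, "para", "More text."),
    ],
)
conn.commit()

tree = build_tree(conn, 1)
chapter_titles = [child["content"] for child in tree["children"]]
first_paras = [p["content"] for p in tree["children"][0]["children"]]
print(chapter_titles, first_paras)
```

The CHECK on tag plays the role of the pre-defined tag set: once the allowed tags are fixed, the same commitment can be written down as a constraint on a column.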

--
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com    www.andromeda-project.org
631-689-7200   Fax: 631-689-0527
cell: 631-379-0010

_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk