Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-09 Thread Elliotte Harold

Robert Sayre wrote:


> says who?



Says me. Says all the vendors who have put their capital into native XML 
databases and not into native HTML databases.


One presumes a theoretical HTML database would support HTML. An XML 
database supports that plus all the other uses of XML.


Most current projects I've seen or heard of are in large publishing 
firms, i.e. the size of O'Reilly or larger. 15 gigabytes of text 
is small for these systems, though it's huge for most web sites. I 
suspect it's just that this is where the money and the need lie for the 
moment. The same was true of SQL in its early days. However, I expect 
native XML databases to go downmarket fairly quickly over the next few 
years.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/


Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-09 Thread Elliotte Harold

Benjamin Hawkes-Lewis wrote:


> Can I ask some really basic questions about this? (Jason Hunter's talk
> didn't appear to be online.) Are there Exist equivalents for Python,
> PHP, and Ruby programmers, or do we all need to use Java in the brave
> new world?


For eXist I don't know. However, the major payware systems (DB2 9, Mark 
Logic, Oracle, etc.) are all essentially language neutral, just as SQL 
databases are. They have bindings for all major languages.


Open source offerings are a little thin on the ground right now, and not 
as robust as I'd like, but that's going to change, possibly by the end 
of next year if not sooner.



> Is the theory here that the entire site's content goes into one XML
> file?


No. We're using a database, not a file system. XML documents go into the 
database, but there are many such documents. One per page, one per 
sidebar, one per comment, whatever. You can organize the content to fit 
your site.
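
As a rough illustration of the one-document-per-item layout described above, here is a minimal Python sketch. It is not the API of eXist or any other product: the in-memory `collection` dictionary, the paths, and the element names are all made up for the example, and a real XML database would index the documents rather than reparse them on every query.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "collection": one small XML document per item,
# keyed by a made-up path. A real XML database would store and index these.
collection = {
    "/pages/about.xml": "<page><title>About</title><body>Hello</body></page>",
    "/comments/1.xml": "<comment page='about'><author>Ann</author><text>Nice</text></comment>",
    "/comments/2.xml": "<comment page='about'><author>Bob</author><text>Agreed</text></comment>",
    "/sidebars/links.xml": "<sidebar><link href='http://example.com/'/></sidebar>",
}

def comments_for(page_id):
    """Return (author, text) pairs for every comment document on a page."""
    results = []
    # Walk the documents in path order and keep only matching comments.
    for path, source in sorted(collection.items()):
        root = ET.fromstring(source)
        if root.tag == "comment" and root.get("page") == page_id:
            results.append((root.findtext("author"), root.findtext("text")))
    return results

print(comments_for("about"))
```

The point is only that a query runs across many small documents; nothing has to live in one giant file.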


> And, if so, what happens when it gets big?


If that happens, you're a lot better off with a database store than a 
filesystem. For large files, the databases can outperform 
file-system-based XSLT/XQuery processors like Saxon by an order of 
magnitude or more, thanks to indexing. It's probably not really relevant 
for web sites, where the documents just aren't that big. But if you're 
working on airplane technical manuals, encyclopedias, and such, the 
difference is very impressive.



> Or would you have
> different XML files for each article, comment, and user? What happens to
> hypermedia like videos, images, and audio?


In hybrid systems like DB2 9, you'd probably put some of that data, 
like the users, in a classic SQL table. Binary data can be stored in the 
file system (like WordPress does today) or as a BLOB in the database. 
Some of these details do change from one database to the next. For 
instance, Mark Logic allows you to store plain text, blobs, and XML 
documents right in the database, with XQuery extensions for manipulating 
this stuff.
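
To make the hybrid split concrete, here is a small Python sketch using the standard `sqlite3` module. SQLite is only a stand-in: it has no native XML column type, so the XML document is stored as plain text, and the table and column names are assumptions invented for the example rather than anything DB2 9 or Mark Logic prescribes.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Classic relational table for the users.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Each post body is an XML document (stored as text here, since SQLite
# has no XML type; a hybrid database would use a real XML column).
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
    "author_id INTEGER REFERENCES users(id), body_xml TEXT NOT NULL)"
)
# Binary media stored as a BLOB instead of on the file system.
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, mime TEXT NOT NULL, data BLOB NOT NULL)"
)

conn.execute("INSERT INTO users (id, name) VALUES (1, 'ann')")
conn.execute(
    "INSERT INTO posts (author_id, body_xml) VALUES (?, ?)",
    (1, "<post><title>Hi</title><p>First post</p></post>"),
)
conn.execute(
    "INSERT INTO media (mime, data) VALUES (?, ?)",
    ("image/png", b"\x89PNG placeholder bytes"),
)

# Join the relational data to the document, then read inside the XML.
author, body_xml = conn.execute(
    "SELECT users.name, posts.body_xml FROM posts "
    "JOIN users ON users.id = posts.author_id"
).fetchone()
title = ET.fromstring(body_xml).findtext("title")
print(author, title)
```

In a real hybrid system the last step would be an XQuery or SQL/XML expression evaluated inside the database instead of a parse in application code.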



> Are such systems going to be as simple for
> end-users to install on their servers as WordPress?


Depends on who writes them, I suppose. eXist replaces MySQL, not 
WordPress. No reason such a system can't be equally easy to install.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]


Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-09 Thread Benjamin Hawkes-Lewis
Elliotte Harold wrote:

> However, after spending the last few days at XML 2006, I have a new
> perspective on such systems I didn't have a week ago. In particular I
> now believe that the relational databases that back these sites are
> fundamentally the wrong technology. As Mark Logic's Jason Hunter put
> it, they're trying to force triangles into rectangle shaped holes.
> 
> I understand why relational databases were used to build blog engines 
> and content management systems. For a long time that was all we had.
> However, that's going to change fast. I expect that new systems are
> going to be developed using pure and hybrid XML databases like Exist
> and DB2 9. The advantages to a programmer working on such systems
> are just too compelling to ignore.

Can I ask some really basic questions about this? (Jason Hunter's talk
didn't appear to be online.) Are there Exist equivalents for Python,
PHP, and Ruby programmers, or do we all need to use Java in the brave
new world?

> One consequence of building on top of a native XML database rather than a 
> relational database is that well-formedness is going to become more 
> important, not less. In fact, well-formedness is going to become 
> essential because these systems cannot store anything less than a fully 
> well-formed XML document. I predict that this, if nothing else, is going 
> to convince blog engines and content management systems to start fixing 
> up malformed content before storing it. Maybe all the legacy systems 
> won't convert, but the new ones most certainly will.

Is the theory here that the entire site's content goes into one XML
file? And, if so, what happens when it gets big? Or would you have
different XML files for each article, comment, and user? What happens to
hypermedia like videos, images, and audio? Is there a walkthrough of
creating a hypermedia system with Exist online? (A brief search of the
Exist website didn't turn one up, although I see it includes a module
for resizing images.) Are such systems going to be as simple for
end-users to install on their servers as WordPress?

> One consequence of building on top of a native XML database rather than
> a relational database is that well-formedness is going to become more
> important, not less. In fact, well-formedness is going to become
> essential because these systems cannot store anything less than a
> fully well-formed XML document. I predict that this, if nothing else,
> is going to convince blog engines and content management systems to
> start fixing up malformed content before storing it. Maybe all the
> legacy systems won't convert, but the new ones most certainly will.

To which Rimantas Liubertas replied:
> 
> In other words, this means that John Doe will have more headaches with
> such a system, which will be a pretty compelling reason to stick with
> "old-fashioned" RDBMSs, which "just work".

I really can't see how this follows from what Elliotte was saying. Is
John Doe a programmer or an end-user? Plenty of programmers seem to have
difficulties with databases of all stripes. And I can't see why the
end-user would be worrying about well-formedness with such a CMS any
more than they worry about well-formedness when using OpenOffice.org or
Microsoft Word.
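
For what it's worth, the "check it before storing it" step the CMS would perform on the user's behalf can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `store` and the dictionary standing in for the database are hypothetical, and a real system would more likely repair the markup than reject it outright.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as a single well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

def store(database, key, markup):
    """Store markup under key only if it is well-formed; otherwise reject it."""
    if not is_well_formed(markup):
        raise ValueError("refusing to store malformed content: " + key)
    database[key] = markup

database = {}  # hypothetical stand-in for an XML database
store(database, "post-1", "<p>Hello <b>world</b></p>")
print(is_well_formed("<p>Hello <b>world</p>"))  # mismatched tags
```

The end-user never sees any of this; the software either fixes the content or tells them it couldn't be saved.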

--
Benjamin Hawkes-Lewis



Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-09 Thread Rimantas Liubertas

> <...> I understand why relational databases were used to build blog
> engines and content management systems. For a long time that was all
> we had. However, that's going to change fast. I expect that new systems
> are going to be developed using pure and hybrid XML databases like
> Exist and DB2 9. The advantages to a programmer working on such systems
> are just too compelling to ignore.


Or maybe programmers use these database systems because they work,
and work well enough?


> One consequence of building on top of a native XML database rather than
> a relational database is that well-formedness is going to become more
> important, not less. In fact, well-formedness is going to become
> essential because these systems cannot store anything less than a fully
> well-formed XML document. I predict that this, if nothing else, is going
> to convince blog engines and content management systems to start fixing
> up malformed content before storing it. Maybe all the legacy systems
> won't convert, but the new ones most certainly will.


In other words, this means that John Doe will have more headaches with
such a system, which will be a pretty compelling reason to stick with
"old-fashioned" RDBMSs, which "just work".
In short, I don't share your vision of the bright future of XML DBMSs.
Time will tell.

Regards,
Rimantas
--
http://rimantas.com/


Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-08 Thread Robert Sayre

On 12/8/06, Elliotte Harold <[EMAIL PROTECTED]> wrote:

> Robert Sayre wrote:
>
> > I disagree. Wouldn't it be more profitable to build an HTML database?
>
> No. XML databases are a lot more general purpose and support many more
> use cases.


says who?

--

Robert Sayre

"I would have written a shorter letter, but I did not have the time."


Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-08 Thread Elliotte Harold

Robert Sayre wrote:


> I disagree. Wouldn't it be more profitable to build an HTML database?



No. XML databases are a lot more general purpose and support many more 
use cases.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]


Re: [whatwg] XML databases, XML syntax and HTML5

2006-12-08 Thread Robert Sayre

On 12/8/06, Elliotte Harold <[EMAIL PROTECTED]> wrote:


> One consequence of building on top of a native XML database rather than a
> relational database is that well-formedness is going to become more
> important, not less.


I disagree. Wouldn't it be more profitable to build an HTML database?

--

Robert Sayre