Re: XIndice 2.0 [was Re: Data or Documents for Xindice 2.0]

Kimbro Staken 7 Jan 2002 11:15:52 -0000

On Saturday, January 5, 2002, at 04:01 AM, Stefano Mazzocchi wrote:

My point was not to remove CORBA from the picture (BTW, is there
anybody here who is using XIndice from CORBA in a real-life
application?) but to indicate my impression that the time would have
been better spent on a webDAV connection. No offense intended, just a
consideration from the document-oriented world, where CORBA will never
even enter.

Everybody who uses the XML:DB API uses CORBA behind the scenes, which basically means everybody is using it. I don't know of anyone using the CORBA API directly and I wouldn't encourage anyone to do so since we want to get rid of CORBA. Now getting rid of CORBA does not mean getting rid of that layer. CORBA provides an essential function to the server and that function could not be entirely fulfilled by webdav.

While webdav would be nice for document oriented applications, dbXML was not really designed or conceived for those applications nor has the majority of the interest in the server been for those types of applications. This isn't to deny that both webdav and document oriented applications are important, it is to deny that they are the only applications that should be targeted. I'm all for adding webdav as an option, but you're wrong in saying that our time would have been better spent there. In fact you are the only person who has ever "really" wanted webdav. It had come up in the past but it was never a real solid request from any user of the software. Now it is.

That's a good point, but again, I'm questioning the darwinistic
evolutionary process of this effort: do what people ask, not what
architectural elegance suggests or W3C recommends.

And we've had far more requests for W3C XML Schema than for Relax NG. I'm not a fan of XML Schema either, but that doesn't change the fact that it is what is being asked for. I'm with Tom, though: if we can do things in a schema-language-independent manner, that should be the target.


I agree with you that the engine internals should not deal with
validation, just like Cocoon doesn't validate stuff by default.

Let's not get too caught up in focusing only on validation here. Validation is just part of the schema equation. There's also the data-typing issue to consider, which will be particularly important with XQuery. In fact I'd say data typing is even more important than validation for data oriented apps, but you can't really apply types without knowing the structure of the document. This means some level of schema support has to be built into the server.

Just to be clear, in no way am I suggesting that the server should "require" a schema. In fact I'd consider requiring schemas to be destroying what I value most about the server.

I agree it would be cool if validation could be done at either client or server under the control of the developer. For data oriented apps having robust schema support on the server will be essential though.
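To make the data-typing point concrete, here is a minimal Python sketch. It assumes a hypothetical "mini-schema" (the `SCHEMA` dict) that maps element names to types; it is not Xindice's schema support. Elements the schema does not mention stay untyped, so no schema is ever required, but when a type is known, queries such as "largest price" compare numbers instead of strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-schema: element name -> Python type.
# Elements it does not mention stay untyped strings, so the schema is optional.
SCHEMA = {"price": float}

def typed_value(elem, schema):
    """Return the element's text, converted by the schema when a type is known."""
    text = (elem.text or "").strip()
    converter = schema.get(elem.tag)
    return converter(text) if converter else text

def max_value(doc, tag, schema):
    """Largest value of all <tag> elements, compared under the given typing."""
    return max(typed_value(e, schema) for e in doc.iter(tag))

DOC = ET.fromstring(
    "<order><item><price>9.5</price></item><item><price>10</price></item></order>"
)

print(max_value(DOC, "price", {}))      # 9.5  (string comparison: "9.5" > "10")
print(max_value(DOC, "price", SCHEMA))  # 10.0 (numeric comparison)
```

Without the type information the "wrong" answer is perfectly consistent; it is simply string ordering, which is why typing depends on knowing the document structure.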

The content management system I'd like to have could be built in two
ways:

 1) single layer: XIndice includes all the required functionality.

 2) double layer: XIndice is the db engine, something else wraps it and
performs CMS operations like access control, workflow management, data
validation, versioning, etc.

Separation of Concerns clearly indicates that the second option is the
best. This has been my view of the issues since May 2000, when I first
took a serious look at dbXML as the engine for such a system.


Yes, number 2 is clearly the way to go.
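A minimal Python sketch of the double-layer option, under stated assumptions: `DocumentStore`, `InMemoryStore` and `CMS` are hypothetical names, not Xindice or XML:DB API classes. The store only stores; access control and workflow status live entirely in the CMS layer, and the abstract class plays the role of the "solid contract" between the two.

```python
from abc import ABC, abstractmethod

class DocumentStore(ABC):
    """The contract the CMS relies on: plain storage, no policy."""
    @abstractmethod
    def get(self, path: str) -> str: ...
    @abstractmethod
    def put(self, path: str, xml: str) -> None: ...

class InMemoryStore(DocumentStore):
    """Stand-in for the database engine."""
    def __init__(self):
        self._docs = {}
    def get(self, path):
        return self._docs[path]
    def put(self, path, xml):
        self._docs[path] = xml

class CMS:
    """Second layer: access control and workflow, storage delegated to the engine."""
    def __init__(self, store: DocumentStore, writers: set):
        self.store = store
        self.writers = writers
        self.status = {}  # path -> workflow status, kept in the CMS, not the engine

    def save(self, user, path, xml):
        if user not in self.writers:
            raise PermissionError(f"{user} may not write {path}")
        self.store.put(path, xml)
        self.status[path] = "to be reviewed"

cms = CMS(InMemoryStore(), writers={"alice"})
cms.save("alice", "/press/release-1", "<press-release/>")
print(cms.status["/press/release-1"])  # to be reviewed
```

Any engine that satisfies `DocumentStore` can be swapped in without touching the CMS code.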

This is why I wanted XIndice to move over to Apache: separation of concerns is a
great way to do parallel design and increase productivity and give users
more choice, but it can't work without *solid* contracts between the
systems that interoperate.

So what I'm asking is *NOT* to turn XIndice into a CMS, not at all!


Good, because I certainly wouldn't agree with that.

What I'd like to see is XIndice remaining *very* abstract on the XML
model, but without sacrificing performance and making it possible to
implement more complex systems on top.


Absolutely, that's the whole point. Xindice is about flexibility.


Absolutely. Still, please, let's try to avoid a pissing contest with the
RDBMS communities and lead the way in those areas where the relational
model fits only with a very bad twist.

I agree, I don't want to get into this battle either. However, that doesn't mean that an XML database is not useful in data oriented applications. The simple fact that you have semi-structured data is incredibly valuable for many applications that are nothing like a CMS. They're still data oriented applications, though. Building an XML database doesn't automatically mean that you have to suddenly start chanting "death to RDBMS".

 - web services
 - content management systems

Don't forget health care, legal documents, and scientific applications.
These are all examples of the above two.

Heh, heh, there is no way that I'll buy into the idea that the only two places where Xindice is useful are web services and CMS. There's more to XML data management than that.

XUpdate is a way to express deltas, differences between trees.

In the data-centric world, people are used to sending deltas: change this
number to this other one, append this new address, remove this credit
card from the valid list.

In the document-centric world, people are used to thinking of files, not
of their diffs.

CVS is a great system because it does all the differential processing on
documents by itself, transparently.

Now, the use of a delta-oriented update language isn't necessarily bad
as a 'wire-transport' (much like CVS sends compressed diffs between the
client and the server), but it definitely isn't useful by itself without
some application-level adaptation.
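The three data-centric deltas listed above map naturally onto XUpdate operations. The following Python sketch builds such a modifications document with the standard library; the customer document and its paths are hypothetical, and the element names (`update`, `append`, `remove`, `element`) and namespace follow the XML:DB XUpdate working draft as I understand it.

```python
import xml.etree.ElementTree as ET

XU = "http://www.xmldb.org/xupdate"
ET.register_namespace("xupdate", XU)

def modifications(ops):
    """Build <xupdate:modifications> from (operation, select, payload) tuples."""
    root = ET.Element(f"{{{XU}}}modifications", {"version": "1.0"})
    for op, select, payload in ops:
        node = ET.SubElement(root, f"{{{XU}}}{op}", {"select": select})
        if isinstance(payload, str):
            node.text = payload        # new text content
        elif payload is not None:
            node.append(payload)       # new element to create
    return root

# <xupdate:element name="address"> describes the element to be created.
address = ET.Element(f"{{{XU}}}element", {"name": "address"})
address.text = "1 Main St"

delta = modifications([
    ("update", "/customer/phone", "555-0199"),          # change this number
    ("append", "/customer/addresses", address),         # append this new address
    ("remove", "/customer/cards/card[@id='4']", None),  # remove this credit card
])
print(ET.tostring(delta, encoding="unicode"))
```

A document-centric client has no natural way to produce these operations; it would have to diff whole files first, which is the application-level adaptation mentioned above.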

Now, let me give you a scenario I'd like to see happening: imagine
having this CMS implemented, and you provide a WebDAV view of your
database.

You connect to this 'web folder' (Windows, Linux and MacOSX all come
with the ability to mount webdav hosts as if they were file system
folders), you browse it, and you save your file from your favorite XML
editor (or even from something like Adobe Illustrator for SVG).

The CMS will control your access (after authentication, or using
client-side certificates, whatever), perform the necessary steps
defined on that folder by the workflow configuration (for example,
sending email to the editor and placing the document with a status of
'to be reviewed'), and save the document.

In this scenario though, wouldn't you actually want the webdav impl at the CMS layer and not built into Xindice itself?

The flow would be:

client <-> webdav <-> CMS <-> XML:DB API <-> CORBA <-> Xindice

With the goal of making it

client <-> webdav <-> CMS <-> XML:DB API <-> SOAP <-> Xindice

or optionally

client <-> webdav <-> CMS <-> SOAP <-> Xindice

Personally, I'd like to see webdav available as a module for Xindice. I'm not sure it needs to be there by default, but maybe it does. I just don't know if it makes sense for the scenario you describe above. Going from the CMS to Xindice via the XML:DB API would be much more efficient than going through webdav.
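The flows above keep the same client-side layer while the wire protocol changes underneath it. A minimal Python sketch of that idea, with every name hypothetical (`Transport`, `LoopbackTransport`, `Client`, `FakeServer`); no real CORBA or SOAP binding is shown, only the seam where one would plug in.

```python
from typing import Protocol

class Transport(Protocol):
    """Anything that can ship a named call to the server (CORBA, SOAP, ...)."""
    def call(self, method: str, **args) -> str: ...

class FakeServer:
    """Stand-in for the database server."""
    def __init__(self):
        self.docs = {}
    def store(self, name, xml):
        self.docs[name] = xml
        return name
    def fetch(self, name):
        return self.docs[name]

class LoopbackTransport:
    """Hypothetical transport that calls the server in-process.

    A CORBA or SOAP binding would serialize `method` and `args` instead.
    """
    def __init__(self, server):
        self.server = server
    def call(self, method, **args):
        return getattr(self.server, method)(**args)

class Client:
    """The layer the CMS programs against (the role the XML:DB API plays)."""
    def __init__(self, transport: Transport):
        self.transport = transport
    def store(self, name, xml):
        return self.transport.call("store", name=name, xml=xml)
    def fetch(self, name):
        return self.transport.call("fetch", name=name)

client = Client(LoopbackTransport(FakeServer()))
client.store("release-1", "<press-release/>")
print(client.fetch("release-1"))  # <press-release/>
```

Replacing CORBA with SOAP in this picture means writing a new `Transport`; the CMS and the `Client` layer stay unchanged, which is why removing CORBA does not mean removing the layer.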

Now, can I use XIndice to provide the storage system underneath this
CMS?

For example, in order to have a webdav view I need the ability to have
'node flavors': a node can be a 'folder' (currently done with
collections), a 'document', a 'document fragment', or a symlink to
another document fragment.

It seems you would model most of these at the application level. Do you think the database needs to support more than just collections and documents? If so, what and why?
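As an illustration of modelling node flavors at the application level, here is a minimal Python sketch. The `Kind` enum, `Node` class and `lookup` function are hypothetical; symlinks are followed only for the final path segment, with a hop limit so a link cycle fails instead of looping forever.

```python
from dataclasses import dataclass, field
from enum import Enum

class Kind(Enum):
    FOLDER = "folder"
    DOCUMENT = "document"
    FRAGMENT = "fragment"
    SYMLINK = "symlink"

@dataclass
class Node:
    kind: Kind
    content: str = ""
    target: str = ""                 # SYMLINK only: path of the node it points to
    children: dict = field(default_factory=dict)

def resolve(root, path):
    """Walk a '/'-separated path without following symlinks."""
    node = root
    for part in filter(None, path.split("/")):
        node = node.children[part]
    return node

def lookup(root, path, max_hops=8):
    """Resolve a path, following a symlink at the end up to max_hops times."""
    node = resolve(root, path)
    hops = 0
    while node.kind is Kind.SYMLINK:
        if hops == max_hops:
            raise RuntimeError(f"too many symlink hops resolving {path}")
        node = resolve(root, node.target)
        hops += 1
    return node

root = Node(Kind.FOLDER, children={
    "legal": Node(Kind.FOLDER, children={
        "copyright": Node(Kind.DOCUMENT, content="This is copyright info..."),
    }),
    "latest-copyright": Node(Kind.SYMLINK, target="/legal/copyright"),
})
print(lookup(root, "/latest-copyright").content)  # This is copyright info...
```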

How can I perform access control at the node level without duplicating
the information at the CMS level?

Why do you need node level access control for a CMS? That seems like awfully fine-grained control, and it will be extremely complex to administer and expensive to implement. It's basically like asking for column level access control in an RDBMS.
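One common way to keep fine-grained access control manageable is to store rules on only a few paths and let every other node inherit from its nearest ancestor, so per-node information is not duplicated. A minimal Python sketch; the `ACL` table, the `"*"` wildcard and both functions are hypothetical, not a Xindice feature.

```python
# Hypothetical rule table: path -> action -> users allowed ("*" means everyone).
ACL = {
    "/": {"read": {"*"}, "write": {"admin"}},
    "/press": {"read": {"*"}, "write": {"admin", "editor"}},
    "/press/drafts": {"read": {"admin", "editor"}, "write": {"admin", "editor"}},
}

def effective_rule(path, acl):
    """Return the rule of the closest path (itself, then ancestors) that has one."""
    parts = [p for p in path.split("/") if p]
    for i in range(len(parts), -1, -1):
        candidate = "/" + "/".join(parts[:i])
        if candidate in acl:
            return acl[candidate]
    raise KeyError(f"no rule for {path}")

def allowed(user, action, path, acl=ACL):
    users = effective_rule(path, acl).get(action, set())
    return "*" in users or user in users

print(allowed("guest", "read", "/press/releases/r1"))  # True, inherited from /press
print(allowed("guest", "read", "/press/drafts/r1"))    # False, /press/drafts overrides
```

Administration cost grows with the number of rules, not the number of nodes, although evaluating a check still costs one lookup per path level.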

How can I perform versioning without
having to duplicate every document entirely?

I think having versioning in the database would be pretty useful for many different applications.
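A minimal Python sketch of versioning without full copies, assuming line-based deltas computed with the standard `difflib.SequenceMatcher` (a swapped-in technique for illustration; nothing here describes how Xindice would do it). The first version is kept whole; each later version stores only "copy this range of the previous version" and "add these new lines" operations.

```python
import difflib

def make_delta(old_lines, new_lines):
    """Encode new_lines against old_lines as copy ranges plus added lines."""
    ops = []
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("add", new_lines[j1:j2]))
        # "delete": nothing stored; those old lines are simply not copied
    return ops

def apply_delta(old_lines, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class VersionedDocument:
    """Keeps version 0 in full and every later version as a delta."""
    def __init__(self, text):
        self.base = text.splitlines()
        self.deltas = []
    def commit(self, text):
        latest = self.version(len(self.deltas))
        self.deltas.append(make_delta(latest, text.splitlines()))
    def version(self, n):
        lines = self.base
        for ops in self.deltas[:n]:
            lines = apply_delta(lines, ops)
        return lines

doc = VersionedDocument("<copyright>\nv1 text\n</copyright>")
doc.commit("<copyright>\nv2 text\n</copyright>")
print(doc.version(1))  # ['<copyright>', 'v2 text', '</copyright>']
```

Reading old versions means replaying deltas; real systems usually bound that cost by storing a full snapshot every few versions.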

Currently, whenever the CMS saves something on top of another document,
it has to fetch the document, perform the diff, get the XUpdate and
send that.

You can replace the whole document if you want via the XML:DB API. Use of XUpdate is completely optional.
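For comparison with whole-document replacement, the "fetch, diff, send XUpdate" path can be sketched in Python as follows. The sketch assumes the old and new trees have the same element structure and only leaf text changed; added or removed elements are out of scope, and the customer documents are hypothetical.

```python
import xml.etree.ElementTree as ET

XU = "http://www.xmldb.org/xupdate"
ET.register_namespace("xupdate", XU)

def text_changes(old, new, xpath):
    """Yield (xpath, new_text) for leaf elements whose text changed.

    Assumes old and new have the same element structure. Only leaves are
    reported because xupdate:update replaces an element's whole content.
    """
    old_text = (old.text or "").strip()
    new_text = (new.text or "").strip()
    if len(new) == 0 and old_text != new_text:
        yield xpath, new_text
    seen = {}
    for old_child, new_child in zip(old, new):
        seen[new_child.tag] = seen.get(new_child.tag, 0) + 1
        child_path = f"{xpath}/{new_child.tag}[{seen[new_child.tag]}]"
        yield from text_changes(old_child, new_child, child_path)

def to_xupdate(old_root, new_root):
    """Wrap the changes in an <xupdate:modifications> document."""
    mods = ET.Element(f"{{{XU}}}modifications", {"version": "1.0"})
    for select, text in text_changes(old_root, new_root, "/" + new_root.tag):
        ET.SubElement(mods, f"{{{XU}}}update", {"select": select}).text = text
    return mods

old = ET.fromstring("<customer><phone>555-0100</phone><name>Ann</name></customer>")
new = ET.fromstring("<customer><phone>555-0199</phone><name>Ann</name></customer>")
print(ET.tostring(to_xupdate(old, new), encoding="unicode"))
```

When the diff gets complicated, sending the whole new document is the simpler alternative, which is the point of XUpdate being optional.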


I'm not asking to remove XUpdate from the feature list, but to give the
appropriate tools depending on the use case.

Well that is fine, just don't say something is useless when it is only useless to you. :-) It isn't like XUpdate is the only way to change the content in the server.

Yes, you are right in saying that XQuery does include this functionality,
but I suggest you consider the following scenario:

<db:database xmlns:db="xindice#internal" xmlns:cms="CMS">

 <legal db:type="folder">
  <copyright db:type="document" db:version="10.2"
db:last-modified="20010223">
    This is copyright info and blah blah...
  </copyright>
 </legal>

 <press db:type="folder">
  <press-releases db:type="folder">
   <press-release date="20010212" author="blah"
     db:type="document" db:version="10.2" db:last-modified="20010213"
     cms:status="published">
    <title>XIndice 2.0 released!</title>
    <content>
     <p>blah blah blah</p>
     <p><db:link href="/legal/copyright[text()]"/></p>
    </content>
   </press-release>
  </press-releases>
 </press>

</db:database>

then, you can ask for the document

 /press/press-releases/press-release[@date = '20010212']

and you get

 <press-release>
  <title>XIndice 2.0 released!</title>
  <content>
   <p>blah blah blah</p>
   <p>This is copyright info and blah blah...</p>
  </content>
 </press-release>

which allows your users to avoid probably 200 pages of XQuery syntax to
accomplish the same task (and also, probably, be much faster!).
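A rough Python approximation of the scenario, using the standard `xml.etree.ElementTree` on an abbreviated copy of the example database. Assumptions, all mine: `[text()]` in the link href is read as "substitute the target's text content", the client view drops every `db:` and `cms:` attribute, and `view`/`resolve` are hypothetical helpers rather than anything Xindice provides.

```python
import copy
import xml.etree.ElementTree as ET

DB = "xindice#internal"   # namespace URIs taken from the example
CMS = "CMS"

SOURCE = """<db:database xmlns:db="xindice#internal" xmlns:cms="CMS">
 <legal db:type="folder">
  <copyright db:type="document">This is copyright info and blah blah...</copyright>
 </legal>
 <press db:type="folder">
  <press-releases db:type="folder">
   <press-release date="20010212" author="blah" db:type="document" cms:status="published">
    <title>XIndice 2.0 released!</title>
    <content><p>blah blah blah</p><p><db:link href="/legal/copyright[text()]"/></p></content>
   </press-release>
  </press-releases>
 </press>
</db:database>"""

def resolve(root, href):
    """Assumption: '[text()]' means 'substitute the target's text content'."""
    if href.endswith("[text()]"):
        href = href[: -len("[text()]")]
    target = root.find(href.lstrip("/"))
    return (target.text or "").strip()

def view(root, path):
    """Client view of one document: links expanded, db:/cms: attributes removed."""
    result = copy.deepcopy(root.find(path))
    for parent in list(result.iter()):
        for child in list(parent):
            if child.tag != f"{{{DB}}}link":
                continue
            text = resolve(root, child.get("href")) + (child.tail or "")
            idx = list(parent).index(child)
            if idx == 0:
                parent.text = (parent.text or "") + text
            else:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + text
            parent.remove(child)
    for elem in result.iter():
        for key in list(elem.attrib):
            if key.startswith((f"{{{DB}}}", f"{{{CMS}}}")):
                del elem.attrib[key]
    return result

root = ET.fromstring(SOURCE)
release = view(root, "press/press-releases/press-release[@date='20010212']")
print(ET.tostring(release, encoding="unicode"))
```

Note that this sketch already shows issue 1 in the list that follows: the stored documents have to carry `db:` markup.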

Is your goal here to have the database itself be specified in XML, or just to have the linking? Specifying the database in XML would be a bad idea, but I don't think that is what you were really trying to convey. The linking actually already exists and has since dbXML 0.2, but we call it experimental because there are a lot of issues with it:

1. It requires db-specific tags in the XML documents. For some apps this is OK; for many it is not.
2. If you use XLink to solve problem 1, then you lose the ability to include XLinks that should be passed through to the client.
3. There is a problem between views on the document. Basically you need different views when editing a document vs. retrieving a document. Webdav has/had the same problem with dynamic pages; it may be fixed in a later spec, I'm not sure.
4. Runaway expansion of links (i.e. circular links) could have some very nasty results and could be difficult to detect.
5. Related to the above, but applicable even where circular links do not exist: linking could bring large portions of the database into memory in cases where that would not be the desired behavior.
6. You have no way to express a relationship that you did not prewire into your data model.
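Issues 4 and 5 can at least be bounded during expansion. A minimal Python sketch, using a hypothetical in-memory model where a document is a list of text parts and `("link", name)` tuples: the chain of documents currently being expanded is tracked to detect cycles, and a depth limit caps how far a single expansion can reach. Fan-out (many links at one level) would need a separate size budget, which this sketch does not include.

```python
class LinkError(Exception):
    pass

def expand(docs, name, max_depth=5, _stack=()):
    """Inline linked documents, refusing cycles and overly deep chains."""
    if name in _stack:
        raise LinkError(" -> ".join(_stack + (name,)))
    if len(_stack) >= max_depth:
        raise LinkError(f"link depth limit {max_depth} reached at {name}")
    out = []
    for part in docs[name]:
        if isinstance(part, tuple) and part[0] == "link":
            out.append(expand(docs, part[1], max_depth, _stack + (name,)))
        else:
            out.append(part)
    return "".join(out)

docs = {
    "release": ["<p>blah</p><p>", ("link", "copyright"), "</p>"],
    "copyright": ["This is copyright info"],
    "a": ["A:", ("link", "b")],
    "b": ["B:", ("link", "a")],
}
print(expand(docs, "release"))  # <p>blah</p><p>This is copyright info</p>
```

Calling `expand(docs, "a")` raises `LinkError("a -> b -> a")` instead of recursing forever.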

Solutions are possible for most of these things and I'm not sure I agree with Tom that this should be abandoned for XQuery. I see them as being complementary if implemented correctly. For instance you could use linking as a mechanism to optimize XQuery evaluation by prewiring some of the relationships. Likewise XQuery can be used to express relationships that are not known via linking. I like the flexibility of having both, if the linking issues can be resolved acceptably.

Without appropriate hooks for caches, any data storage system is
destined not to scale in real life systems.

I suggest you place the above two features very high on the todo list,
or you'll find people very disappointed when they start hitting
scalability problems and you can't give them solutions to avoid
saturation.

No disagreement at all here. I already consider those high priority. It's really a matter of exposing it through the API more than anything else.
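One shape such a cache hook could take, sketched in Python under stated assumptions: `Store.on_change` and the `Cache` class are hypothetical names, not part of the XML:DB API. The store notifies registered listeners on every write, and the cache drops the affected entry so the next read goes back to the store.

```python
from typing import Callable

class Store:
    """Hypothetical store exposing a change-notification hook."""
    def __init__(self):
        self._docs = {}
        self._listeners = []
        self.reads = 0                     # backend reads, to show cache hits
    def on_change(self, listener: Callable[[str], None]):
        self._listeners.append(listener)
    def get(self, key):
        self.reads += 1
        return self._docs[key]
    def put(self, key, value):
        self._docs[key] = value
        for listener in self._listeners:
            listener(key)                  # tell caches this key is stale

class Cache:
    """Read-through cache invalidated by the store's hook."""
    def __init__(self, store):
        self.store = store
        self._entries = {}
        store.on_change(self.invalidate)
    def invalidate(self, key):
        self._entries.pop(key, None)
    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self.store.get(key)
        return self._entries[key]

store = Store()
cache = Cache(store)
store.put("/legal/copyright", "v1")
cache.get("/legal/copyright")
cache.get("/legal/copyright")
print(store.reads)  # 1: the second read was served by the cache
```

Push-style invalidation like this only works in-process or over a channel that can deliver events; remote caches would more likely validate against a version or last-modified stamp.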


Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/
