Re: XIndice 2.0 [was Re: Data or Documents for Xindice 2.0]

Stefano Mazzocchi 9 Jan 2002 12:56:20 -0000

Kimbro Staken wrote:
> 
> On Saturday, January 5, 2002, at 04:01 AM, Stefano Mazzocchi wrote:
> > My point was not to remove CORBA from the picture (BTW, is there
> > anybody here who is using XIndice from CORBA in a real-life
> > application?) but to indicate my impression that time spent on a webDAV
> > connection would have been better spent. No offense intended, just a
> > consideration from the document-oriented world where CORBA will never
> > even enter.
> >
> 
> Everybody who uses the XML:DB API uses CORBA behind the scenes, which
> basically means everybody is using it. I don't know of anyone using the
> CORBA API directly and I wouldn't encourage anyone to do so since we want
> to get rid of CORBA. Now getting rid of CORBA does not mean getting rid of
> that layer. CORBA provides an essential function to the server and that
> function could not be entirely fulfilled by webdav.


Oh, absolutely. I never thought the opposite.

> While webdav would be nice for document oriented applications, dbXML was
> not really designed or conceived for those applications nor has the
> majority of the interest in the server been for those types of
> applications. This isn't to deny that both webdav and document oriented
> applications are important, it is to deny that they are the only
> applications that should be targeted. 

Granted. Again, I was not asking for replacement, I was telling you my
wish list.

> I'm all for adding webdav as an
> option, but you're wrong in saying that our time would have been better
> spent there. In fact you are the only person who has ever "really" wanted
> webdav. It had come up in the past but it was never a real solid request
> from any user of the software. Now it is.

Ok, makes sense.

At the same time, I don't think the webdav layer should belong in this
project, but more on the CMS side of things (one day I'll try to have
Slide connect to XIndice directly... you'll hear me if I succeed)
 
> > That's a good point, but again, I'm questioning the darwinistic
> > evolutionary process of this effort: do what people ask, not what
> > architectural elegance suggests or W3C recommends.
> 
> And we've had far more requests for W3C XML Schema than for Relax NG. I'm
> not a fan of XML Schema either, doesn't change the fact that it is what is
> being asked for. I'm with Tom though, if we can do things in a schema
> language independent manner that should be the target.

+1

> >
> > I agree with you on the fact that the engine internals should deal with
> > validation. Just like Cocoon doesn't validate stuff by default.
> >
> 
> Let's not get too caught on just focusing on validation here. Validation
> is just part of the schema equation. There's also the data-typing issue to
> consider. This will be particularly important with XQuery. In fact I'd say
> data typing is even more important than validation for data oriented apps,
> but you can't really apply types without the structure of the document
> being known. This means some level of schema support has to be built into
> the server.
> 
> Just to be clear, in no way am I suggesting that the server should
> "require" a schema. In fact I'd consider requiring schemas to be
> destroying what I value most about the server.

agreed.

> I agree it would be cool if validation could be done at either client or
> server under the control of the developer. For data oriented apps having
> robust schema support on the server will be essential though.

Oh, even for document systems, but probably on another level. databases
should concentrate on data.

> > The content management system I'd like to have could be built in two
> > ways:
> >
> >  1) single layer: XIndice includes all the required functionality.
> >
> >  2) double layer: XIndice is the db engine, something else wraps it and
> > performs CMS operations like access control, workflow management, data
> > validation, versioning, etc.
> 
> > Separation of Concerns clearly indicates that the second option is the
> > best. This has been my view of the issues since May 2000, when I first
> > took a serious look at dbXML as the engine for such a system.
> >
> 
> Yes number 2 is clearly the way to go.

agreed.

> > This is why I wanted XIndice over to Apache: separation of concerns is a
> > great way to do parallel design and increase productivity and give users
> > more choice, but it can't work without *solid* contracts between the
> > systems that interoperate.
> >
> > So, what I'm asking, is *NOT* to turn XIndice into a CMS, not at all!
> 
> Good, because I certainly wouldn't agree with that.
> 
> > What I'd like to see is XIndice remaining *very* abstract on the XML
> > model, but without sacrificing performance and making it possible to
> > implement more complex systems on top.
> >
> 
> Absolutely, that's the whole point. Xindice is about flexibility.
> 
> >
> > Absolutely. Still, please, let's try to avoid a pissing contest with the
> > RDBMS communities and lead the way for those grounds where the relation
> > model fits, but with a very bad twist.
> >
> 
> I agree, I don't want to get into this battle either. However, that
> doesn't mean that an XML database is not useful in data oriented applications.

all applications using a database require a data-oriented engine. the
entire question is about the 'type' of structure this data has.

> The simple fact that you have semi-structured data is incredibly valuable
> for many applications that are nothing like a CMS. They're still data
> oriented applications though. Just by building a database it doesn't
> automatically mean that you have to suddenly start chanting "death to
> RDBMS".

Absolutely agreed.
 
> >>>
> >>>  - web services
> >>>  - content management systems
> >>
> >> Don't forget health care, legal documents, and scientific applications.
> >
> > These are all examples of the above two.
> >
> 
> Heh, heh, there is no way that I'll buy into the idea that the only two
> places where Xindice is useful are web services and CMS. There's more to
> XML data management than that.

For example? (just curious, not ironically challenging)
 
> > XUpdate is a way to express deltas, differences between trees.
> >
> > In the data-centric world, people are used to sending deltas: change this
> > number with this other one, append this new address, remove this credit
> > card from the valid list.
> 
> > In the document-centric world, people are used to thinking of files, not
> > about their diffs.
> 
> > CVS is a great system because it does all the differential processing on
> > documents by itself, transparently.
> >
> > Now, the use of a delta-oriented update language isn't necessarily bad
> > as a 'wire-transport' (much like CVS sends compressed diffs between the
> > client and the server) but definitely isn't useful by itself without
> > some application level adaptation.
> >
> > Now, let me give you a scenario I'd like to see happening: imagine to
> > have this CMS system implemented and you provide a WebDAV view of your
> > database.
> >
> > You connect to this 'web folder' (Windows, Linux and MacOSX all come
> > with the ability to mount webdav hosts as if they were file system
> > folders), you browse it and you save your file from your favorite XML
> > editor (or even using stuff like Adobe Illustrator for SVG).
> >
> > The CMS will control your accessibility (after authentication or using
> > client side certification, whatever), perform the necessary steps
> > defined on that folder by the workflow configurations (for example,
> > sending email to the editor and placing the document with a status of
> > 'to be reviewed') and save the document.
> >
> 
> In this scenario though, wouldn't you actually want the webdav impl at the
> CMS layer and not built into Xindice itself?
> 
> The flow would be.
> 
> client <-> webdav <-> CMS <-> XML:DB API <-> CORBA <-> Xindice
> 
> With the goal of making it
> 
> client <-> webdav <-> CMS <-> XML:DB API <-> SOAP <-> Xindice
> 
> or optionally
> 
> client <-> webdav <-> CMS <-> SOAP <-> Xindice
> 
> Personally, I'd like to see webdav available as a module for Xindice. I'm
> not sure it needs to be there by default, but maybe it does. I just don't
> know if it makes sense for the scenario you describe above. Going from the
> CMS to Xindice via the XML:DB API would be much more efficient than going
> through webdav.

I agree with you, the webdav layer should be on top of something else,
probably, SOAP.
 
> > Now, can I use XIndice to provide the storage system underneath this
> > CMS?
> >
> > For example, in order to have a webdav view I need the ability to have
> > 'node flavors': a node can be a 'folder' (currently done with
> > collections), what is a 'document' and what is a 'document fragment' and
> > what is a symlink to another document fragment.
> >
> 
> It seems you would model most of these at the application level. Do you
> think the database needs to support more than just collections and
> documents? If so what and why?

I'll trigger another email for this.
 
> > How can I perform access control at the node level without duplicating
> > the information at the CMS level?
> 
> Why do you need node level access control for a CMS? That seems awfully
> fine grained control and it will be extremely complex to administer and
> expensive to implement. It's basically like asking to have column level
> access control for an RDBMS.

I'm not saying that you have to fine tune your ACL for *every* node, but
I'm saying that if you consider your nodes are the 'data atoms' you need
to have access control at that level (think of file systems!).

How to make this usable is an implementation detail (all nested nodes
inherit the parent ACL and so on...)
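
To make the inheritance idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the 'acl' attribute
name, the sample tree and the effective_acl() helper are hypothetical and are
not part of Xindice or the XML:DB API. It only shows that a node without its
own ACL can take the one set on its nearest ancestor.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name; nothing in Xindice defines it.
ACL_ATTR = "acl"

# Hypothetical sample tree: only two nodes carry an explicit ACL.
SAMPLE = (
    '<database acl="staff">'
    "<legal><copyright/></legal>"
    '<press acl="editors">'
    "<press-releases><press-release/></press-releases>"
    "</press>"
    "</database>"
)


def effective_acl(root, path):
    """Walk down from root along child tag names and return the ACL that
    applies to the last node: its own, or the nearest ancestor's."""
    node = root
    acl = node.get(ACL_ATTR)
    for tag in path:
        node = node.find(tag)
        if node is None:
            raise KeyError("no such child: " + tag)
        # An explicit ACL on this node overrides the inherited one.
        acl = node.get(ACL_ATTR, acl)
    return acl


root = ET.fromstring(SAMPLE)
print(effective_acl(root, ["legal", "copyright"]))
print(effective_acl(root, ["press", "press-releases", "press-release"]))
```

With this kind of scheme only the nodes that differ from their parent need an
explicit entry, which keeps the administration effort close to what a file
system asks for.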
 
> > how can I perform versioning without
> > having to duplicate every document entirely?
> 
> I think having versioning in the database would be pretty useful for many
> different applications.

Absolutely.
 
> > Currently, whenever the CMS saves something on top of another document,
> > it has to call for the document, perform the diff, get the XUpdate and
> > send that.
> 
> You can replace the whole document if you want via the XML:DB API. Use of
> XUpdate is completely optional.

Ok, then I'm fine.
 
> >
> > I'm not asking to remove XUpdate from the feature list, but to give the
> > appropriate tools depending on the uses.
> >
> 
> Well that is fine, just don't say something is useless when it is only
> useless to you. :-) It isn't like XUpdate is the only way to change the
> content in the server.

Ok, good point, sorry for that. :)
 
> > Yes, you are right saying that XQuery does include this functionality,
> > but I suggest you consider the following scenario:
> >
> > <db:database xmlns:db="xindice#internal" xmlns:cms="CMS">
> >
> >  <legal db:type="folder">
> >   <copyright db:type="document" db:version="10.2"
> > db:last-modified="20010223">
> >     This is copyright info and blah blah...
> >   </copyright>
> >  </legal>
> >
> >  <press db:type="folder">
> >   <press-releases db:type="folder">
> >    <press-release date="20010212" author="blah"
> >      db:type="document" db:version="10.2" db:last-modified="20010213"
> >      cms:status="published">
> >     <title>XIndice 2.0 released!</title>
> >     <content>
> >      <p>blah blah blah</p>
> >      <p><db:link href="/legal/copyright[text()]"/></p>
> >     </content>
> >    </press-release>
> >   </press-releases>
> >  </press>
> >
> > </db:database>
> >
> > then, you can ask for the document
> >
> >  /press/press-releases/press-release[@date = '20010212']
> >
> > and you get
> >
> >  <press-release>
> >   <title>XIndice 2.0 released!</title>
> >   <content>
> >    <p>blah blah blah</p>
> >    <p>This is copyright info and blah blah...</p>
> >   </content>
> >  </press-release>
> >
> > which allows your users to avoid probably 200 pages of XQuery syntax to
> > accomplish the same task (and also, probably, be much faster!).
> >
> 
> Is your goal here to have the database be specified in XML or just to have
> the linking? 

No, I was showing an XML 'view' of the internal database data, of
course, I'm not proposing to use *this* as the actual data stored. I'm
not that foolish :)

> For the database being specified in XML, that is a bad idea,
> but I don't think that is what you were really trying to convey.  

exactly.

> For the
> linking that actually already exists and has since dbXML 0.2, but we call
> it experimental because there are a lot of issues with it.
> 
> 1. It requires db specific tags in the XML documents. For some apps this
> is OK, for many it is not.

ok

> 2. If you use XLink to solve problem 1 then you deny the ability of
> including XLinks that should be passed through to the client.

you can use xlink 'roles' to identify its internal behavior!

> 3. There is a problem between views on the document. Basically you need
> different views when editing a document vs. retrieving a document. Webdav
> has/had the same problem with dynamic pages; it may be fixed in a later
> spec, I'm not sure.

good point.

> 4. Runaway expansion of links (i.e. circular links) could have some very
> nasty results and could be difficult to detect.

nah, not difficult at all, just mark the nodes you have visited.
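
A rough sketch of what I mean, in Python. All names here are assumptions for
illustration only: the un-namespaced 'link' element, the relative 'href'
paths, the sample data and the expand() helper are hypothetical and do not
reflect the experimental linking code in the server. It marks the nodes that
are currently being expanded, so a link that leads back to one of them is
reported instead of followed forever.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one ordinary link and one pair of circular links.
SAMPLE = (
    "<database>"
    "<legal><copyright>This is copyright info</copyright></legal>"
    "<press><release>"
    "<p>blah </p>"
    '<p><link href="legal/copyright"/></p>'
    "</release></press>"
    '<loop><a><link href="loop/b"/></a><b><link href="loop/a"/></b></loop>'
    "</database>"
)


def expand(root, node, visiting=None):
    """Return the text of node with every <link href="..."/> replaced by the
    expanded text of its target. Raises ValueError on a circular link."""
    visiting = set() if visiting is None else visiting
    if id(node) in visiting:
        raise ValueError("circular link at <%s>" % node.tag)
    visiting.add(id(node))  # mark the node while it is being expanded
    parts = [node.text or ""]
    for child in node:
        if child.tag == "link":
            target = root.find(child.get("href"))
            if target is None:
                raise KeyError("dangling link: %s" % child.get("href"))
            parts.append(expand(root, target, visiting))
        else:
            parts.append(expand(root, child, visiting))
        parts.append(child.tail or "")
    visiting.discard(id(node))  # unmark, so shared content can be reused
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(expand(root, root.find("press/release")))
```

Unmarking a node once its expansion is finished is a deliberate choice: the
same fragment can then be linked from many places without being mistaken for
a loop, and only a link that leads back into its own expansion path fails.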

> 5. Related to above but applicable even in cases where circular links do
> not exist, linking could bring large portions of the database into memory
> in cases where that would not be the desired behavior.

I expect that if you link something you do it because you use the same
content in many places. This will actually increase performance by
placing used parts in memory.

The alternative is joining documents with XQuery and I do not think this
is going to be any faster than this.

> 6. You have no way to express a relationship that you did not prewire into
> your data model.

well, that's an intrinsic limitation, but something we all can live with
when we choose to use it. I'm not proposing to "remove" XQuery for
internal linking, no way, but this alone would remove 80% of XQuery
usages in the document-centric world and this is a worthy goal, IMO.
 
> Solutions are possible for most of these things and I'm not sure I agree
> with Tom that this should be abandoned for XQuery. 

Yep, that's what I think as well.

> I see them as being complementary if implemented correctly. 

Absolutely.

> For instance you could use linking
> as a mechanism to optimize XQuery evaluation by prewiring some of the
> relationships. 

Yep.

> Likewise XQuery can be used to express relationships that
> are not known via linking. I like the flexibility of having both, if the
> linking issues can be resolved acceptably.

Absolutely +1!!

> > Without appropriate hooks for caches, any data storage system is
> > destined not to scale in real life systems.
> >
> > I suggest you place the above two features very high in the todo list
> > or you'll find people very disappointed when they start getting
> > scalability problems and you can't give them solutions to avoid
> > saturation.
> >
> 
> No disagreement at all here. I already consider those high priority. It's
> really a matter of exposing it through the API more than anything else.

Ok, cool.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------
