Re: Planning the future

Stefano Mazzocchi 20 Feb 2002 11:40:48 -0000

Kimbro Staken wrote:

> > Project management
> > ------------------
> >
> >> Need to determine release roadmap
> >> Should probably split on two paths 1.0 and 2.0 series
> >
> > Let me give you my personal suggestion: whenever possible, continue
> > incrementally.
> >
> > Rewriting software from scratch normally causes *lots* of pain and is
> > much harder to manage since forking friction might develop.
> >
> > Cocoon 2.0 took two full years to happen, with more than 5 different
> > developers helping out directly. This project has two (yes, I see this
> > growing, but only two people really know the code today and this is a
> > fact that must not be ignored).
> >
> > My suggestion: use the 1.x series to move to 2.0 slowly and one feature
> > at a time, instead of doing a major rewrite for 2.0.
> >
> > For Cocoon, we didn't do this because it was impossible: Cocoon 1.x and
> > Cocoon 2.x are so different that they don't share a single line of code.
> > I strongly doubt that the planned Xindice 2.0 will be unable to share
> > anything with Xindice 1.x, so my strong suggestion is to keep going with
> > the 1.x series, attacking one thing at a time.
> >
> 
> You know what, you're right. If I ask myself how I would approach this
> without any external pressures I would definitely pursue a more
> incremental approach. The 2.0 idea is really a holdover from the planning
> of the now-dead commercial product. I'll be quite happy to expunge it from
> my brain.


Very well. I can guarantee that you will find this a valuable decision
in 6 months, when you have a 1.3 version that does half of what you
planned for 2.0 but does it in a much more stable way and with a bigger
development community.

> >
> >
> >> Schema validation in core
> >
> > -0: I'm with Tom here: I don't think this should reside in the database
> > core, but if others find it valuable enough to contribute it and if this
> > doesn't cause internal performance degradation (both speed and memory),
> > I'd say we accept it.
> >
> 
> Except my understanding is that Tom is saying this does belong in the core.
>   Tom can clarify of course.

Oops, sorry. Tom clarified.

Then I disagree with Tom, but I'm -0 as long as you leave the validation
configurable (meaning that if I want to go faster, I can disable it).
 
> > Just keep in mind that without namespace-aware validation, we are going
> > nowhere, so I'd be -1 on DTD validation in the core.
> >
> 
> Yeah, I don't really want DTDs either. Overall they just don't work well
> in a database context.

absolutely.
 
> >> Focus on roundtripping?
> >
> > What do you exactly mean by 'roundtripping'?
> >
> 
> The document that goes in is the same that comes out.  For some apps it's
> important, for most it doesn't really matter. The question is how close do
> you strive to get. For instance one difference between Xindice and eXist
> is that Xindice preserves CDATA sections and eXist does not. Both are
> valid behavior as far as XML is concerned, but make a big difference to
> applications. It's difficult to round trip XML at the syntax level unless
> it's already Canonical XML.

yes, I know what you mean but the problem is that neither DOM nor SAX
are roundtripping safe.

One example for all:

 <blah a="..." b="..."/>

and 

<blah 
b="..." 
a="..."
/>

end up with the exact same SAX events or DOM nodes. There's nothing you
can do about it.
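
As a minimal sketch of that point (Python's standard xml.sax, used here
purely for illustration; the attribute names a and b and the Recorder
class are my own), the two spellings can be fed to a parser and the
recorded events compared. Attributes are compared as an unordered
mapping, since SAX leaves their order unspecified:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Recorder(ContentHandler):
    """Hypothetical helper: records element events as plain tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Copy the attributes into a dict, so comparison ignores their order.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    """Parse an XML string and return the recorded element events."""
    recorder = Recorder()
    xml.sax.parseString(text.encode("utf-8"), recorder)
    return recorder.events


compact = '<blah a="1" b="2"/>'
spread = '<blah\n  b="2"\n  a="1"\n/>'

print(sax_events(compact) == sax_events(spread))  # True
```

The whitespace inside the tag and the attribute order never reach the
handler, so there is nothing left to write back out.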
 
> > Canonical XML?

FYI, xml.apache.org/security/ implements a Java canonicalizer. It could
be offered as a configuration option to store data safely, canonicalizing
it as it comes in.
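
To illustrate what canonicalizing on the way in would buy us, here is a
small sketch. It does not use the Apache canonicalizer mentioned above;
it assumes Python 3.8 or later, whose xml.etree.ElementTree.canonicalize
implements C14N 2.0. The two sample documents are my own:

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same element: different attribute order and
# different whitespace inside the tag.
compact = '<blah a="1" b="2"/>'
spread = '<blah\n  b="2"\n  a="1"\n/>'

# With no output file given, canonicalize() returns the canonical text.
canonical_compact = canonicalize(compact)
canonical_spread = canonicalize(spread)

print(canonical_compact == canonical_spread)
```

If the stored form is the canonical one, "what goes in is what comes
out" holds at the byte level for any document that was already
canonical, and holds up to canonicalization for everything else.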
 
> >> We should probably look at including some connectivity to other data
> >> sources
> >> Maybe a MySQL backend to start.
> >
> > Hmmm, I'm curious here: why?
> >
> 
> Because people ask for it quite often. I don't think it should be any kind
> of focus, but if someone contributes it we'll happily accept it. It'd just
> be nice to have as a goal to encourage a contribution, I personally have
> no interest in working on it.

Me neither. Especially since an XPath with more than 7 nested path steps
will end up being a JOIN nightmare and slow as hell. XML-to-relational
mappings are *so* horrible to deal with and the generated SQL is a
*serious* pain in the ass for the RDBMS to compile/optimize (in fact, I
bet Oracle's XML layer is slow as hell!)
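
A rough sketch of why the joins pile up, assuming the simplest possible
mapping: a hypothetical edge table node(id, parent_id, name) and
child-axis paths only. Neither the table nor the xpath_to_sql helper
comes from any real product; SQLite is used only so the sketch runs:

```python
import sqlite3


def xpath_to_sql(path):
    """Translate a child-axis path like /a/b/c into SQL: one self-join per step."""
    steps = [s for s in path.split("/") if s]
    if not steps:
        raise ValueError("path needs at least one step")
    last = len(steps)
    lines = [f"SELECT n{last}.id FROM node n1"]
    for i in range(2, last + 1):
        # Every extra path step costs one more join on the same table.
        lines.append(f"JOIN node n{i} ON n{i}.parent_id = n{i - 1}.id")
    conditions = ["n1.parent_id IS NULL"]
    conditions += [f"n{i}.name = ?" for i in range(1, last + 1)]
    lines.append("WHERE " + " AND ".join(conditions))
    lines.append(f"ORDER BY n{last}.id")
    return "\n".join(lines), steps


def demo():
    """Load a tiny tree into an in-memory database and run /data/datum."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    db.executemany(
        "INSERT INTO node VALUES (?, ?, ?)",
        [(1, None, "data"), (2, 1, "datum"), (3, 1, "datum"), (4, 1, "other")],
    )
    sql, params = xpath_to_sql("/data/datum")
    return [row[0] for row in db.execute(sql, params)]


print(demo())  # [2, 3]
```

An eight-step path yields seven self-joins under this mapping, and that
is before predicates, attributes, or the descendant axis enter the
picture.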
 
> >> Linking
> >> To keep and mature or to eliminate?
> >
> > big +1 for keeping and maturing the concept!
> 
> I'm +0 on this right now. There are a lot of issues with it in practice
> even if I agree it would be a very useful thing if it worked right.

OK, let's sort it out then. I really like this feature and I don't want
to see it go away (especially in document-centric use, this is *very*
useful).

Question: does the XML:DB API include the notion of *views*? In that
case, we could ask for the 'skeleton' view, where the internal namespaces
are returned untouched, or the 'normal' view, where the auto-links
are expanded.

> >
> >> Expanded in database meta-data
> >> Need to see exactly what we do and do not have here.
> >
> > metadata is vital for a decently fast use of XIndice. I would say this
> > is #1 priority in future development since Forrest might not be possible
> > for XIndice without the ability to cache resources and without exposing
> > metadata.
> >
> 
> Yeah, it's a high priority. Having real apps needing it will provide good
> motivation.

Forrest will for sure provide this.

> >> Command Line Tools
> >> Simpler interface
> >> Maybe replace with interactive interpreter and SixDML implementation
> >
> > A simple yet powerful command line interface is a must, especially since
> > databases are normally managed over the wire, via ssh.
> >
> 
> I wasn't suggesting removing the tools, just simplifying them based around
> an interactive interpreter like the mysql command. The current tools are
> cumbersome and SixDML provides a much more friendly language for managing
> the database.

Yes, I just took a brief look and it seems much better than XUpdate.

> >
> 
> >
> >> Graphical Tools
> >> Bring in browser projects as part of the core?
> >
> > Hmmm, I don't know: I would say so at first, then see if it's ok to
> > spawn another internal subproject (like Velocity or Avalon do on
> > Jakarta) when the release cycles of the two efforts start to get out of
> > synch.
> >
> >> Graphical admin tool?
> >
> > Might be useful only if capable of going over the wire in an encrypted
> > fashion (or, at least, of exchanging passwords using a digest challenge,
> > without passing the password itself over the wire).
> >
> 
> It's definitely useful, there's 4 or 5 in various states around already.
> My opinion on this is to just see what comes organically. At least in the
> short term I don't think we need a concerted effort for it.

I'd suggest linking all the GUI efforts from the docs, then picking
the best one in 6 months and moving it in at that point.
 
> >
> >
> >> Would allow easy retrieval of bits of XML documents
> >
> > which is a must for an XML database (otherwise, what's the point of
> > abandoning the relational model?)
> 
> You can already do this in the current collection oriented model.

Yes, but one big document might appear simpler to people coming from
the 'XSLT' world, where they evaluate XPaths over documents, that is,
over one big tree of nodes.
 
> >
> >> The database would appear logically as one big XML document, while
> >> physically being a different structure.
> >
> > yes yes yes, this is, IMO, the ideal solution, not least because it could
> > allow the creation of an easy database 'dump' as one big XML file (which
> > is going to be *extremely* useful for many situations like moving your
> > data to backward-incompatible versions of the database)
> >
> >> Could enable either the collection centric or document centric view to
> >> be used.
> >
> > Yes, in fact, the 'collection' idea can be used at different levels: a
> > book is a collection of chapters, a chapter a collection of sections, a
> > section a collection of blocks, etc...
> >
> > I would love to have a database that might logically appear as one big
> > persistent document, as a XML-oriented file system (with folders, files
> > and access control) or with other logical views, because that would
> > allow easy access to the system from the different realms of data
> > inserting and data query.
> >
> > Admittedly, this is a document-centric view, but since this is very
> > likely to be one of the areas where XML databases make more sense, I
> > think it's very important to consider these things right from the design
> > phase.
> 
> I tend to agree and believe there is merit from both the document and data
> perspectives.

Absolutely.

> >
> >> Access control
> >> Should be considered within the context of where we're going to go with
> >> the server framework
> >
> > probably
> >
> >> For embedded apps should probably be possible to get it completely out
> >> of the way.
> >
> > Hmmm, I dare to disagree here:
> >
> 
> My point was that it should be possible, that doesn't mean it's the only
> way. Many and I'd dare say most embedded apps will not want database level
> security slowing things down. They'll be more likely to implement their
> own application level concept of security. It all depends on the app. I'm
> looking at several embedded apps right now and none of them need any kind
> of database level security. For client server apps security is an obvious
> need, for embedded apps it's less of a requirement.

Very good point. I'll trust your judgement on this.

> > One of my feature dreams of a native XML database is the ability to turn
> > parts of the tree transparent during my query: for example, suppose you
> > have something like
> >
> >  <data db:owner="xindice-dev">
> >   <datum db:owner="stefano" value="10"/>
> >   <datum db:owner="kimbro" value="20"/>
> >   <datum value="30"/>
> >  </data>
> >
> > and suppose you have the information that there is a dependency of
> > owners, since 'stefano' and 'kimbro' both belong to 'xindice-dev'.
> >
> > Now, suppose you make a Connection to this database, submitting a 'role'
> > and authenticating with a password (or any other authentication
> > mechanism). Now the database knows *who* is making the query. So,
> > performing the same query, but with a different username, might return
> > different results, totally transparent.
> >
> >  xpath query: /data/datum
> >
> > will return:
> >
> >  1) xindice-dev -> <datum value="30"/>
> >  2) stefano -> <datum value="10"/> <datum value="30"/>
> >  3) kimbro -> <datum value="20"/> <datum value="30"/>
> >  4) root -> <datum value="10"/> <datum value="20"/> <datum value="30"/>
> >
> > I see this as an *incredibly* powerful way to add different 'views' of
> > the same data, depending on 'who' makes the query ('who' might not be
> > which software, but which user is requesting data through that software)
> >
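
To pin down the rule behind those four results, here is a minimal
sketch in Python. The namespace URI, the GROUPS table, and the rule
that a datum without db:owner inherits the owner of its parent are all
my assumptions for illustration; none of this is an existing Xindice
feature:

```python
import xml.etree.ElementTree as ET

DB_NS = "http://example.org/db"  # hypothetical namespace URI
OWNER = "{%s}owner" % DB_NS

# Assumed membership table: both users belong to 'xindice-dev'.
GROUPS = {"stefano": {"xindice-dev"}, "kimbro": {"xindice-dev"}}

DOC = """<data xmlns:db="http://example.org/db" db:owner="xindice-dev">
  <datum db:owner="stefano" value="10"/>
  <datum db:owner="kimbro" value="20"/>
  <datum value="30"/>
</data>"""


def can_see(user, owner):
    """root sees everything; others see what they, or a group they are in, own."""
    if user == "root" or owner is None:
        return True
    return user == owner or owner in GROUPS.get(user, set())


def visible_values(user):
    """Evaluate /data/datum as `user`, hiding the nodes that user may not see."""
    root = ET.fromstring(DOC)
    inherited = root.get(OWNER)
    if not can_see(user, inherited):
        return []
    values = []
    for datum in root.findall("datum"):
        # A datum without its own db:owner inherits the parent's owner.
        owner = datum.get(OWNER, inherited)
        if can_see(user, owner):
            values.append(datum.get("value"))
    return values


for user in ("xindice-dev", "stefano", "kimbro", "root"):
    print(user, "->", visible_values(user))
```

Run as each of the four roles, this returns the same values as the
table quoted above.
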
> >> Authorization
> >> Encryption?
> >> With HTTP based protocols and integrated SSL network level encryption
> >> should be relatively simple. Depends on where we go for server
> >> framework
> >
> > I don't think XIndice should mess with any crypto stuff at this level.
> 
> Sure it should. It's very common to expect encryption between a client and
> a server, it also makes it more comfortable to talk to a Xindice instance
> over the network. I have a lot of uses where I'd like to access a remote
> Xindice server directly from my desktop, having the transaction be clear
> text is a very bad thing. At this point SSL should be relatively simple
> and ideally should be picked up from our server framework, hopefully for
> free.

That's what I meant: crypto should not be built in, but obtained for free
from SSL tunnels.

> >
> >> Query Facility
> >> Add SixDML
> >
> > what is this?
> 
> It's a SQL like native XML database query language. It includes a nice set
> of DML and DDL constructs. It's going to become an XML:DB Initiative
> project and is pretty much a logical evolution of the XML:DB API and
> XUpdate. http://www.sixdml.org/
> 
> Basically instead of typing things like
> xindice ad -c /db/collection -f file.xml
> 
> You'd run the command interpreter and type
> insert URL file.xml into collection /db/collection
> 
> It adds all the XML database specific stuff that XQuery isn't going
> anywhere near.

way cool.
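
Just to see how thin the mapping between the two forms quoted above is,
a toy sketch in Python. It recognizes only that single insert statement
exactly as written above and maps it onto the 'xindice ad' invocation
shown with it. I have not checked it against the SixDML grammar, and
to_command_line is a made-up name:

```python
import re

# Matches only the one statement form quoted in this thread.
INSERT_RE = re.compile(
    r"^insert\s+URL\s+(\S+)\s+into\s+collection\s+(\S+)$", re.IGNORECASE
)


def to_command_line(statement):
    """Map the quoted insert statement onto the equivalent 'xindice ad' arguments."""
    match = INSERT_RE.match(statement.strip())
    if match is None:
        raise ValueError("unsupported statement: %r" % statement)
    filename, collection = match.groups()
    return ["xindice", "ad", "-c", collection, "-f", filename]


print(to_command_line("insert URL file.xml into collection /db/collection"))
```

A real interpreter would of course talk to the database directly
rather than shelling out to the old tools.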

> >
> >> What exactly do we need from this?
> >> What are the benefits and the costs?
> >
> > Expect high costs: Avalon is not only a server framework, but is also a
> > way of componentizing your application. Porting an application to
> > Avalon is normally a hard operation, but pays off *a lot* later on,
> > since the refactored code is simpler, cleaner, and interoperates better
> > with the rest of the Avalon realms.
> >
> >> Use Tomcat for the runtime environment eliminating all the old
> >> Juggernaut code?
> >
> > what would you need tomcat for?
> >
> 
> Runtime environment, maybe Avalon provides this I'm not sure.

There is a very small HTTP protocol handler in Avalon, yes, but you can
also mount Tomcat inside Avalon and use it as the HTTP protocol handler
(though I wouldn't recommend taking on that much code complexity just
for a simple HTTP handler, unless you need servlet access directly).

> 
> > +1
> >
> > I'd also like to add:
> >
> > *) versioning at the core level
> > *) full XML export/import of the database as a single big file (full
> > means that it should be possible to reimport the entire database in one
> > shot) [this will also help to see the logical structure of the data
> > stored in the database]
> >
> 
> You can already do this to a directory structure. I don't know if a single
> file is that good of an idea, but it would be relatively easy to implement
> from the current export code.

ok, I'll look into this.
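
As a starting point, a sketch of folding a directory export into one
file. I have not looked at the current export code, so the layout (one
sub-directory per collection, one .xml file per document) and the
wrapper element names 'collection' and 'document' are assumptions:

```python
import os
import xml.etree.ElementTree as ET


def dump_tree(directory, name="db"):
    """Fold an exported directory tree into a single <collection> element."""
    collection = ET.Element("collection", {"name": name})
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            # A sub-directory becomes a nested collection.
            collection.append(dump_tree(path, entry))
        elif entry.endswith(".xml"):
            # Each exported file becomes a <document> wrapping its root element.
            document = ET.SubElement(collection, "document", {"name": entry})
            document.append(ET.parse(path).getroot())
    return collection


def write_dump(directory, target):
    """Write the whole tree to `target` as one big XML file."""
    ET.ElementTree(dump_tree(directory)).write(
        target, encoding="utf-8", xml_declaration=True
    )
```

Reimporting would walk the same structure in the other direction, and
the nesting of collection elements also gives a readable picture of the
logical layout.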

> > *) node-granular metadata and access control
> >
> 
> Node level's going to be really tough. The runtime costs will be huge.
> I'll add it to the list though.

Yes, runtime costs could be high. In that case, it could be
implemented at the application level.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------
