Re: Remarks about XML:DB API

Kimbro Staken Thu, 7 Feb 2002 23:11:01 +0100 (CET)

On Thursday, February 7, 2002, at 05:38 AM, Arno de Quaasteniet wrote:

Hi,

Inspired by the SixDML proposal I've been looking some more into the
XML:DB API specification (since it's partially based on the XML:DB core
API spec) and have a number of remarks about it, though I did not yet have
time to read the specification thoroughly, so expect some more.
Unfortunately I also didn't have enough time to think of alternatives to
the things I have a problem with.

Some general remarks:
* Resource and Services are perfectly abstract names but it's hard to
imagine for a user what they mean. I'm in favor of more specific names,
to make it easier for users to imagine what they stand for (I only have
to figure out what the right names would be).

I'd like to hear some suggestions as this is something we toiled over a fair bit in the beginning. However, I'll also say it hasn't really been a problem. We've had hundreds of people use the API through Xindice and the naming hasn't seemed to cause any confusion. In fact I'm kind of surprised at how easily people picked up on it.

* As Dare Obasanjo already mentioned the tying of services to
collections is not very practical. I think this is definitely something
that should be changed.

Yes, we need some changes here.

Interface specific remarks:

Collection interface

* I think the behavior and interface of the getServices method should be
changed, because:
- Each instance of a service could possibly take up resources, in which
case you would want to instantiate those services lazily whenever
getService is called.
- It's not likely you need them all at once.
- If it's meant for checking the types of services supported by the
collection (though personally I do not think that services should be
coupled to collections at all) then it could return only the names of
the services it supports.

We originally had a separate method to check for the existence of a service and it was decided later that it was not really necessary. Your point about the potential for heavy services is a valid one though so you may be right that the mechanism needs to be refined.
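
To make the two ideas in this exchange concrete (instantiating services only on request, and checking support by name), here is a minimal sketch. It is not part of the XML:DB API: `LazyServiceRegistry`, `getServiceNames`, `instantiatedCount` and the one-method `Service` stand-in are hypothetical names chosen for illustration, and a real driver would have to fit this into its own Collection implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for the API's Service interface; only a name is modelled.
interface Service {
    String getName();
}

// Hypothetical registry that a Collection implementation could delegate to.
class LazyServiceRegistry {
    // How to build each service, keyed by service name.
    private final Map<String, Supplier<Service>> factories = new LinkedHashMap<>();
    // Services that have actually been built so far.
    private final Map<String, Service> instances = new LinkedHashMap<>();

    // Record how to build a service without building it yet.
    void register(String name, Supplier<Service> factory) {
        factories.put(name, factory);
    }

    // Cheap capability check: returns names only and instantiates nothing.
    Set<String> getServiceNames() {
        return Collections.unmodifiableSet(factories.keySet());
    }

    // Build the service on first request and reuse it afterwards.
    // Returns null when the service is not supported.
    Service getService(String name) {
        Supplier<Service> factory = factories.get(name);
        if (factory == null) {
            return null;
        }
        return instances.computeIfAbsent(name, n -> factory.get());
    }

    // Number of services created so far (only here to show the laziness).
    int instantiatedCount() {
        return instances.size();
    }
}
```

With this shape a tool can list the supported service names without paying for any heavy service, and a service that is never requested is never created.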

* I'm not quite sure about the use of
getResourceCount/getChildCollectionCount, since in the case of X-Hive it
involves counting the resources which of course has a bad performance
characteristic.

Unfortunately the functionality is needed to build usable tools.

CollectionManagementService interface

* I think this interface is overkill; why not add the createCollection
and removeCollection methods to the Collection interface? If not, should
it then check if the collection it operates on is still open?

Not all databases can use that interface, it's too simplistic for something like Tamino where schemas are required. I added it just to have something that was usable for simple cases, so it's optional.

ResourceSet interface

* getResource(long item) will only have a good performance if there's a
random access list behind the resource set.
* getSize will only have a good performance if there's a list behind the
resource set

Optimize this and that's where you get competitive advantage. :-)

When evaluating queries lazily (not always completely possible: for
instance if the end result, or temporary results, need to be sorted), you
typically do not want to gather results in a list, but return them one
by one using an iterator.

What you typically want to prevent is that users use code like this:

ResourceSet rs = ...;
for (long i = 0; i < rs.getSize(); i++) {
        Resource r = rs.getResource(i);
}

to iterate over the query results when the query is lazily evaluated.
Because this would mean that the result set should first gather all the
query results which would essentially mean that the results are iterated
twice (and you may not have enough working memory to get all the results
from the database).

Again this is an implementation detail. There is no reason that the getSize operation has to be calculated from the contents of the result set. It could easily be provided by the database. Doing that would allow lazy retrieval of results.
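
As a rough illustration of that point, the sketch below keeps a size that is reported by the database alongside a cursor that yields results one at a time. Everything here is an assumption made for the example: `LazyResultSet`, `fetchedSoFar` and the use of a `java.util.Iterator<String>` as a stand-in for a server-side cursor are hypothetical, and results are plain strings rather than Resource objects.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy result set: the count arrives with the query response,
// while the results themselves are pulled from a cursor one at a time.
class LazyResultSet {
    private final long reportedSize;       // supplied by the database, not counted here
    private final Iterator<String> cursor; // stand-in for a server-side cursor
    private long fetched = 0;              // how many results were actually pulled

    LazyResultSet(long reportedSize, Iterator<String> cursor) {
        this.reportedSize = reportedSize;
        this.cursor = cursor;
    }

    // Answers without touching the cursor, so no result is materialized.
    long getSize() {
        return reportedSize;
    }

    boolean hasMoreResources() {
        return cursor.hasNext();
    }

    // Pulls exactly one result from the cursor.
    String nextResource() {
        if (!cursor.hasNext()) {
            throw new NoSuchElementException("no more results");
        }
        fetched++;
        return cursor.next();
    }

    long fetchedSoFar() {
        return fetched;
    }
}
```

Calling `getSize()` before iterating leaves `fetchedSoFar()` at zero, which is the property under discussion. Whether a given database can report the count cheaply for every query is a separate question that this sketch does not answer.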


Though of course these methods could be useful when there's a list
behind the resource set (for instance when the end result needed to be
sorted) in those cases you can request the size without a performance
penalty.

So maybe some method should be added to see if the resourceset is lazy
or not?

What would be the use case for this?


* getIterator returns a ResourceIterator. I'm more in favor of returning
a java.util.Iterator (I don't see the cast that becomes necessary as a
problem), and renaming the method to iterator() because that's more like
other java interfaces, though I understand that this is just a matter of
taste, and having an own interface for it could make porting the API to
other platforms than java easier.

As Tom already pointed out the API is intended to be as language independent as possible. This is a big source of compromises, i.e. things like error codes instead of an exception hierarchy, but necessary because we're specifying in IDL. We are a little loose with it though because we use things like DOM and SAX which aren't always precisely defined for other languages. They do exist in other languages though.
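
For Java users who prefer the java.util style, one option that leaves the language-neutral interface alone is a thin adapter. The sketch below is only an illustration: `ResourceIteratorAdapter` is a hypothetical class, and the `Resource` and `ResourceIterator` interfaces are simplified stand-ins that keep the method names quoted in this thread but omit the API's error handling.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for the API's Resource; only an id is modelled.
interface Resource {
    String getId();
}

// Simplified stand-in that keeps the method names discussed in this thread.
interface ResourceIterator {
    boolean hasMoreResources();
    Resource nextResource();
}

// Hypothetical adapter exposing a ResourceIterator as a java.util.Iterator.
class ResourceIteratorAdapter implements Iterator<Resource> {
    private final ResourceIterator delegate;

    ResourceIteratorAdapter(ResourceIterator delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasMoreResources();
    }

    @Override
    public Resource next() {
        // java.util.Iterator requires NoSuchElementException when exhausted.
        if (!delegate.hasMoreResources()) {
            throw new NoSuchElementException("no more resources");
        }
        return delegate.nextResource();
    }
}
```

The adapter still pulls results one at a time, so it does not force a lazily evaluated result set to be gathered up front.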

* The ResourceIterator interface
If not replaced by java.util.Iterator I would prefer if this interface
would have methods named next() and hasNext() instead of nextResource()
and hasMoreResources().

And finally I have a question: is there a test suite that tests
conformance to the API?

Yes, though neither the API nor the test suite is complete. If you download the reference impl there is a test suite as well as a set of base classes that can be used to make driver development easier.

Kind regards,

Arno de Quaasteniet
X-Hive Corporation
+31 (0)10 710 86 24
http://www.x-hive.com
[EMAIL PROTECTED]

----------------------------------------------------------------------
Post a message:         mailto:[EMAIL PROTECTED]
Unsubscribe:            mailto:[EMAIL PROTECTED]
Contact administrator:  mailto:[EMAIL PROTECTED]
Read archived messages: http://archive.xmldb.org/
----------------------------------------------------------------------

Kimbro Staken
XML Database Software, Consulting and Writing
http://www.xmldatabases.org/
