Joe,

The easiest way to set up the test bed is to configure InstantDB and use the
test data that we still include in the samples directory. From there, just run
the test XSL file in the basic-connection directory. Aside from setting up the
DB, it should be straightforward; the DB setup is only a matter of classpaths
and the correct directory structure for the sample data.

When you say "fragmentary-DTM implementation," are you saying that the
implementation is lacking, or is streaming introducing some behind-the-scenes
problems?

The term "streaming" in the SQL module probably differs from the standard
definition. The SQL module is always streaming in the sense that it returns
data as soon as it gets it; it does not pull all the records from the DB
before returning the first record. That is, as the walker traverses the tree,
ResultSet.next() is called. As you walk through the document, a complete
document is built up in memory.
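To make the distinction concrete, here is a minimal sketch of that default
(non-streaming) behavior. The class and method names (FakeResultSet,
LazyDocument, nextRow) are hypothetical stand-ins, not Xalan or JDBC APIs; the
point is only that rows are fetched lazily, one cursor advance per traversal
step, while every visited row is retained as part of the document.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyDocument {
    // Hypothetical stand-in for a JDBC ResultSet cursor.
    static class FakeResultSet {
        private final String[] rows;
        private int pos = -1;
        int fetchCount = 0;                 // how many rows were actually pulled
        FakeResultSet(String... rows) { this.rows = rows; }
        boolean next() {                    // analogous to ResultSet.next()
            fetchCount++;
            return ++pos < rows.length;
        }
        String get() { return rows[pos]; }
    }

    private final FakeResultSet rs;
    private final List<String> document = new ArrayList<>(); // rows kept in memory

    LazyDocument(FakeResultSet rs) { this.rs = rs; }

    /** Pull the next row from the DB only when the walker asks for it. */
    String nextRow() {
        if (!rs.next()) return null;
        String row = rs.get();
        document.add(row);  // the document grows as we walk; nothing is discarded
        return row;
    }

    int rowsHeldInMemory() { return document.size(); }

    public static void main(String[] args) {
        FakeResultSet rs = new FakeResultSet("r1", "r2", "r3");
        LazyDocument doc = new LazyDocument(rs);
        doc.nextRow();                       // walker visits the first row
        System.out.println(rs.fetchCount);   // prints 1: only one fetch so far
        doc.nextRow();
        doc.nextRow();
        System.out.println(doc.rowsHeldInMemory()); // prints 3: whole document retained
    }
}
```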

When you turn on streaming mode, the SQL extension will open a window which
contains only one record at a time. As you traverse, previous elements are
discarded (sort of; reused, really), and only the current record is loaded in
memory. This allows you to do the million-row query and be very deterministic
about memory consumption. The downside to SQL streaming is that you can only
perform forward-only traversal. If you need to sort, or to traverse more than
once, streaming mode must be turned off.
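A rough sketch of that one-record window, under the same caveat: these names
(StreamingWindow, rowAt) are illustrative only, not the actual SQL-extension
API. The single buffer is reused on each advance, so memory stays constant,
and any attempt to revisit an earlier row fails, which is the forward-only
restriction described above.

```java
public class StreamingWindow {
    private final String[] source;      // stands in for the DB cursor
    private int pos = -1;
    private final StringBuilder window = new StringBuilder(); // reused buffer

    StreamingWindow(String... source) { this.source = source; }

    /** Advance the one-record window; the old contents are discarded (reused). */
    boolean next() {
        if (pos + 1 >= source.length) return false;
        pos++;
        window.setLength(0);            // reuse the same buffer: O(1) memory
        window.append(source[pos]);
        return true;
    }

    /** Forward-only: revisiting an earlier row is an error in streaming mode. */
    String rowAt(int index) {
        if (index != pos)
            throw new IllegalStateException("streaming mode is forward-only");
        return window.toString();
    }

    public static void main(String[] args) {
        StreamingWindow w = new StreamingWindow("row1", "row2");
        w.next();
        w.next();
        System.out.println(w.rowAt(1));  // prints row2
        try {
            w.rowAt(0);                  // going backwards fails
        } catch (IllegalStateException e) {
            System.out.println("forward-only");
        }
    }
}
```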

Another optimization relies on the column attributes being the same from row
to row, including the values in the metadata element. What I do is reuse the
same node ID for each column's attributes throughout the document, which
saves 10 node IDs per column per row. A problem you may run into, though, is
that the attributes on any row-set/row/col will have a node ID that is
earlier in the document (maybe on a different DTM segment) than the column
node ID itself.
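The back-of-the-envelope arithmetic for that reuse, taking the figure of 10
attribute node IDs per column from the text (the row/column counts below are
illustrative): without reuse, attribute IDs are allocated per column per row;
with reuse, once per column for the whole document.

```java
public class NodeIdSavings {
    static long idsWithoutReuse(long rows, long cols, long attrsPerCol) {
        // one element ID per col per row, plus fresh attribute IDs every row
        return rows * cols * (1 + attrsPerCol);
    }

    static long idsWithReuse(long rows, long cols, long attrsPerCol) {
        // element IDs still unique per row; attribute IDs shared across rows
        return rows * cols + cols * attrsPerCol;
    }

    public static void main(String[] args) {
        long rows = 1_000_000, cols = 5, attrs = 10;  // the million-row query
        System.out.println(idsWithoutReuse(rows, cols, attrs)); // prints 55000000
        System.out.println(idsWithReuse(rows, cols, attrs));    // prints 5000050
    }
}
```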

If there is something I can do to make the implementation more complete,
please let me know.

Regards,
John G

Joseph Kesselman wrote:

> I'm afraid I didn't follow through on this one; distracted by other
> issues. This feels like one of those "easier to grok if I can watch it
> fail" kinds of questions.... Any suggestions on the easiest way of setting
> up a minimal test-bed for it, given that I haven't tried to run this code
> before and (AFAIK) don't have anything resembling a database currently
> installed?
>
> Note that there's another issue lurking in the wings... As we move toward
> optimizing stylesheets, we've introduced common subexpression elimination.
> Depending on what those expressions are doing, I suspect this too may
> introduce some additional pre-scanning of documents, and may disrupt the
> assumptions behind the SQL extension's fragmentary-DTM implementation. The
> right answer may wind up being for the SQL extension to build a "real" DTM
> despite the loss of streaming efficiency, and for us to get back to work
> on the DTM Pruning code to improve streaming across the board.
>
> ______________________________________
> Joe Kesselman  / IBM Research

--
--------------------------------------
John Gentilin
Eye Catching Solutions Inc.
18314 Carlwyn Drive
Castro Valley CA 94546

    Contact Info
[EMAIL PROTECTED]
Ca Office 1-510-881-4821
NJ Office 1-732-422-4917
