Scott, Joseph & all,
Scott:
> I am worried most about the way we did our "expanded type IDs", which keeps
> three integer values to represent the type, name, and namespace. You use a
> single value to represent the concatenated name and namespace, which we
> didn't like, but our scheme may have problems for what you need to do also.
> So I think we'll have to play around with this until we're both happy.
XSLTC owes its performance to the way a node's type is represented by
a single integer that maps to an expanded QName. These integers run
in sequence from 0 up to as many as are needed to represent the
stylesheet and DOM. This allows us to use a Java TABLESWITCH instruction
to fire off the various templates based on the context node's type.
I don't think this is something that could be compromised, but I do
think we could still obtain the same performance if we added an extra
mapping between your DTM type triplet and the current XSLTC DOM linear types.
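
To make the extra mapping a bit more concrete, here is a minimal sketch
of what I have in mind. It is only an illustration: TypeTriplet,
LinearTypeMap and DispatchSketch are hypothetical names, not existing
XSLTC or DTM classes, and the template strings stand in for real
template code. Each distinct triplet gets the next integer in sequence
from 0, and the dispatch is a plain switch over those dense integers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a DTM-style expanded type ID:
// three values for node type, local name and namespace.
final class TypeTriplet {
    final int nodeType;
    final String localName;
    final String namespace;

    TypeTriplet(int nodeType, String localName, String namespace) {
        this.nodeType = nodeType;
        this.localName = localName;
        this.namespace = namespace;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TypeTriplet)) return false;
        TypeTriplet t = (TypeTriplet) o;
        return nodeType == t.nodeType
            && localName.equals(t.localName)
            && namespace.equals(t.namespace);
    }

    @Override
    public int hashCode() {
        return Objects.hash(nodeType, localName, namespace);
    }
}

// Hands out linear type IDs in sequence from 0, one per distinct triplet.
final class LinearTypeMap {
    private final Map<TypeTriplet, Integer> ids = new HashMap<>();

    int linearType(TypeTriplet t) {
        Integer id = ids.get(t);
        if (id == null) {
            id = ids.size();   // next unused integer
            ids.put(t, id);
        }
        return id;
    }
}

final class DispatchSketch {
    // Dense, consecutive case labels are what typically let the compiler
    // emit a tableswitch instead of a lookupswitch.
    static String applyTemplates(int linearType) {
        switch (linearType) {
            case 0:  return "template for chapter";
            case 1:  return "template for title";
            case 2:  return "template for para";
            default: return "built-in template";
        }
    }

    public static void main(String[] args) {
        LinearTypeMap map = new LinearTypeMap();
        int chapter = map.linearType(new TypeTriplet(1, "chapter", ""));
        int title = map.linearType(new TypeTriplet(1, "title", ""));
        System.out.println(chapter + " -> " + applyTemplates(chapter));
        System.out.println(title + " -> " + applyTemplates(title));
    }
}
```

The cost of the extra mapping would be paid once per distinct type while
the DOM is built, so the per-node dispatch stays a single switch.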
Scott:
> > I suggest one general AST package that contains all the AST stuff up
> > to and including the parsing level, and then keeping the compiler and
> > interpreter stuff separate:
>
> Yep. My question is if compiler.ForEach and process.ForEach should derive
> from ast.Instruction or should use a visitor pattern. I would think a
> visitor pattern is best. I don't know how much structural rewriting you
> are doing now, but I believe that structural rewrites are very important
> for optimization. I would see the rewrites as working by multiple
> optimization iterations over the AST, rewriting it probably several times
> (redundant expression elimination, dead code elimination, tail merging,
> inline expansion, etc.), until it can be optimized no more, and then using
> the final version to produce the compiled form. I think this is how an
> optimizing compiler works, though I am no expert. Perhaps the derivation
> vs. tree walking really doesn't make a difference for this, ...not sure.
We have not done much work on optimising the AST, as compile-time
performance so far has not been of significant importance to XSLTC.
There is no reason to use a Visitor on the AST for the stuff that
is currently done for XSLTC. Each node in the XSLTC AST fits easily
into a general object structure:
    parseContents()
    typeCheck()
    translate()
The parseContents() method uses the XML and XPath parsers to break
up child elements and XPath patterns/expressions. The typeCheck()
method is obvious. The translate() method generates Java bytecodes.
I can imagine that this is not the case for the Xalan interpreter,
and that you need loads of different methods/functionality for the
various nodes in the AST. If this is the case then the advantages
of using a Visitor class are clear.
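
As a rough illustration of the three-method structure described above,
here is a minimal sketch. Only the method names parseContents(),
typeCheck() and translate() come from XSLTC; the signatures, the
SketchNode/SketchText/SketchSequence classes and the textual "EMIT"
output are assumptions made for the example (the real translate()
generates bytecodes, not strings).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every node carries its own three passes.
abstract class SketchNode {
    final List<SketchNode> children = new ArrayList<>();

    abstract void parseContents(String source);
    abstract void typeCheck();
    abstract void translate(StringBuilder out);
}

// Leaf node holding a piece of literal text.
final class SketchText extends SketchNode {
    private String text;

    @Override
    void parseContents(String source) {
        text = source.trim();
    }

    @Override
    void typeCheck() {
        if (text == null || text.isEmpty()) {
            throw new IllegalStateException("empty text node");
        }
    }

    @Override
    void translate(StringBuilder out) {
        // Stand-in for bytecode generation.
        out.append("EMIT \"").append(text).append("\"\n");
    }
}

// Container node: splits its source on ';' into text children.
final class SketchSequence extends SketchNode {
    @Override
    void parseContents(String source) {
        for (String piece : source.split(";")) {
            SketchText child = new SketchText();
            child.parseContents(piece);
            children.add(child);
        }
    }

    @Override
    void typeCheck() {
        for (SketchNode child : children) child.typeCheck();
    }

    @Override
    void translate(StringBuilder out) {
        for (SketchNode child : children) child.translate(out);
    }
}

final class NodeSketchDemo {
    // Runs the three passes in order over a tiny "stylesheet".
    static String compile(String source) {
        SketchNode root = new SketchSequence();
        root.parseContents(source);
        root.typeCheck();
        StringBuilder out = new StringBuilder();
        root.translate(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(compile("hello; world"));
    }
}
```

With only three fixed passes, keeping them as methods on the node is
compact. Once rewriting passes are added, each new pass means touching
every node class, and that is the point where a Visitor starts to pay off.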
Joseph:
> > Have you any intention of creating more DTM builders, such as your
> > existing DOM2DTM and SAX2DTM - for example a JDBC2DTM or an LDAP2DTM.
> > Or would you use an external 'box' to generate SAX events/a DOM from
> > JDBC/LDAP?
>
> Either approach is viable. Exporting as SAX and running that into SAX2DTM
> would certainly be the easiest thing to code, but might not be most
> efficient (which is why we also did DOM2DTM).
True.
Joseph:
> Note that these should probably be considered implementations of the API
> rather than just builders -- both because their internal representation
> varies significantly (though they share a great deal of code from their
> common base class) and because we may wind up doing other versions which
> have the same API but different internals. For example, there's been some
> interest in reviving the old ultra-compact version of DTM and making it
> available under the new APIs; that would be slower but should allow us to
> handle larger documents in a given amount of memory.
Large input documents are an absolute disaster for XSLTC's current DOM. The
DOM builder class is grand with smaller input files, but the build time
seems to grow exponentially with the size of the input. I am pretty sure
this is due to resizing of some integer arrays. We have cheekily excused
this by saying that XSLTC is for a specific niche only - small transformations
using static stylesheets.
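
To show why array resizing could hurt this much, here is a small sketch
that counts how many elements get copied while appending to an int
array. It is not the XSLTC DOM builder code: GrowthSketch and both
growth policies are assumptions for illustration, and I have not
confirmed which policy our builder actually uses. Growing by a fixed
increment makes the total copy work rise roughly with the square of the
input size, while doubling keeps it proportional to the input size.

```java
import java.util.Arrays;

final class GrowthSketch {
    // Appends n values, growing a full array by a fixed increment.
    // Returns the total number of elements copied during resizes.
    static long copiesWithFixedIncrement(int n, int increment) {
        int[] data = new int[increment];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            if (i == data.length) {
                copied += data.length;
                data = Arrays.copyOf(data, data.length + increment);
            }
            data[i] = i;
        }
        return copied;
    }

    // Same, but doubles the array whenever it is full.
    static long copiesWithDoubling(int n, int initial) {
        int[] data = new int[initial];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            if (i == data.length) {
                copied += data.length;
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[i] = i;
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 10000, 100000}) {
            System.out.println(n + " nodes: fixed="
                + copiesWithFixedIncrement(n, 100)
                + " doubling=" + copiesWithDoubling(n, 100));
        }
    }
}
```

If the builder does turn out to grow its arrays in small fixed steps,
switching to geometric growth would be the first thing to try.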
Joseph:
> >I suppose we only want to share the functionality that builds the AST
> >and that parses any patterns/expressions an element may have.
>
> And any stylesheet optimizations which can be performed at that level, of
> course. One of the things I'm Really Hoping is that we can share as much
> cleverness as possible between the two paths to execution.
Indeed - but our goals here may not intersect. XSLTC uses the AST to
generate code, and the speed of the generated code is far more
important than the code within the AST. But, there is currently no
mechanism to traverse the AST in an attempt to remove redundancies or
'dead' parts of the stylesheet. 'Dead' stylesheet fragments usually
result in 'dead' bytecodes in the translet, i.e. code that is never
run and that does not really slow the transformation down. But we
could probably avoid a good few test-and-branch instructions...
Scott:
> > It is just an initial thought, and I agree
> > with you that we should focus on the DOM/DTM integration first.
>
> Actually I would like to jump on the ast integration pretty quickly. In
> our code we have some less-than-beautiful stuff (namely XPath opmaps) that
> I need to send on its way. This is important for us because our existing
> structures have become blockers in terms of what we want to do with the
> code. Having a solid AST structure enables us to move forward more quickly
> with other initiatives that we want to do. We will probably want to use
> the AST structure we develop for the C++ version of Xalan also.
Grand. The AST work seems a lot more challenging as well. I've been working
on our DOM for too long now and I am sick of it! Using the AST in a C++
version of Xalan sounds brilliant! I'd love to do XSLTC for C++, so maybe
this is a start for that too...
Morten