Re: Content Rewriter Modularization: Design/Change

John Hjelmstad Thu, 07 Aug 2008 13:30:09 -0700

On Thu, Aug 7, 2008 at 3:20 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:

> On Wed, Aug 6, 2008 at 11:34 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> > This proposal effectively enables the renderer to become a multi-pass
> > compiler for gadget content (essentially, arbitrary web content). Such a
> > compiler can provide several benefits: static optimization of gadget
> > content (auto-proxying of images, whitespace/comment removal,
> > consolidation of CSS blocks), security benefits (caja et al), new
> > functionality (annotation of content for stats, document analysis,
> > container-specific features), etc. To my knowledge no such
> > infrastructure exists today (with the possible exception of Caja
> > itself, which I'd like to dovetail with this work).
>
> Caja clearly provides a large chunk of the code you'd need for this.
> I'd like to hear how we'd manage to avoid duplication between the two
> projects.
>
> A generalised framework for manipulating content sounds like a great
> idea, but probably should not live in either of the two projects (Caja
> and Shindig) but rather should be shared by both of them, I suspect.

I agree on both counts. As I mentioned, the piece of this idea that I expect
to change the most is the parse tree, and Caja's .parser.html and
.parser.css packages contain much of what I've thrown in here as a base.

My key requirements are:
* Lightweight framework.
* Parser modularity, mostly for HTML parsers (to reuse the good work done
by WebKit or Gecko; CSS/JS can come directly from Caja, I'd bet).
* Automatic maintenance of DOM<->String conversion.
* Easy to manipulate structure.

I'd love to see both projects share the same base syntax tree
representations. I considered .parser.html(.DomTree) and .parser.css for
these, but at the moment they appear to be a little more tied to Caja's
lexer/parser implementation than I'd prefer (though I admit
AbstractParseTreeNode contains most of what's needed).

To be sure, I don't see this as an end-all-be-all transformation system in
any way. I'd just like to put *something* reasonable in place that we can
play with, that provides some benefit, and that we can enhance into a truly
sophisticated vision of document rewriting.

>
>
> >  c. Add Gadget.getParsedContent().
> >    i. Returns a mutable GadgetContentParseTree used to manipulate Gadget
> > Contents.
> >    ii. Mutable tree calls back to the Gadget object indicating when any
> > change is made, and emits an error if setContent() has been called in the
> > interim.
>
> In Caja we have been moving towards immutable trees...

Interested to hear more about this. The whole idea is for the gadget's tree
representation to be modifiable. Doing that with immutable trees to me
suggests that a rewriter would have to create a completely new tree and set
it as a representation of new content. That's convenient as far as the
Gadget's maintenance of String<->Tree representations is concerned... but
seems pretty heavyweight for many types of edits: in-situ modifications of
text, content reordering, etc. That's particularly so in a single-threaded
(viz. rewriting) environment.
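
To make the mutable-tree contract from item (c) concrete, here is a rough
Java sketch of how the callback and the stale-tree error might fit together.
Only getParsedContent() and setContent() come from the proposal; SketchGadget,
SketchTree, setNode(), onTreeChanged() and the whitespace-splitting "parser"
are hypothetical stand-ins, not Shindig or Caja code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Gadget object; not Shindig's actual class.
class SketchGadget {
    private String content;
    private SketchTree tree;    // built lazily, dropped when setContent() is called
    private boolean treeDirty;  // true when the tree is newer than the string

    SketchGadget(String content) {
        this.content = content;
    }

    // Replacing the string invalidates any tree handed out earlier.
    void setContent(String newContent) {
        content = newContent;
        if (tree != null) {
            tree.invalidate();
        }
        tree = null;
        treeDirty = false;
    }

    // Re-serializes from the tree only if the tree changed since the last call.
    String getContent() {
        if (treeDirty) {
            content = tree.render();
            treeDirty = false;
        }
        return content;
    }

    SketchTree getParsedContent() {
        if (tree == null) {
            tree = new SketchTree(this, content);
        }
        return tree;
    }

    // Callback from the tree: remember that the string form is now out of date.
    void onTreeChanged() {
        treeDirty = true;
    }
}

// Toy "parse tree": a flat list of space-separated tokens. A real
// implementation would hold HTML/CSS/JS nodes from a pluggable parser.
class SketchTree {
    private final SketchGadget owner;
    private final List<String> nodes = new ArrayList<>();
    private boolean stale;

    SketchTree(SketchGadget owner, String content) {
        this.owner = owner;
        for (String token : content.split(" ")) {
            nodes.add(token);
        }
    }

    void invalidate() {
        stale = true;
    }

    // In-place edit: fails if setContent() was called in the interim,
    // otherwise mutates the node and notifies the owning gadget.
    void setNode(int index, String text) {
        if (stale) {
            throw new IllegalStateException(
                "setContent() was called after this tree was obtained");
        }
        nodes.set(index, text);
        owner.onTreeChanged();
    }

    String render() {
        return String.join(" ", nodes);
    }
}
```

With this shape, an in-place text edit touches one node and defers
re-serialization until someone actually asks for the string, which is the
lightweight behavior I'm after for simple rewrites.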

--John
