Re: Content Rewriter Modularization: Design/Change

Kevin Brown Mon, 11 Aug 2008 13:02:55 -0700

I agree with Louis -- that's just not practical. Every rewriting operation
must work in real time. Caja's existing HTML parser is adequate for our
needs, and we shouldn't go out of our way to tolerate every oddity of random
web browsers (especially as it simply wouldn't work unless you farmed it out
to *every* browser). Any new code needs to be grounded in practical, current
needs, not theoretical options. We can always change code later if we find a
real need for something like that. We have real work to do in the meantime.


On Mon, Aug 11, 2008 at 12:06 PM, Louis Ryan <[EMAIL PROTECTED]> wrote:

> John,
>
> From a practicality standpoint I'm a little nervous about this plan to make
> RPC calls out of a Java process to a native process to fetch a parse tree
> for transformations that have to occur in real time. I don't think the
> motivating factor here is to accept all inputs that browsers can. Gadget
> developers will tailor their markup to the platform as they have done
> already. I would greatly prefer us to pick one 'good' parser and stick with
> it for all the manageability and consumability benefits that come with that
> decision. Perhaps I'm missing something here?
>
> -Louis
>
> On Mon, Aug 11, 2008 at 11:59 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Aug 8, 2008 at 6:10 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
> >
> > > [+google-caja-discuss]
> > >
> > > On Thu, Aug 7, 2008 at 9:27 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> > > > On Thu, Aug 7, 2008 at 3:20 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> On Wed, Aug 6, 2008 at 11:34 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> > > >> > This proposal effectively enables the renderer to become a
> > > >> > multi-pass compiler for gadget content (essentially, arbitrary
> > > >> > web content). Such a compiler can provide several benefits:
> > > >> > static optimization of gadget content (auto-proxying of images,
> > > >> > whitespace/comment removal, consolidation of CSS blocks),
> > > >> > security benefits (caja et al), new functionality (annotation of
> > > >> > content for stats, document analysis, container-specific
> > > >> > features), etc. To my knowledge no such infrastructure exists
> > > >> > today (with the possible exception of Caja itself, which I'd like
> > > >> > to dovetail with this work).
> > > >>
> > > >> Caja clearly provides a large chunk of the code you'd need for this.
> > > >> I'd like to hear how we'd manage to avoid duplication between the two
> > > >> projects.
> > > >>
> > > >> A generalised framework for manipulating content sounds like a great
> > > >> idea, but probably should not live in either of the two projects
> > > >> (Caja and Shindig) but rather should be shared by both of them, I
> > > >> suspect.
> > > >
> > > >
> > > > I agree on both counts. As I mentioned, the piece of this idea that I
> > > > expect to change the most is the parse tree, and Caja's .parser.html
> > > > and .parser.css packages contain much of what I've thrown in here as a
> > > > base.
> > > >
> > > > My key requirements are:
> > > > * Lightweight framework.
> > > > * Parser modularity, mostly for HTML parsers (to re-use the good work
> > > >   done by WebKit or Gecko; CSS/JS can come directly from Caja, I'd bet).
> > > > * Automatic maintenance of DOM<->String conversion.
> > > > * Easy to manipulate structure.
> > >
> > > I'm not sure what the value of parser modularity is? If the resulting
> > > tree is different, then that's a problem for people processing the
> > > tree. And if it is not, then why do we care?
> >
> >
> > IMO the value of parser modularity is that the lenient parsers native to
> > browsers can be used in place of those that might not accept all inputs.
> > One could (and I'd like to) adapt WebKit's or Gecko's parsing code into a
> > server that runs in parallel with Shindig and provides a "local RPC"
> > service for parsing semi-structured HTML. The resulting tree for WebKit's
> > parser might be different from that for an XHTML parser, Gecko's parser,
> > etc., but if the algorithm implemented atop it is rule-based rather than
> > strict-structure based, that should be fine, no?
> >
> >
> > >
> > >
> > > >
> > > > I'd love to see both projects share the same base syntax tree
> > > > representations. I considered .parser.html(.DomTree) and .parser.css
> > > > for these, but at the moment these appeared to be a little more tied to
> > > > Caja's lexer/parser implementation than I preferred (though I admit
> > > > AbstractParseTreeNode contains most of what's needed).
> > > >
> > > > To be sure, I don't see this as an end-all-be-all transformation system
> > > > in any way. I'd just like to put *something* reasonable in place that we
> > > > can play with, provide some benefit, and enhance into a truly
> > > > sophisticated vision of document rewriting.
> > > >
> > > >
> > > >>
> > > >>
> > > >> >  c. Add Gadget.getParsedContent().
> > > >> >    i. Returns a mutable GadgetContentParseTree used to manipulate
> > > >> >       Gadget Contents.
> > > >> >    ii. Mutable tree calls back to the Gadget object indicating when
> > > >> >        any change is made, and emits an error if setContent() has been
> > > >> >        called in the interim.
> > > >>
> > > >> In Caja we have been moving towards immutable trees...
> > > >
> > > >
> > > > Interested to hear more about this. The whole idea is for the gadget's
> > > > tree representation to be modifiable. Doing that with immutable trees
> > > > to me suggests that a rewriter would have to create a completely new
> > > > tree and set it as a representation of new content. That's convenient
> > > > as far as the Gadget's maintenance of String<->Tree representations is
> > > > concerned... but seems pretty heavyweight for many types of edits:
> > > > in-situ modifications of text, content reordering, etc. That's
> > > > particularly so in a single-threaded (viz rewriting) environment.
> > >
> > > Never having been entirely sold on the concept, I'll let those on the
> > > Caja team who advocate immutability explain why.
> > >
> >
>
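
To make the "rule-based rather than strict-structure based" point quoted
above concrete, here is a minimal Java sketch. It is not code from Shindig
or Caja: SketchNode, SketchHtmlParser, ImgProxyRule and the proxy prefix
passed in are all hypothetical names, and any parser plugged in behind the
interface is assumed to produce this simplified tree. The rule matches img
elements wherever they sit, so it should give the same result whether or
not a given parser adds html/body wrappers around the content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; a real tree would carry attributes, text, etc.
class SketchNode {
  final String tag;
  String src;  // only meaningful for <img>; null otherwise
  final List<SketchNode> children = new ArrayList<>();

  SketchNode(String tag, SketchNode... kids) {
    this.tag = tag;
    for (SketchNode kid : kids) {
      children.add(kid);
    }
  }

  static SketchNode img(String src) {
    SketchNode n = new SketchNode("img");
    n.src = src;
    return n;
  }
}

// Hypothetical pluggable-parser interface: any backend that yields a SketchNode.
interface SketchHtmlParser {
  SketchNode parse(String html);
}

// Rule-based rewrite: visit every node and act on the ones matching the rule.
// It never assumes a fixed path such as html > body > div > img.
class ImgProxyRule {
  static int apply(SketchNode node, String proxyPrefix) {
    int rewritten = 0;
    if ("img".equals(node.tag) && node.src != null) {
      node.src = proxyPrefix + node.src;  // URL-encoding omitted for brevity
      rewritten++;
    }
    for (SketchNode child : node.children) {
      rewritten += apply(child, proxyPrefix);
    }
    return rewritten;
  }
}
```

A rewriter that instead indexed into fixed child positions would break as
soon as two parsers disagreed on wrappers. Note that this only speaks to
tree differences; it does nothing about the RPC latency concern raised
above.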

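The Gadget.getParsedContent() contract quoted above (the mutable tree
reports changes back to the Gadget, and errors out once setContent() has
been called in the interim) could look roughly like this in Java. This is
only a sketch under assumptions: SketchGadget and SketchParseTree are
hypothetical stand-ins for Gadget and GadgetContentParseTree, the "tree" is
just a list of lines, and a version counter is one possible way to detect a
stale tree. The thread itself does not specify the mechanism.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Gadget; not Shindig's actual class.
class SketchGadget {
  private String content;
  private int version = 0;       // bumped on every setContent()
  private SketchParseTree tree;  // built lazily; dropped on setContent()
  private boolean treeDirty = false;

  SketchGadget(String content) {
    this.content = content;
  }

  void setContent(String newContent) {
    content = newContent;
    version++;        // invalidates any tree handed out earlier
    tree = null;
    treeDirty = false;
  }

  String getContent() {
    if (tree != null && treeDirty) {
      content = tree.render();  // tree -> string, only when something changed
      treeDirty = false;
    }
    return content;
  }

  SketchParseTree getParsedContent() {
    if (tree == null) {
      tree = new SketchParseTree(this, version, content);
    }
    return tree;
  }

  int version() {
    return version;
  }

  // Callback used by the tree to report a mutation.
  void onTreeChanged() {
    treeDirty = true;
  }
}

// Hypothetical stand-in for GadgetContentParseTree: one "node" per line.
class SketchParseTree {
  private final SketchGadget owner;
  private final int versionAtParse;
  private final List<String> nodes = new ArrayList<>();

  SketchParseTree(SketchGadget owner, int versionAtParse, String content) {
    this.owner = owner;
    this.versionAtParse = versionAtParse;
    for (String line : content.split("\n", -1)) {
      nodes.add(line);
    }
  }

  void setNode(int index, String text) {
    if (owner.version() != versionAtParse) {
      throw new IllegalStateException("setContent() was called; tree is stale");
    }
    nodes.set(index, text);
    owner.onTreeChanged();
  }

  String render() {
    return String.join("\n", nodes);
  }
}
```

With this shape an in-situ edit touches one node and defers re-rendering
until getContent() is next called, which is the lightweight behaviour the
mutable-tree argument above is after. An immutable-tree design would
instead build a new tree and hand it back through something like
setContent().
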