Re-responding in order to apply the last few exchanges to google-caja-discuss@ (@gmail vs. @google membership issues).
On Tue, Aug 12, 2008 at 4:48 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> Hello,
>
> While beginning to refactor the rewriter APIs I've discovered that there unfortunately is one semantic difference inherent to moving getContent() and setContent() methods into the Gadget object (replacing View.get/setRewrittenContent()): BasicGadgetSpecFactory no longer caches rewritten content.
>
> I've written a discussion of this in issue SHINDIG-500, which tracks this implementation sub-task: https://issues.apache.org/jira/browse/SHINDIG-500
>
> To summarize:
> 1. Is this change acceptable for the time being?
> 2. I suggest that we can, at a later date, move fetching of gadget specs into GadgetServer while injecting a Gadget(Spec) cache there as well, offering finer-tuned control over caching characteristics.
>
> Thanks,
> John
>
>
> On Mon, Aug 11, 2008 at 2:20 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>
>> I understand these concerns, and should be clear that I don't (despite my personal interest in experimenting with the idea, agreed that we don't have time for it at the moment) have any plans to introduce this sort of RPC anywhere - certainly not in Shindig itself, as any such call would be hidden behind an interface anyway.
>>
>> Putting the RPC hypothetical aside, I still feel that there's value to implementing HTML parsing in terms of an interface:
>> * Clearer separation of concerns/boundary between projects.
>>   - Corollary simplicity in testing.
>> * Clearer API for content manipulation (that doesn't require knowledge of Caja).
>>
>> I could be convinced otherwise, but at this point the code involved seems of manageable size, so still worth doing. Thoughts?
>>
>> John
>>
>>
>>
>> On Mon, Aug 11, 2008 at 1:00 PM, Kevin Brown <[EMAIL PROTECTED]> wrote:
>>
>>> I agree with Louis -- that's just not practical. Every rewriting operation must work in real time. Caja's existing html parser is adequate for our needs, and we shouldn't go out of our way to tolerate every oddity of random web browsers (especially as it simply wouldn't work unless you farmed it out to *every* browser). Any new code needs to be grounded in practical, current needs, not theoretical options. We can always change code later if we find a real need for something like that. We have real work to do in the meantime.
>>>
>>> On Mon, Aug 11, 2008 at 12:06 PM, Louis Ryan <[EMAIL PROTECTED]> wrote:
>>>
>>> > John,
>>> >
>>> > From a practicality standpoint I'm a little nervous about this plan to make RPCs calls out of a Java process to a native process to fetch a parse tree for transformations that have to occur realtime. I don't think the motivating factor here is to accept all inputs that browsers can. Gadget developers will tailor their markup to the platform as they have done already. I would greatly prefer us to pick one 'good' parser and stick with it for all the manageability and consumability benefits that come with that decision. Perhaps Im missing something here?
>>> >
>>> > -Louis
>>> >
>>> > On Mon, Aug 11, 2008 at 11:59 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>>> >
>>> > > On Fri, Aug 8, 2008 at 6:10 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
>>> > >
>>> > > > [+google-caja-discuss]
>>> > > >
>>> > > > On Thu, Aug 7, 2008 at 9:27 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>>> > > > > On Thu, Aug 7, 2008 at 3:20 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
>>> > > > >
>>> > > > >> On Wed, Aug 6, 2008 at 11:34 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>>> > > > >> > This proposal effectively enables the renderer to become a multi-pass compiler for gadget content (essentially, arbitrary web content). Such a compiler can provide several benefits: static optimization of gadget content (auto-proxying of images, whitespace/comment removal, consolidation of CSS blocks), security benefits (caja et al), new functionality (annotation of content for stats, document analysis, container-specific features), etc. To my knowledge no such infrastructure exists today (with the possible exception of Caja itself, which I'd like to dovetail with this work).
>>> > > > >>
>>> > > > >> Caja clearly provides a large chunk of the code you'd need for this. I'd like to hear how we'd manage to avoid duplication between the two projects.
>>> > > > >>
>>> > > > >> A generalised framework for manipulating content sounds like a great idea, but probably should not live in either of the two projects (Caja and Shindig) but rather should be shared by both of them, I suspect.
>>> > > > >
>>> > > > >
>>> > > > > I agree on both counts. As I mentioned, the piece of this idea that I expect to change the most is the parse tree, and Caja's .parser.html and .parser.css packages contain much of what I've thrown in here as a base.
>>> > > > >
>>> > > > > My key requirements are:
>>> > > > > * Lightweight framework.
>>> > > > > * Parser modularity, mostly for HTML parsers (to re-use the good work done by WebKit or Gecko.. CSS/JS can come direct from Caja I'd bet)
>>> > > > > * Automatic maintenance of DOM<->String conversion.
>>> > > > > * Easy to manipulate structure.
>>> > > >
>>> > > > I'm not sure what the value of parser modularity is? If the resulting tree is different, then that's a problem for people processing the tree. And if it is not, then why do we care?
>>> > >
>>> > >
>>> > > IMO the value of parser modularity is that the lenient parsers native to browsers can be used in place of those that might not accept all inputs. One could (and I'd like to) adapt WebKit or Gecko's parsing code into a server that runs parallel to Shindig and provides a "local RPC" service for parsing semi-structured HTML. The resulting tree for WebKit's parser might be different than that for an XHTML parser, Gecko's parser, etc, but if the algorithm implemented atop it is rule-based rather than strict-structure based that should be fine, no?
>>> > >
>>> > >
>>> > > >
>>> > > >
>>> > > > >
>>> > > > > I'd love to see both projects share the same base syntax tree representations. I considered .parser.html(.DomTree) and .parser.css for these, but at the moment these appeared to be a little more tied to Caja's lexer/parser implementation than I preferred (though I admit AbstractParseTreeNode contains most of what's needed).
>>> > > > >
>>> > > > > To be sure, I don't see this as an end-all-be-all transformation system in any way. I'd just like to put *something* reasonable in place that we can play with, provide some benefit, and enhance into a truly sophisticated vision of document rewriting.
>>> > > > >
>>> > > > >
>>> > > > >>
>>> > > > >>
>>> > > > >> > c. Add Gadget.getParsedContent().
>>> > > > >> >    i. Returns a mutable GadgetContentParseTree used to manipulate Gadget Contents.
>>> > > > >> >    ii. Mutable tree calls back to the Gadget object indicating when any change is made, and emits an error if setContent() has been called in the interim.
>>> > > > >>
>>> > > > >> In Caja we have been moving towards immutable trees...
>>> > > > >
>>> > > > >
>>> > > > > Interested to hear more about this. The whole idea is for the gadget's tree representation to be modifiable. Doing that with immutable trees to me suggests that a rewriter would have to create a completely new tree and set it as a representation of new content. That's convenient as far as the Gadget's maintenance of String<->Tree representations is concerned... but seems pretty heavyweight for many types of edits: in-situ modifications of text, content reordering, etc. That's particularly so in a single-threaded (viz rewriting) environment.
>>> > > >
>>> > > > Never having been entirely sold on the concept, I'll let those on the Caja team who advocate immutability explain why.
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
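
For readers following item (c) in the quoted proposal, here is a minimal sketch of how a mutable tree might call back to its Gadget and reject edits once setContent() has been called in the interim. Only the names Gadget, getContent(), setContent(), getParsedContent() and GadgetContentParseTree come from the thread. Everything else (the version counter, onTreeChanged(), appendText(), render(), and the one-chunk "parse") is a hypothetical stand-in and is not taken from Shindig or Caja.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Gadget object discussed in the thread. */
class Gadget {
  private String content;
  private int contentVersion = 0;       // assumption: bumped on every setContent()
  private GadgetContentParseTree tree;  // created lazily by getParsedContent()
  private boolean treeDirty = false;    // true when the tree has unrendered edits

  Gadget(String content) {
    this.content = content;
  }

  /** Returns the string form, re-rendering from the tree if it was edited. */
  String getContent() {
    if (treeDirty) {
      content = tree.render();
      treeDirty = false;
    }
    return content;
  }

  /** Replaces the content wholesale; any tree handed out earlier becomes stale. */
  void setContent(String newContent) {
    content = newContent;
    contentVersion++;
    tree = null;
    treeDirty = false;
  }

  /** Returns a mutable tree tied to the current content version. */
  GadgetContentParseTree getParsedContent() {
    if (tree == null) {
      tree = new GadgetContentParseTree(this, contentVersion, getContent());
    }
    return tree;
  }

  /** Callback from the tree (hypothetical name): records an edit or rejects a stale one. */
  void onTreeChanged(int treeVersion) {
    if (treeVersion != contentVersion) {
      throw new IllegalStateException(
          "setContent() was called after this tree was obtained");
    }
    treeDirty = true;
  }
}

/** Toy tree: a flat list of text chunks instead of a real HTML parse. */
class GadgetContentParseTree {
  private final Gadget owner;
  private final int version;
  private final List<String> chunks = new ArrayList<>();

  GadgetContentParseTree(Gadget owner, int version, String content) {
    this.owner = owner;
    this.version = version;
    chunks.add(content);  // placeholder for an actual parse step
  }

  /** Mutation: notifies the owner first, so a stale tree fails before changing. */
  void appendText(String text) {
    owner.onTreeChanged(version);
    chunks.add(text);
  }

  String render() {
    return String.join("", chunks);
  }
}

class RewriteSketchDemo {
  public static void main(String[] args) {
    Gadget gadget = new Gadget("<b>hi</b>");
    GadgetContentParseTree tree = gadget.getParsedContent();
    tree.appendText("<i>!</i>");
    System.out.println(gadget.getContent());

    gadget.setContent("reset");
    try {
      tree.appendText("x");  // the tree predates setContent(), so this is rejected
    } catch (IllegalStateException e) {
      System.out.println("stale tree rejected: " + e.getMessage());
    }
  }
}
```

The version check is just one way to detect a setContent() call in the interim. An immutable-tree design, as mentioned for Caja, would avoid the callback entirely by having the rewriter hand a new tree back to the Gadget.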

