I'm confused. Gate crashing typically means the party's worth going to! :)
So as to prevent this topic from veering too off course, I proffer the
following overview of the CL for anyone interested to review. I understand
Paul and Louis are on board. All comments welcome.
The changes are arrayed as follows:
1. Refactoring. Previously a large amount of generic (i.e. should apply to
any parser) HTML parser testing was stuck in the nekohtml-package tests. To
the maximum extent possible, this has been pulled into parse-package classes:
- AbstractParsingTestBase includes helper methods for any parsing- or
serialization-based test. Pulled from AbstractParserAndSerializerTest.
- AbstractParserAndSerializerTest contains several "common"
parse/serialize tests that apply no matter the concrete impl.
- AbstractSocialMarkupHtmlParserTest pulls the social-markup test from
Neko into a base class.
- "Actual" tests are trivial subclasses of the abstract tests, providing a
GadgetHtmlParser instance.
- Tests converted to JUnit 4 as a side note.
Subtleties:
- Neko-based tests override a few base parse/serialize tests due to Neko
oddities. All test files have been moved to base or nekohtml subdir to
follow suit.
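
For anyone who wants the shape of the refactoring in (1) without opening the CL, here is a rough sketch of the base-class pattern. It is illustrative only: SimpleParser, AbstractParserCheck and LowercasingParserCheck are hypothetical stand-ins, not classes from the CL, and the check is plain Java rather than a JUnit 4 @Test method so that it runs without extra dependencies.

```java
import java.util.Locale;

final class AbstractTestSketch {
  /** Hypothetical stand-in for GadgetHtmlParser; the real class has a richer API. */
  interface SimpleParser {
    String parseAndSerialize(String html);
  }

  /** Shared checks that should hold for any parser implementation. */
  abstract static class AbstractParserCheck {
    /** Each concrete test supplies only the parser under test. */
    protected abstract SimpleParser makeParser();

    /** In a real test class this would be annotated with JUnit 4's @Test. */
    public void checkTextSurvivesRoundTrip() {
      String out = makeParser().parseAndSerialize("<p>hello</p>");
      if (!out.contains("hello")) {
        throw new AssertionError("text content lost: " + out);
      }
    }
  }

  /** A "trivial subclass": its only job is to provide the parser instance. */
  static class LowercasingParserCheck extends AbstractParserCheck {
    @Override
    protected SimpleParser makeParser() {
      // Toy parser for the sketch: just lowercases its input.
      return html -> html.toLowerCase(Locale.ROOT);
    }
  }

  public static void main(String[] args) {
    new LowercasingParserCheck().checkTextSurvivesRoundTrip();
    System.out.println("ok");
  }
}
```

A parser with known quirks (as with the Neko-based tests) would override the individual check methods it cannot satisfy as written, and inherit the rest.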
2. GadgetHtmlParser normalization implemented.
- GadgetHtmlParser.normalizeFragment() removed - logic now inlined into
parseDom().
+ Rationale: IMO (open to discussion) the abstract parseDomImpl() API is
unnecessary/does too much. Pretty much all gadget HTML is treated as tag
soup and cleaned up. Having a base method whose contract is to give back
unmodified tag soup thus seems right to me, with a single implementation of
the normalization logic.
- GadgetHtmlParser.parseDom() implements a large chunk of document
normalization logic. It takes tag soup as input and returns a valid HTML
document with a single top-level HTML element, in turn with two children:
head and body.
+ Multiple <head> nodes consolidated together. Likewise body.
+ Elements above first <head> -> end up in head.
+ Elements above first <body> -> end up in body.
+ Elements after <body> -> end up in body unless inside a <head> node.
+ <style> nodes pulled to <head> in relative order - only HTML-compliant
place for them, and no possibility that there will be conflicts (no
displayable elements in <head>).
- OpenSocial template parsing MAY be done as a post-processing pass on
<script> nodes. Text found therein is treated as OS (X|HT)ML.
Subtleties:
- Lots. @see parseDom() impl especially.
- NekoSimplifiedHtmlParser still implements separate logic for parseDomImpl
and parseFragmentImpl. I didn't dive into the difference and whether we
could actually get rid of parseDomImpl in this round.
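
To make the normalization rules in (2) concrete, here is a minimal sketch of one reading of them using the standard org.w3c.dom API. It is not the parseDom() code from the CL. NormalizeSketch and its helpers are hypothetical names, the input is assumed to be already-parsed nodes hanging off a container element, and one assumption of mine is baked in: nodes above the first <head> go to head only when the input has a <head> at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NormalizeSketch {
  /** Parses well-formed XML; only used to build sample input for the sketch. */
  static Document parseXml(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  static boolean isElement(Node n, String name) {
    return n.getNodeType() == Node.ELEMENT_NODE && name.equalsIgnoreCase(n.getNodeName());
  }

  /** Moves every child of 'from' to the end of 'to', preserving order. */
  static void moveChildren(Node from, Node to) {
    while (from.getFirstChild() != null) {
      to.appendChild(from.getFirstChild());
    }
  }

  /** Comma-separated names of the direct children, for easy inspection. */
  static String childNames(Node parent) {
    List<String> names = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      names.add(n.getNodeName());
    }
    return String.join(",", names);
  }

  /** Rebuilds the children of 'soup' under a fresh html/head/body skeleton. */
  static Element normalize(Document doc, Element soup) {
    Element html = doc.createElement("html");
    Element head = doc.createElement("head");
    Element body = doc.createElement("body");
    html.appendChild(head);
    html.appendChild(body);

    // Snapshot the children first: moving nodes changes the sibling chain.
    List<Node> children = new ArrayList<>();
    boolean hasHead = false;
    for (Node n = soup.getFirstChild(); n != null; n = n.getNextSibling()) {
      children.add(n);
      hasHead |= isElement(n, "head");
    }

    boolean seenHead = false;
    for (Node n : children) {
      if (isElement(n, "head")) {
        seenHead = true;
        moveChildren(n, head);   // consolidate multiple <head> nodes
      } else if (isElement(n, "body")) {
        moveChildren(n, body);   // consolidate multiple <body> nodes
      } else if (hasHead && !seenHead) {
        head.appendChild(n);     // above the first <head> (assumption: only if one exists)
      } else {
        body.appendChild(n);     // everything else lands in body
      }
    }

    // Pull lowercase <style> elements out of body into head, in document order.
    NodeList styles = body.getElementsByTagName("style");
    List<Node> toMove = new ArrayList<>();
    for (int i = 0; i < styles.getLength(); i++) {
      toMove.add(styles.item(i));
    }
    for (Node s : toMove) {
      head.appendChild(s);
    }
    return html;
  }

  public static void main(String[] args) {
    Document doc = parseXml("<soup><title/><head><meta/></head><div><style/></div>"
        + "<head><link/></head><body><p/></body><span/></soup>");
    Element html = normalize(doc, doc.getDocumentElement());
    System.out.println("head: " + childNames(html.getFirstChild()));
    System.out.println("body: " + childNames(html.getLastChild()));
  }
}
```

The sketch treats text and comment nodes the same as elements and ignores the OpenSocial <script> pass entirely, so see parseDom() itself for the real handling.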
3. CajaHtmlParser implementation.
- Depends on Caja r3889 (pom.xml updated to reflect this).
- Unfortunately, parseDomImpl() does top-level <html> node synthesis to
ensure document.getDocumentElement() returns it. This is for
NekoSimplified/Caja dual compatibility w/ GadgetHtmlParser base logic. As
noted, I'd prefer to move this synthesis code into
GadgetHtmlParser.parseDom() if possible.
- Pretty straightforward past that. Defers to Caja's parser for fragment
processing. That's about it.
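
For reference, the top-level <html> synthesis mentioned above amounts to something like the following. This is a sketch against the standard org.w3c.dom API only, not the code in CajaHtmlParser. HtmlRootSketch and attachWithHtmlRoot are hypothetical names, and it assumes the parser hands back a DocumentFragment owned by an otherwise empty Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class HtmlRootSketch {
  /** Creates an empty DOM document using the JDK's default builder. */
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns the fragment's only child if that child is an <html> element, else null. */
  static Element soleHtmlChild(DocumentFragment fragment) {
    Node first = fragment.getFirstChild();
    boolean single = first != null && first.getNextSibling() == null;
    if (single && first.getNodeType() == Node.ELEMENT_NODE
        && "html".equalsIgnoreCase(first.getNodeName())) {
      return (Element) first;
    }
    return null;
  }

  /**
   * Attaches the fragment to the (empty) document so that
   * doc.getDocumentElement() is always an <html> element.
   */
  static Document attachWithHtmlRoot(Document doc, DocumentFragment fragment) {
    Element html = soleHtmlChild(fragment);
    if (html == null) {
      // Synthesize the wrapper; appending a fragment moves all of its children.
      html = doc.createElement("html");
      html.appendChild(fragment);
    }
    doc.appendChild(html);
    return doc;
  }

  public static void main(String[] args) {
    Document doc = newDocument();
    DocumentFragment fragment = doc.createDocumentFragment();
    fragment.appendChild(doc.createElement("div"));
    fragment.appendChild(doc.createElement("p"));
    attachWithHtmlRoot(doc, fragment);
    System.out.println(doc.getDocumentElement().getNodeName());
  }
}
```

If this step moved into GadgetHtmlParser.parseDom(), each concrete parser could return its raw nodes and leave the wrapping to the base class.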
Misc: setValijaMode(true) removed from CajaContentRewriter, since it's now
default in the relevant Caja version.
-j-
On Fri, Dec 4, 2009 at 4:41 PM, Dan Shepherd
<[email protected]> wrote:
> Indeed :) sorry for gate crashing!
>
> On 5 Dec 2009 00:35, "John Hjelmstad" <[email protected]> wrote:
>
> At all the best Shindigs, people show up fashionably late.
>
> On Fri, Dec 4, 2009 at 4:32 PM, Dan Shepherd
> <[email protected]> wrote:
>
> > Call this a shining? shouldn't you folks be out partying ;)
> >
> > On 5 Dec 2009 00:28, "Kevin Brown...
>