SB wrote:
> I would like to know a little about the
> implementation of NekoHTML's HTML parser.
The source code is available for your perusal. :)
> can I do this transformation? I cannot even
> parse the former as it is ill-formed. How does
> NekoHTML go about reading ill-formed XML?
NekoHTML doesn't read XML; it reads HTML. HTML is
based on SGML, which allows things like optional end
tags and unquoted attributes. But Neko isn't even a
full-blown SGML parser -- it's tailored specifically
for HTML and the common mistakes people make when
writing it by hand.
If you're really interested in how it works, you can
look at the source code. The primary piece is just
two classes: the scanner and the tag balancer. The
scanner, besides being written as a state machine,
is pretty straightforward. The tag balancer on the
other hand is a little trickier.
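
To give a flavor of what "written as a state machine" means
for a scanner, here is a minimal sketch. It is not Neko's
actual scanner: the class name ScannerSketch, the two states
and the scan() method are all made up for illustration, and
it only splits input into tags and text without looking at
attributes, comments or entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not NekoHTML code.
class ScannerSketch {
    // The scanner is either inside character data or inside a tag.
    enum State { TEXT, TAG }

    static List<String> scan(String html) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.TEXT;
        for (char c : html.toCharArray()) {
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        // Flush pending text, then switch to tag state.
                        if (buf.length() > 0) {
                            tokens.add(buf.toString());
                            buf.setLength(0);
                        }
                        buf.append(c);
                        state = State.TAG;
                    } else {
                        buf.append(c);
                    }
                    break;
                case TAG:
                    buf.append(c);
                    if (c == '>') {
                        // Tag complete; go back to reading text.
                        tokens.add(buf.toString());
                        buf.setLength(0);
                        state = State.TEXT;
                    }
                    break;
            }
        }
        // Be lenient: emit trailing text or an unterminated tag as-is.
        if (buf.length() > 0) {
            tokens.add(buf.toString());
        }
        return tokens;
    }
}
```

Calling ScannerSketch.scan("<p>Hi<b>there") yields the four
tokens <p>, Hi, <b> and there. Note that the scanner does not
care that nothing is ever closed; fixing that up is left to
the next stage.
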
But basically the tag balancer works like this: it
looks up each element that it knows, finds out what
parent element it should be contained within, and
adds the parent element if needed. It also does
things like checking that "inline" elements
like <b>, <i>, etc. are properly balanced.
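
The idea can be sketched roughly as follows. This is a toy
version under stated assumptions, not the real balancer: the
class TagBalancerSketch, the REQUIRED_PARENT table and the
string-based event format ("<name>", "</name>", anything else
is text) are all hypothetical, and the table covers only a
few elements. It needs Java 9 or later for Map.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not NekoHTML code.
class TagBalancerSketch {
    // Assumed lookup table: element -> parent it must sit inside.
    static final Map<String, String> REQUIRED_PARENT = Map.of(
        "li", "ul",
        "tr", "table",
        "td", "tr");

    static List<String> balance(List<String> events) {
        Deque<String> open = new ArrayDeque<>(); // stack of open elements
        List<String> out = new ArrayList<>();
        for (String ev : events) {
            if (ev.startsWith("</")) {
                String name = ev.substring(2, ev.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag: drop it
                }
                // Close everything down to and including the match,
                // so that <b><i>x</b> comes out properly nested.
                while (!open.isEmpty()) {
                    String top = open.pop();
                    out.add("</" + top + ">");
                    if (top.equals(name)) {
                        break;
                    }
                }
            } else if (ev.startsWith("<")) {
                openElement(ev.substring(1, ev.length() - 1), open, out);
            } else {
                out.add(ev); // text passes through untouched
            }
        }
        // Close whatever is still open at end of input.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }

    // Opens an element, first adding any missing required ancestors.
    static void openElement(String name, Deque<String> open, List<String> out) {
        String parent = REQUIRED_PARENT.get(name);
        if (parent != null && !open.contains(parent)) {
            openElement(parent, open, out);
        }
        open.push(name);
        out.add("<" + name + ">");
    }
}
```

With this sketch, a lone <td> followed by text comes out
wrapped in <table> and <tr>, and <b><i>x</b> gets the missing
</i> inserted before </b>. The real balancer knows far more
elements and has more rules than this.
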
--
Andy Clark * [EMAIL PROTECTED]