On Mon, 18 Jun 2007 15:08:53 +0200
Stefan Behnel <[EMAIL PROTECTED]> wrote:

> Nick Kew wrote:
> > On Mon, 18 Jun 2007 08:14:01 -0400
> > Try running the following through "xmllint --html":
> > 
> > <meta http-equiv="content-type" content="text/html;charset=ascii" />
> > <html lang="en">
> > <head><title>foo</title></head>
> > <body><h1>Hello, World</h1></body>
> > </html>
> 
> In that case I would actually prefer making it a general special case
> rule in the current parser to interpret a leading <meta> tag as an
> encoding hint to the parser. That would add quite a portion of
> real-world non-HTML to the set of parsable (i.e. fixable) documents.
> 
> Stefan

That's what I've done in that specific case: an ad-hoc fix
for one specific instance of bad markup.  It does nothing for
a similar case, such as

<html lang="en">
<h1>Hello World</h1>
<head><title>foo</title></head>
<body><p>Some contents here</p></body>
</html>

which HTMLparser fixes to

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html lang="en"><body>
<h1>Hello World</h1>
<title>foo</title>
<h1>Hello, World</h1>
</body></html>

I'm trying to get away from ad-hoc fixes!
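
To make the limitation concrete, here is a rough sketch of what such a
leading-<meta> special case amounts to.  It is plain Python written for
illustration only, not libxml2 code; the function name and both regular
expressions are my own assumptions.  It finds the encoding hint in the
first example above and has nothing to say about the second.

```python
import re
from typing import Optional

# Hypothetical rule, not part of libxml2: accept a <meta> tag only when it is
# the very first thing in the document (leading whitespace is tolerated).
_LEADING_META = re.compile(rb'''\s*<meta\b[^>]*>''', re.IGNORECASE)
# Pull the charset value out of that tag, with or without a quote before it.
_CHARSET = re.compile(rb'''charset\s*=\s*["']?([A-Za-z0-9._:-]+)''', re.IGNORECASE)


def sniff_leading_meta_charset(data: bytes) -> Optional[str]:
    """Return the charset named by a leading <meta> tag, or None."""
    tag = _LEADING_META.match(data)
    if tag is None:
        return None
    charset = _CHARSET.search(tag.group(0))
    return charset.group(1).decode("ascii").lower() if charset else None


# The two malformed documents discussed in this thread.
first = b'''<meta http-equiv="content-type" content="text/html;charset=ascii" />
<html lang="en">
<head><title>foo</title></head>
<body><h1>Hello, World</h1></body>
</html>'''

second = b'''<html lang="en">
<h1>Hello World</h1>
<head><title>foo</title></head>
<body><p>Some contents here</p></body>
</html>'''

print(sniff_leading_meta_charset(first))   # ascii
print(sniff_leading_meta_charset(second))  # None
```

The rule only ever sees the first tag, so the misplaced <h1> in the
second document is untouched by it.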

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
