Nick Kew wrote:
> Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> Nick Kew wrote:
>>> On Mon, 18 Jun 2007 08:14:01 -0400
>>> Try running the following through "xmllint --html":
>>>
>>> <meta http-equiv="content-type" content="text/html;charset=ascii" />
>>> <html lang="en">
>>> <head><title>foo</title></head>
>>> <body><h1>Hello, World</h1></body>
>>> </html>
>> In that case I would actually prefer making it a general special case
>> rule in the current parser to interpret a leading <meta> tag as an
>> encoding hint to the parser. That would add quite a portion of
>> real-world non-HTML to the set of parsable (i.e. fixable) documents.
[...]
> I'm trying to get away from ad-hoc fixes!

I don't consider that an ad-hoc fix. It's just special casing a specific
type of broken HTML that exists in real life. I wouldn't even mind if the
<meta> tag was discarded, it should just

a) be interpreted as an encoding hint and
b) not change the remaining 'real' markup.

I think such a rule should go into the mainstream parser.

Stefan
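
To illustrate the rule, here is a rough Python sketch of what I mean. It is only a sketch under assumptions: the name sniff_leading_meta and the regular expression are made up for this example and are not part of libxml2 or xmllint, and a real implementation would sit inside the HTML parser itself. It treats a <meta> tag at the very start of the input as an encoding hint, drops it, and leaves everything after it byte-for-byte unchanged.

```python
import re

# Hypothetical pattern (an assumption for this sketch): a <meta ...> tag at
# the very start of the input that carries a charset=... value.
_LEADING_META = re.compile(
    rb'\A\s*<meta\b[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)[^>]*>',
    re.IGNORECASE,
)


def sniff_leading_meta(data: bytes):
    """Return (encoding_hint, remaining_markup).

    If the input starts with a <meta> tag that names a charset, return the
    lower-cased charset name and the input with that tag removed. Otherwise
    return (None, data) with the input untouched.
    """
    match = _LEADING_META.match(data)
    if match is None:
        return None, data
    encoding = match.group(1).decode("ascii").lower()
    # Slice only: the bytes after the tag are not modified in any way.
    return encoding, data[match.end():]


if __name__ == "__main__":
    doc = (
        b'<meta http-equiv="content-type" content="text/html;charset=ascii" />\n'
        b'<html lang="en">\n'
        b'<head><title>foo</title></head>\n'
        b'<body><h1>Hello, World</h1></body>\n'
        b'</html>\n'
    )
    hint, rest = sniff_leading_meta(doc)
    print(hint)
    print(rest.decode(hint))
```

The hint would then be handed to the parser as the input encoding, and the remaining markup parsed as usual. A <meta> tag that appears in its normal place inside <head> is not matched by this sketch and would be left to the existing handling.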
