On Mon, 18 Jun 2007 15:08:53 +0200 Stefan Behnel <[EMAIL PROTECTED]> wrote:

> Nick Kew wrote:
> > On Mon, 18 Jun 2007 08:14:01 -0400
> > Try running the following through "xmllint --html":
> >
> > <meta http-equiv="content-type" content="text/html;charset=ascii" />
> > <html lang="en">
> > <head><title>foo</title></head>
> > <body><h1>Hello, World</h1></body>
> > </html>
>
> In that case I would actually prefer making it a general special case
> rule in the current parser to interpret a leading <meta> tag as an
> encoding hint to the parser. That would add quite a portion of
> real-world non-HTML to the set of parsable (i.e. fixable) documents.
>
> Stefan

That's what I've done in that specific case. An ad-hoc fix to a
specific instance of bad markup. It does nothing for a similar case,
like

<html lang="en">
<h1>Hello World</h1>
<head><title>foo</title></head>
<body><p>Some contents here</p></body>
</html>

which HTMLparser fixes to

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html lang="en"><body>
<h1>Hello World</h1>
<title>foo</title>
<h1>Hello, World</h1>
</body></html>

I'm trying to get away from ad-hoc fixes!

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
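
For illustration only, the special-case rule discussed above (treating a
leading <meta> tag as an encoding hint) might look roughly like the
sketch below. This is not libxml2 code: the helper name
sniff_leading_meta_charset and the regular expression are assumptions
made up for this example, and a real parser would need to handle far
more variation. It is applied to the two sample documents from this
thread.

```python
import re

# Hypothetical helper, not part of libxml2 or HTMLparser.
# Matches a <meta> tag only when it is the very first markup in the
# input, and captures the charset value found inside that tag.
_LEADING_META = re.compile(
    rb"^\s*<meta\b[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_leading_meta_charset(data: bytes):
    """Return the charset named by a leading <meta> tag, or None."""
    match = _LEADING_META.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# First sample from the thread: <meta> comes before <html>.
leading_meta_doc = (
    b'<meta http-equiv="content-type" content="text/html;charset=ascii" />\n'
    b'<html lang="en">\n'
    b"<head><title>foo</title></head>\n"
    b"<body><h1>Hello, World</h1></body>\n"
    b"</html>\n"
)

# Second sample from the thread: <h1> comes before <head>.
misplaced_h1_doc = (
    b'<html lang="en">\n'
    b"<h1>Hello World</h1>\n"
    b"<head><title>foo</title></head>\n"
    b"<body><p>Some contents here</p></body>\n"
    b"</html>\n"
)

hint_first = sniff_leading_meta_charset(leading_meta_doc)
hint_second = sniff_leading_meta_charset(misplaced_h1_doc)
```

The rule finds a hint in the first document and nothing in the second,
which is the limitation of an ad-hoc fix: it addresses one shape of bad
markup and leaves the misplaced <h1> case untouched.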
