[web2py:25609] xhtml vs html

Jonathan Lundell Fri, 03 Jul 2009 19:19:38 -0700

By way of background, web2py generates xhtml pages, with these  
doctypes (I'm not clear on why one or the other is chosen; perhaps  
someone could enlighten me):


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The problem is, web2py's xhtml is not, by and large, valid. This ought
to cause a problem, because an XML parser is required to stop with an
error on malformed markup rather than recover from it. We get away with
it because essentially all servers present our pages as html, not xhtml,
and browsers parse them as html, which is more forgiving.
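
To make the strict-versus-forgiving difference concrete, here is a small sketch using only the Python standard library. The fragment and the helper names (SNIPPET, TagCollector, parses_as_xml, tags_seen_as_html) are made up for illustration and are not taken from web2py. Note that ElementTree only checks well-formedness, not validity against the DTD, so this shows the weaker of the two requirements.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: <br> is never closed, so it is not well-formed XML.
SNIPPET = "<div><p>one<br>two</p></div>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tags_seen_as_html(markup):
    """Return the start tags an HTML parser sees, without raising on sloppy markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("accepted as XML:", parses_as_xml(SNIPPET))
    print("tags seen as HTML:", tags_seen_as_html(SNIPPET))
```

The XML parser gives up at the mismatched </p>, while the HTML parser carries on and reports all three tags. Writing the break as <br /> is enough to make the XML parser accept the same fragment.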

(One of the better explanations of all this is:
<http://webkit.org/blog/2006/09/20/understanding-html-xml-and-xhtml/>.)

Here's an illustration. With a standards-compliant browser (I'm using  
Safari 4.0.1 and Firefox 3.5 on OS X), have a look at these pages:

http://www.web2py.com/examples/spreadsheet
http://lobitos.net/w2p/w2p.html
http://lobitos.net/w2p/w2p.xhtml

They're identical (I copied the first one, changing only some URLs to  
be absolute instead of relative). You should see that your browser  
won't display the third version.

Why? Because the first two are served as "Content-Type: text/html",  
while the third is served as "Content-Type: application/xhtml+xml".  
This is how Apache handles .html and .xhtml files by default. Notice  
that the browser pays attention to the Content-Type header and ignores  
the DOCTYPE (the w3 validator looks at DOCTYPE, though). Notice also  
that the browser is ignoring the "<meta http-equiv="content-type"  
content="text/html; charset=utf-8" />" line; it really does believe  
the http header: not the doctype, not the meta.
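
As a rough model of that decision, the sketch below picks a parser from the Content-Type header alone and never looks at the document body. The function names (media_type, parse_mode) are hypothetical, and the list of XML media types is my own simplification; real browsers have more rules than this.

```python
# Media types assumed here to select the XML parser (a simplification).
XML_TYPES = ("application/xhtml+xml", "application/xml", "text/xml")


def media_type(content_type_header):
    """Strip parameters such as charset and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def parse_mode(content_type_header):
    """Return 'xml', 'html', or 'other' based only on the HTTP header.

    The DOCTYPE and any <meta http-equiv> element in the body are
    deliberately not consulted, mirroring the behaviour described above.
    """
    mt = media_type(content_type_header)
    if mt in XML_TYPES:
        return "xml"
    if mt == "text/html":
        return "html"
    return "other"


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8", "application/xhtml+xml"):
        print(header, "->", parse_mode(header))
```

Under this model the first two test pages land in the "html" branch and the third in the "xml" branch, whatever their DOCTYPE says.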

So. Why use the XHTML DOCTYPE? The main reason that I can think of is  
that the resulting document, if valid, has a DTD and can be parsed as  
XML. Not by the browser, which is going to parse it as bad html, but  
by someone else, maybe. But that reason only holds water if the  
document is actually *valid* xhtml.
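
For what that "someone else" might look like, here is a minimal sketch that pulls link targets out of an XHTML page with an ordinary XML parser. The page text and the function name link_targets are invented for the example; the only real names are the standard library calls and the XHTML namespace URI. It only works because the sample page is well-formed, which is the point.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, well-formed XHTML page used only for this example.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p><a href="/a">first</a> and <a href="/b">second</a></p>
  </body>
</html>"""


def link_targets(xhtml_text):
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xhtml_text)
    # Elements carry the XHTML namespace, so the tag must be qualified.
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]


if __name__ == "__main__":
    print(link_targets(PAGE))
```

Feed the same function a page with an unclosed tag and it raises ParseError instead of returning anything, so the XML route is only useful if the output really is clean.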

(Another note: this is the spreadsheet application, but the same thing  
would happen with many, perhaps most, web2py pages, certainly any with  
<form action="">. You can try the same experiment with other pages.)

What to do? In my view, there's a short-term answer and a long-term  
answer. And they're both complicated by legacy compatibility issues,  
which I'll take the liberty of ignoring here.

In the short term, fix the output to be valid XHTML. This is easier if  
you use transitional rather than strict.

In the long term, move to HTML. Given that it looks like XHTML2 has  
been abandoned, the future standard is going to be HTML5. Not very  
soon, since browsers are only starting to support it, and the FSM only  
knows when Microsoft will get around to it, but eventually, since  
HTML5 has a lot of nifty features.

Me, I've settled on HTML4 Transitional for anything I've got control  
over, using XHTML only for a couple of pages that I need to parse as  
XML.


