On Thu, 27 Nov 2014 01:15:20 +0100, Ian Hickson <i...@hixie.ch> wrote:

On Wed, 26 Nov 2014, Simon Pieters wrote:

- Make the end tag optional and have <menuitem>, <menu> and <hr>
generate implied </menuitem> end tags. (Maybe other tags like <li> and
<p> can also imply </menuitem>.) The label attribute would be honored if
specified, otherwise use the textContent with leading and trailing
whitespace trimmed.

This would allow either syntax unless I'm missing something.
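
The fallback rule proposed above (use the label attribute when specified, otherwise the trimmed textContent) can be sketched roughly as follows. This is only an illustration, not spec text: the function name `menuitem_label` is hypothetical, an absent attribute is modelled as `None`, and "whitespace" is assumed to mean the ASCII whitespace characters HTML uses.

```python
from typing import Optional

# Assumption: "whitespace" means HTML ASCII whitespace
# (space, tab, LF, FF, CR), not every Unicode space character.
HTML_WHITESPACE = " \t\n\f\r"


def menuitem_label(label_attr: Optional[str], text_content: str) -> str:
    """Hypothetical helper: pick the label for a <menuitem>.

    label_attr is None when the attribute is absent. Whether an empty
    label="" should count as "specified" is not settled in the thread;
    this sketch treats any present attribute as specified.
    """
    if label_attr is not None:
        return label_attr
    # Fall back to the element's text, trimmed at both ends only.
    return text_content.strip(HTML_WHITESPACE)
```

With this rule, <menuitem label="Save"> and <menuitem> Save </menuitem> would both end up with the label "Save".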

That's another option, yeah. Probably the best so far if we can't just
power through and break the sites in question. It's not yet clear to me
how many sites we're talking about here and how possible it is to
evangelise them.

In httparchive (http://bigqueri.es/t/analyzing-html-css-and-javascript-response-bodies/442):

* 10101 pages use <menuitem>
* 39 have no label attribute
* 0 have non-whitespace content
* 15 have no end tag

Based on this, it seems possible to keep it as a void element and only use the label attribute.


SELECT COUNT(*) as num,
 CASE
  WHEN REGEXP_MATCH(LOWER(body), r'<menuitem\s([^>]+\s)?label\s*=') THEN "label present"
  ELSE "no label"
 END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc

Row     num     stat    
1       10062   label present   
2       39      no label        


SELECT COUNT(*) as num,
 CASE
  WHEN REGEXP_MATCH(LOWER(body), r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>') THEN "has content"
  ELSE "no content"
 END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc

Row     num     stat    
1       10101   no content      


SELECT COUNT(*) as num,
 CASE
  WHEN REGEXP_MATCH(LOWER(body), r'</menuitem>') THEN "end tag"
  ELSE "no end tag"
 END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc

Row     num     stat    
1       10086   end tag 
2       15      no end tag      
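
For anyone who wants to try the three checks on a single document without BigQuery, here is a rough Python sketch. The regular expressions are copied from the queries above; the function name `classify` and the returned keys are made up for illustration, and the mimeType filter is left out because there is no request metadata here.

```python
import re
from typing import Dict, Optional

# Patterns copied from the three queries above.
LABEL_RE = re.compile(r'<menuitem\s([^>]+\s)?label\s*=')
CONTENT_RE = re.compile(r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>')
END_TAG_RE = re.compile(r'</menuitem>')
USES_RE = re.compile(r'<menuitem')


def classify(body: str) -> Optional[Dict[str, bool]]:
    """Hypothetical helper: apply the three checks to one response body.

    Returns None when the body does not mention <menuitem> at all,
    which mirrors the REGEXP_MATCH filter in the WHERE clause.
    """
    lowered = body.lower()  # the queries match against LOWER(body)
    if not USES_RE.search(lowered):
        return None
    return {
        "label": LABEL_RE.search(lowered) is not None,
        "content": CONTENT_RE.search(lowered) is not None,
        "end_tag": END_TAG_RE.search(lowered) is not None,
    }
```

For example, classify('<menuitem label="Cut">') reports a label, no content and no end tag. One caveat: Python's re module backtracks, so the nested quantifier in the content pattern may get slow on large bodies in a way it would not under BigQuery's regex engine. Treat this as something for small samples only.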

--
Simon Pieters
Opera Software
