[whatwg] Bogus comment state and CDATA section state do not stylistically fit in the tokenizer
It would aid programmatic conversion of the spec, and confuse me less when reading it (thereby avoiding bugs like 25871), if these states matched the model of the rest of the tokenizer.

Thus I propose the bogus comment state becomes:

> Consume the next input character:
>
> U+003E GREATER-THAN SIGN (>):
>
> Switch to the data state. Emit the comment token.
>
> U+0000 NULL:
>
> Append a U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
>
> EOF:
>
> Switch to the data state. Emit the comment token. Reconsume the EOF character.
>
> Anything else:
>
> Append the current input character to the comment token's data.

This also necessitates creating a new comment token prior to entering the bogus comment state.

The CDATA section state should become:

> Consume the next input character:
>
> U+005D RIGHT SQUARE BRACKET (]):
>
> If the three characters starting from the current input character are U+005D
> RIGHT SQUARE BRACKET U+005D RIGHT SQUARE BRACKET U+003E GREATER-THAN SIGN
> (]]>), then consume those characters and switch to the data state. Otherwise,
> emit the current input character as a character token.
>
> EOF:
>
> Switch to the data state. Reconsume the EOF character.
>
> Anything else:
>
> Append the current input character to the comment token's data.

No changes are needed elsewhere for this.

(There is no consistent style for lookahead — and most cases are ASCII case-insensitive words — so I went with what seems sane here!)

/Geoffrey
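For illustration only, the proposed bogus comment state could be sketched in Python roughly as follows. The function name, the use of None to stand for EOF, and the returned tuple are assumptions made for this sketch; only the four branches are taken from the proposal.

```python
REPLACEMENT_CHARACTER = "\ufffd"
EOF = None  # assumption: EOF is modelled as None in this sketch


def bogus_comment_state(stream, pos, comment_data=""):
    """Run the proposed bogus comment state over `stream` starting at `pos`.

    Returns (new_pos, emitted_comment_data, hit_eof). In every exit path the
    tokenizer would switch back to the data state and emit the comment token.
    """
    data = list(comment_data)  # the comment token created before entering the state
    while True:
        ch = stream[pos] if pos < len(stream) else EOF
        if ch == ">":
            # U+003E: switch to the data state, emit the comment token.
            return pos + 1, "".join(data), False
        if ch is EOF:
            # EOF: emit the comment token and reconsume EOF (position unchanged).
            return pos, "".join(data), True
        if ch == "\x00":
            # U+0000: append U+FFFD instead of the NULL character.
            data.append(REPLACEMENT_CHARACTER)
        else:
            # Anything else: append the current input character.
            data.append(ch)
        pos += 1
```

Each loop iteration consumes exactly one character and takes exactly one branch, which is the shape the rest of the tokenizer states already have.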
Re: [whatwg] Parse errors for invalid characters
On 06/09/2013 04:05, Kang-Hao (Kenny) Lu wrote: (2013/09/06 6:08), Geoffrey Sneddon wrote: The phrasing content section states: Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context. And the pre-processing the input-stream section states: Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters). Note the first uses "Unicode characters", the second "characters" — the former excludes surrogates as a conformance requirement. Note that every disallowed non-surrogate character is a parse error. Except U+0000 or am I missing something? This is handled inline in the parser, as noted in the preprocessing section. It sometimes gets passed through as U+0000, sometimes gets changed to U+FFFD, sometimes gets ignored, but always creates a parse error. Therefore, it would make sense to make surrogates parse errors. It should be noted that they can only occur in the input stream if they come from script (as they cannot be decoded from the input byte stream as the decoders will never emit a surrogate). which means that this seems ... cumbersome ... to implement in a conformance checker. 
Which reminds me, does # Conformance checkers must report at least one parse error # condition to the user if one or more parse error conditions exist # in the document and must not report parse error conditions if none # exist in the document. Conformance checkers may report more than # one parse error condition if more than one parse error condition # exists in the document. mean validator.nu and Firefox view source are non-conforming because they do nothing about document.write() ? I think we should exempt conformance checkers from scripts instead. They already are. From the "Conformance classes" section: Conformance checkers must check that the input document conforms when parsed without a browsing context (meaning that no scripts are run, and that the parser's scripting flag is disabled), and should also check that the input document conforms when parsed with a browsing context in which scripts execute, and that the scripts never cause non-conforming states to occur other than transiently during script execution itself. (This is only a "SHOULD" and not a "MUST" requirement because it has been proven to be impossible. [COMPUTABLE]) (I feel like pedanting and pointing out this is untrue — it has not been proven impossible to do, it has been proven impossible to do in general. It wouldn't be that hard to design a conformance checker to check "document.write("<p>")".) On the other hand, a JS console can reasonably report parse errors from script, so the parse errors are still worthwhile to have. /Geoffrey.
[whatwg] Parse errors for invalid characters
The phrasing content section states: Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context. And the pre-processing the input-stream section states: Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters). Note the first uses "Unicode characters", the second "characters" — the former excludes surrogates as a conformance requirement. Note that every disallowed non-surrogate character is a parse error. Therefore, it would make sense to make surrogates parse errors. It should be noted that they can only occur in the input stream if they come from script (as they cannot be decoded from the input byte stream as the decoders will never emit a surrogate). /Geoffrey.
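A rough sketch of the distinction being drawn, in Python. The function names are made up for illustration, and the noncharacter test relies on the Unicode definition (U+FDD0 to U+FDEF plus the last two code points of every plane) rather than on spelling out the list from the spec.

```python
def is_surrogate(cp: int) -> bool:
    """True for UTF-16 surrogate code points, which can only arrive via script."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_listed_parse_error(cp: int) -> bool:
    """True if the preprocessing section quoted above lists `cp` as a parse error.

    U+0000 is deliberately absent: it is handled inline by the tokenizer.
    Surrogates are also absent, which is the gap this message points out.
    """
    return (
        0x0001 <= cp <= 0x0008
        or cp == 0x000B
        or 0x000E <= cp <= 0x001F
        or 0x007F <= cp <= 0x009F
        or is_noncharacter(cp)
    )
```

With these helpers, a lone surrogate such as U+D800 is disallowed by the phrasing content text yet is not flagged by `is_listed_parse_error`, which is the inconsistency under discussion.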
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On 21/03/12 04:31, Mark Callow wrote: On 17/03/2012 08:19, Boris Zbarsky wrote: I think that trying to get web developers to do this right is a lost cause, esp. because none of them (to a good approximation) have any big-endian systems to test on. On what do you base this oft-repeated assertion? ARM CPUs can work either way. I have no idea how the various licensees are actually setting them up. All major mobile OSes use LE on ARM — I believe we currently don't ship anything on BE ARM. (We, do, however, currently ship on BE MIPS, though MIPS too is mostly LE nowadays). -- Geoffrey Sneddon — Opera Software <http://gsnedders.com> <http://opera.com>
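To make the testing concern concrete, here is a small Python sketch (not part of the proposed API; the byte values are arbitrary) showing that the same four bytes decode to different integers depending on the byte order assumed. Code that silently relies on the native order will only misbehave on a big-endian machine, which is exactly the kind of system most developers never test on.

```python
import struct
import sys

raw = bytes([0x01, 0x00, 0x00, 0x00])

# Explicit byte orders give the same answer on every machine.
as_little_endian = struct.unpack("<I", raw)[0]
as_big_endian = struct.unpack(">I", raw)[0]

# Native byte order depends on the CPU the code happens to run on.
as_native = struct.unpack("=I", raw)[0]

print(sys.byteorder, as_little_endian, as_big_endian, as_native)
```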
Re: [whatwg] Adding ECMAScript 5 array extras to HTMLCollection (ATTN IE TEAM - TRAVIS LEITHEAD)
On 28/04/10 23:28, Garrett Smith wrote: On Wed, Apr 28, 2010 at 2:12 AM, James Graham wrote: On 04/28/2010 10:27 AM, David Bruant wrote: When I started this thread, my point was to define a normalized way (through ECMAScript binding) to add array extras to array-like objects in the scope of HTML5 (HTMLCollection and inheriting interfaces). I don't see any reason yet to try to find a solution to problems that are in current web browsers. Of course, if/when a proposal emerges from this thread and some user agent agrees to implement it, a workaround (probably, feature detection) will have to be found to use the feature in user agents that implement it and doing something equivalent in web browsers that don't. To be clear the proposals in this thread are pure syntactic sugar; they don't allow you to do anything that you can't already do like: Array.prototype.whatever.call(html_collection, arg1, arg2, ...) where "whatever" is the array method you are interested in. - and from that you can expect errors in Internet Explorer up to and including version 8. Adding a toArray operation (for example) won't work in IE up to and including version 8 though either. There's no point in adding a toArray operation for the pure reason that they currently don't implement another part of the spec (through the WebIDL references). toArray adds no extra usefulness once they implement other parts of the spec. Of course there is nothing wrong with making the syntax more natural if it can be done in a suitably web-compatible way. However it seems more sensible to do this at a lower level e.g. as part of Web DOM Core. Sadly that spec is in need of an editor. The problem that has been well established is that Internet Explorer's implementation of host object collections or "dhtml collection"[1] objects is incompatible with JScript implementation of Array generics. The result of attempting to supply an Internet Explorer "dhtml collection" to an Array generic method, e.g. "slice", as the `this` value, results in a JScript runtime error: "JScript object expected". IE8: [].slice.call(document.styleSheets); Result: Error: "JScript object expected". In IE8 document.styleSheets.toArray().slice(0, 1); also throws an error. How does adding toArray help for IE8, which you're giving as the reason for adding it? Travis Leithead and IE Team: Can you release Internet Explorer 9 with all "dhtml collections" implemented as native EcmaScript objects? As far as I am aware, none of them are on this list. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] Adding ECMAScript 5 array extras to HTMLCollection
On 27/04/10 20:23, David Bruant wrote: Le 27/04/2010 03:54, Geoffrey Sneddon a écrit : On 26/04/10 19:50, And Clover wrote: David Flanagan wrote: Rather than trying to make DOM collections feel like arrays, how about just giving them a toArray() method? I like that, as a practical and explicit (JavaScript-specific) binding. In the longer term, what's the thinking on a more basic change: - Require specific DOM interfaces like NodeList, HTMLCollection, Element etc. to be available for prototype monkey-patching under their interface names as properties of `window`? Then we wouldn't have to worry about what Array-like methods need to be provided on HTMLCollection, because application and framework authors could choose whichever they liked to prototype in. IE8/Moz/Op/Saf/Chr already do this to a significant extent, but there's no standard that says they have to. It would allow DOM extension to be put on a much less shaky footing than the messy hack Prototype 1.x uses. Is this something that's a reasonable requirement for browsers in future? HTML5 through WebIDL and its ECMAScript binding already does require this. I can see where interfaces are expected to be exposed ([NamedConstructor]) in the global object, but I don't see where it is said that the prototype of the constructor must be extensible. I don't even see this in the section which is the relevant one in my opinion (Interface prototype object). I have read this version of WebIDL: http://dev.w3.org/2006/webapi/WebIDL/ Section 4.1.1 Interface object: The interface object MUST also have a property named "prototype" with attributes { DontDelete, ReadOnly } whose value is an object called the interface prototype object. This object provides access to the functions that correspond to the operations defined on the interface, and is described in more detail in section 4.4.3 below. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] Adding ECMAScript 5 array extras to HTMLCollection
On 26/04/10 19:50, And Clover wrote: David Flanagan wrote: Rather than trying to make DOM collections feel like arrays, how about just giving them a toArray() method? I like that, as a practical and explicit (JavaScript-specific) binding. In the longer term, what's the thinking on a more basic change: - Require specific DOM interfaces like NodeList, HTMLCollection, Element etc. to be available for prototype monkey-patching under their interface names as properties of `window`? Then we wouldn't have to worry about what Array-like methods need to be provided on HTMLCollection, because application and framework authors could choose whichever they liked to prototype in. IE8/Moz/Op/Saf/Chr already do this to a significant extent, but there's no standard that says they have to. It would allow DOM extension to be put on a much less shaky footing than the messy hack Prototype 1.x uses. Is this something that's a reasonable requirement for browsers in future? HTML5 through WebIDL and its ECMAScript binding already does require this. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] HTML5 doctypes incompatible with XHR if named entities present
Aryeh Gregor wrote: On Thu, Nov 12, 2009 at 12:33 AM, Boris Zbarsky wrote: I assume you meant "mostly" as in "most of the pages are well-formed", not "pages are mostly well-formed", since the latter is useless, right? I did a brief survey of obvious sites fitting those descriptions that I had in my browser history at the moment. . . . So either you're looking at a totally different dataset or "mostly" is a bit of a stretch I admit I didn't look closely. At a guess, maybe the default WordPress skin(s) are valid XHTML, but custom skins are very popular for WordPress and those mostly aren't valid XHTML? MediaWiki is unreasonably difficult to reskin, so that's not much of a problem for us . . . Even with the default skin it's easy to break (e.g., search for U+). That'll be output to the page and make it not well-formed. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] "Script Data" tokenizer mode
Matt Hall wrote: When the "script data" state was added to the tokenizer, the tree construction algorithm was updated to switch the tokenizer into this state upon finding a start tag named "script" while in the "in head" insertion mode (9.2.5.7). I see that a corresponding change was not made to 9.5 about "Parsing HTML Fragments" as it still says to switch into the RAWTEXT state upon finding a "script" tag. Does anyone know if this difference is intentional, or did someone just forget to update the fragment parsing case? I think, due to the fact that no start tag has ever been emitted by the tokenizer, that RAWTEXT and the script data states should behave identically for the script element fragment case. (Once you take into account that there is no appropriate end tag token, all the careful casing for the comments effectively becomes nothing, and regardless of input everything will become character tokens. This is true of both the script data state and the RAWTEXT state: the latter is probably preferable due to its far lower complexity.) -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] Character casing for "Appropriate End Tags" and the "temporary buffer"
Matt Hall wrote: Apologies for the repost -- here is the original e-mail in plain text: Prior to r4177, the matching of tag names for exiting the RCDATA/RAWTEXT states was done as follows: "...and the next few characters do not match the tag name of the last start tag token emitted (compared in an ASCII case-insensitive manner)" However, the current revision doesn't include any comment on character casing in its discussion of "Appropriate End Tags." Similarly, certain tokenizer states require that you check the contents of the "temporary buffer" against the string "script" but there is no indication of whether or not to do this in a case-insensitive manner. In both cases, should this comparison be done in an ASCII case-insensitive manner or not? It might be helpful to clarify the spec in both places in either case. It is already case-insensitive, as you lowercase the characters when creating the token name and when adding them to the buffer. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
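As a sketch of why no explicit case-insensitive comparison is needed, consider the following Python. The helper names are invented for illustration; the point is only that ASCII uppercase letters are lowercased at the moment they are appended, so the later comparison can be an exact one.

```python
import string

ASCII_UPPERCASE = set(string.ascii_uppercase)


def append_lowercased(buffer, ch):
    """Append `ch`, lowercasing ASCII A-Z only (add 0x20 to the code point)."""
    if ch in ASCII_UPPERCASE:
        ch = chr(ord(ch) + 0x20)
    buffer.append(ch)


def is_appropriate_end_tag(last_start_tag_name, end_tag_chars):
    """Compare an end tag name, as typed, against the last start tag name.

    `last_start_tag_name` is assumed to have been built the same way, so it
    is already lowercase and a plain equality test suffices.
    """
    buffer = []
    for ch in end_tag_chars:
        append_lowercased(buffer, ch)
    return "".join(buffer) == last_start_tag_name
```

Lowercasing only A-Z matters: a general Unicode lowercase would map U+212A KELVIN SIGN to "k", which is not an ASCII case-insensitive match.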
Re: [whatwg] More prohibited characters for unquoted attributes are needed
Ian Hickson wrote: On Mon, 7 Sep 2009, Aryeh Gregor wrote: On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon wrote: Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another. I agree with you as an author. I wrote an HTML output function in MediaWiki assuming that what the standard says is known to be interoperable, which is apparently wrong. If I hadn't been keeping up with HTML 5, I would have introduced an XSS vulnerability because of some browsers' handling of `. If the problem will go away with time, then perhaps a later version of the standard could make such unquoted attributes conforming, once there's no more problem with them. As far as I can tell, this is an IE bug; treating "`" as an attribute quoting character is non-conforming in any version of HTML so far, it seems. I'm certainly not going to make it non-conforming to stumble into any IE bug or difference in parsing between IE and previous specs or other browsers; we'd just end up with an asanine set of conformance requirements. I agree that it's pointless to make it non-conforming to hit any parsing bug, but I would argue that we should make as many cases as it is sensible to do so non-conforming if they open up security holes in websites on legacy UAs, given that website uses a HTML 5 parser/sanitizer/serializer. For example, should this be non-conforming? Test Search: This perfectly innocent piece of HTML content (HTML2-compliant except for the DOCTYPE) results in a non-tree DOM in IE8. Should we make it non-conforming? No, it opens up no security hole if that is done. Similarly, IE conditional comments make it trivial to trigger scripts in IE but not another UA; indeed people do this on purpose. Should we make those non-conforming also? 
They are a harder issue, but I think it is probably fair enough to assume that most sanitizers drop comments for such reasons, hence making them fine to leave as conforming also. As I understand it, the attack here is a site that allows the user to input text that is used verbatim in two attributes, such that the user can set the first attribute's value to: ` ...and the second to: ` onload='...payload...' end=x ...with the assumption that the site is going to not quote the first one, and quote the second one with double quotes: (This is the default behaviour of Python html5lib, FWIW: the first is not quoted as it does not contain any whitespace characters or U+003E (>), the latter is quoted for that reason.) ...which in IE, for some reason, gets treated as: Indeed, this is the attack I (and others) am concerned about. I've disallowed ` in unquoted attribute values for now, but I think we should revert this once IE has fixed this bug for a few years. Right, once versions of IE with this bug have faded out of existence I think this will become a non-issue. I also expect that'll be a while yet, though, and I highly doubt that time will have come even by the time when HTML 5 goes to REC. Furthermore, if there are similar attacks to this, I think they should similarly be made non-conforming. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
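A minimal sketch of a serializer that avoids the attack, assuming the rule under discussion (never leave a value unquoted if it contains whitespace, a quote, =, <, > or a backtick). The function name is hypothetical and this is not html5lib's actual API or implementation.

```python
# Characters that force quoting in this sketch; the backtick is the addition
# motivated by the IE behaviour described above.
UNSAFE_UNQUOTED = set(" \t\n\x0c\r\"'=<>`")


def serialize_attribute(name, value):
    """Serialize one attribute, quoting the value whenever it could be unsafe."""
    escaped = value.replace("&", "&amp;")
    if value and not (set(value) & UNSAFE_UNQUOTED):
        # Safe to leave unquoted.
        return f"{name}={escaped}"
    # Otherwise use double quotes and escape any embedded double quote.
    return f'{name}="{escaped.replace(chr(34), "&quot;")}"'
```

With this rule the first attribute in the attack is emitted as a quoted backtick, so IE never sees an unquoted backtick that it might treat as an opening quote.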
Re: [whatwg] More prohibited characters for unquoted attributes are needed
On 6 Sep 2009, at 12:35, Aryeh Gregor wrote: See some research here: http://code.google.com/p/html5lib/issues/detail?id=93 It seems like in addition to whitespace and "'=<> , the characters U+0000 through U+0020 should be banned from unquoted attribute values, as well as U+0060 (backtick `), for the sake of compatibility. Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Text areas with pattern attributes?
Alex Vincent wrote: I'm drifting into writing code for the pattern attribute on text fields again, and I wondered: if text inputs can have pattern attribute for regular expression matching, why not text area elements? What's the use-case for it? Textareas are almost always for such large amounts of input that they are almost always free-form text. Why allow the pattern attribute? -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] A tag for measurements / quantity?
Jeremy Keith wrote: > Unit-measures differ from locale to locale (e.g. Fahrenheit vs. Celsius, > pound versus Kilogram), making comparison and matching of offerings > difficult. There's more variation than that: (imperial) gallon v. (US) gallon. Cases like that really make it hard to deal with. Then you have varying names in different languages, disagreement about what kilobyte means, and so much more… Sounds like a whirlwind of fun. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] New HTML5 spec GIT collaboration repository
Manu Sporny wrote: Cameron McCormack wrote: Manu Sporny: 3. Running the Anolis post-processor on the newly modified spec. Geoffrey Sneddon: Is there any reason you use --allow-duplicate-dfns? I think it’s because the source file includes the source for multiple specs (HTML 5, Web Sockets, etc.) which, when taken all together, have duplicate definition. Manu’s Makefile will need to split out the HTML 5 specific parts (between the and markers). The ‘source-html5 : source’ rule in http://dev.w3.org/html5/spec-template/Makefile will handle that. What a great answer, Cameron! I wish I had thought of that :) Ah, that's true. I was assuming he was working on the split-up spec. Yes, that will become an issue in time and was going to have a chat with Geoffrey about how to modify Anolis to handle that as well as handling what happens when there is no definitions when building the cross-references (perhaps having a formatter warnings section in the file?). Handle that in what way? The correct way, as far as I can see, is to do what Ian does, which is to call Anolis on the already-split up spec. What do you mean about warnings? Just if there's an instance of a term which isn't defined? That can't be done, because it would mean that every abbr, code, i, span and var element would have to be an instance (whereas they can perfectly fine exist without being one). It's probably worth throwing an error/warning when data-anolis-xref is set and it is unknown, though. (But that will probably change to data-xref.) I also spoke too soon, Geoffrey, --allow-duplicate-dfns is needed because of this error when compiling Ian's spec: The term "dom-sharedworkerglobalscope-applicationcache" is defined more than once I'm probably doing something wrong... haven't had a chance to look at Cameron's Makefile pointer yet, so "--allow-duplicate-dfns" is in there for now. I expect you are doing something wrong, because that doesn't exist in Ian's copy. 
:) With regards to tracking Anolis, you're free to pull it in if you want, but you probably don't want to track it too closely (currently there haven't been any major changes from 1.0, though they are coming soon, so it may get a bit less stable). I tend to ping James (Graham) provided it's stable, so he can update pimpmyspec.net, and I can try and remember to ping you too. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] New HTML5 spec GIT collaboration repository
Manu Sporny wrote: 3. Running the Anolis post-processor on the newly modified spec. Is there any reason you use --allow-duplicate-dfns? Likewise, you probably don't want --w3c-compat (the name is slightly misleading, it provides compatibility with the CSS WG's CSS3 Module Postprocessor, not with any W3C pubrules). On the whole I'd recommend running it with: --w3c-compat-xref-a-placement --parser=lxml.html --output-encoding=us-ascii The latter two options require Anolis 1.1, which is just as stable as 1.0. I believe those options are identical to how Hixie runs it through PMS. -- Geoffrey Sneddon — Opera Software <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] A New Way Forward for HTML5
Ian Hickson wrote: On Thu, 23 Jul 2009, Tab Atkins Jr. wrote: That being said, inline spec comments sound interesting. Can you expand on this? Are these meant to be private and only shown to Ian? Shown to everything who views the spec (optionally, of course)? Sent to the mailing list? If anybody would like to follow-up on this particular idea, I'm very interested in setting something up that makes it even easier to submit comments without having to worry about subscribing to the lists or registering with the W3C's Bugzilla instance. I'm not quite sure what the UI would look like, but if anyone has any ideas, feel free to e-mail me directly and we can figure something out. (This would be exceedingly useful once we're in last call in a few months.) I remember having some discussion about such a thing in IRC a few months ago. Indeed, the biggest problem seems to be what sort of UI we could use for it. My proposal, on the whole, would be to have some box appearing upon selecting text. Then, in that box, give space for both an email address and a comment, and send that along with the selected text to the list. -- Geoffrey Sneddon — Opera Software ASA <http://gsnedders.com/> <http://www.opera.com/>
Re: [whatwg] input type="url" allow URLs without http:// prefix
On 12 Jul 2009, at 10:46, Bruce Lawson wrote: The eleventy squillion WordPress sites out there that allow comments ask for your web page address as well as name and email. The method of entering a URL does not require the http:// prefix; just beginning the URL with www is accepted. As it's very common for people to drop the http:// prefix on advertising, business cards etc (and who amongst us reads out the prefix when reading a URL on the phone?) I'd like to suggest that input type="url" allows the http:// prefix to be optional on input and, if omitted, be assumed when parsing. How do we tell apart "foo.html" (a relative URL) and "example.com" (a host name)? -- Geoffrey Sneddon <http://gsnedders.com/>
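The ambiguity can be shown with Python's standard URL parser. The `naive_add_scheme` helper is hypothetical and stands in for the "assume http:// when omitted" behaviour being suggested; it is a sketch, not a proposal for how a browser should do it.

```python
from urllib.parse import urlsplit


def naive_add_scheme(text):
    """Prepend http:// when the input has no scheme (the suggested behaviour)."""
    if urlsplit(text).scheme:
        return text
    return "http://" + text


# Without a scheme, both inputs parse identically: no host, just a path.
relative = urlsplit("foo.html")
hostname = urlsplit("example.com")

# After the naive fix-up, the relative file name has become a host name.
fixed = urlsplit(naive_add_scheme("foo.html"))
```

Nothing in the syntax separates the two cases, so any rule that turns "example.com" into a host will also turn "foo.html" into one.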
[whatwg] Charset override table should match case of IANA registry
Although charsets are case insensitive, it'd probably be best to be consistent with the IANA registry. The only change this makes is changing "Windows-*" to "windows-*".
Re: [whatwg] Codec mess with <video> and <audio> tags
On 7 Jun 2009, at 16:30, David Gerard wrote: 2009/6/7 : There are concerns or issues with all of these: a) a number of large companies are concerned about the possible unintended entanglements of the open-source codecs; a 'deep pockets' company deploying them may be subject to risk here. Google and other companies have announced plans to ship Ogg Vorbis and Theora or are shipping Ogg Vorbis and Theora, so this may not be considered a problem in the future. Indeed. There are no *credible* claims of submarine patent problems with the Ogg codecs that would not apply precisely as much to *any other codec whatsoever*. In fact, there are less, because the Ogg codecs have in fact been thoroughly researched. This claimed objection to Ogg is purest odious FUD, and should be described as such at every mention of it. It is not credible, it is a blatant and knowing lie. How is it incredible? Who has looked at the submarine patents? They by definition are unpublished! Yes, certainly, published patents are well researched, but this is not the objection that anyone has made to it. -- Geoffrey Sneddon
Re: [whatwg] Google's use of FFmpeg in Chromium and Chrome Was: Re: MPEG-1 subset proposal for HTML5 video codec
On 2 Jun 2009, at 02:58, Chris DiBona wrote: One participant quoted one of the examples from the LGPL 2.1, which says "For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library." I'm still unclear as to how this does not apply to Chrome's case. If I get a copy of Chrome, you are bound (by the LGPL) to provide me with a copy of the source ffmpeg, and I must be able to redistribute that in either binary or source form. I would, however, get in trouble for not having paid patent fees for doing so. Hence, as that example concludes, you cannot distribute ffmpeg whatsoever. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] First or last Content-Type header?
On 31 May 2009, at 12:55, Geoffrey Sneddon wrote: IE use the first header in all cases where it doesn't expect the header to appear more than once (i.e., a header like "X-Foobar" appearing twice returns the value of the first one). I don't think this is quite true, actually. It doesn't always use the first header, I don't think (from memory). Try: Content-Type: jkfjkdsfjdsf Content-Type: text/xml Content-Type: text/plain I think it'll use text/xml as the first valid value (and in the case of other browsers using the last header gives compat. with the majority of the content that relies upon this behaviour). It's probably simplest just using the last header, actually, then. I should probably try playing around with HTTP parsing again some more… -- Geoffrey Sneddon <http://gsnedders.com/>
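The three candidate behaviours can be compared with a short Python sketch. The policy names, the function names, and the validity check (a bare type/subtype made of HTTP token characters, ignoring parameters) are simplifying assumptions for illustration, not a description of what any browser actually implements.

```python
import re

# HTTP token characters; a deliberately simplified media type check.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_MEDIA_TYPE = re.compile(_TOKEN + "/" + _TOKEN)


def looks_valid(value):
    """True if the part before any ';' looks like type/subtype."""
    essence = value.split(";", 1)[0].strip()
    return _MEDIA_TYPE.fullmatch(essence) is not None


def pick_content_type(values, policy):
    """Choose one Content-Type value from repeated headers."""
    if not values:
        return None
    if policy == "first":
        return values[0]
    if policy == "last":
        return values[-1]
    if policy == "first-valid":
        return next((v for v in values if looks_valid(v)), None)
    raise ValueError(f"unknown policy: {policy}")


headers = ["jkfjkdsfjdsf", "text/xml", "text/plain"]
```

For the three headers in the test case above, the policies disagree pairwise, which is what makes it a useful probe.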
Re: [whatwg] First or last Content-Type header?
On 30 May 2009, at 23:20, Adam Barth wrote: In editing the content sniffing Internet Draft today, I noticed the draft uses the *first* Content-Type header. Internet Explorer uses the first Content-Type header, but Firefox and Google Chrome use the last Content-Type header. (I don't recall off-hand which Safari or Opera use.) Because the sniffing algorithm is more similar to the algorithms used by Firefox and Google Chrome, I've changed this aspect to match them as well. Firefox, Safari and Opera use the last header in all cases where there is a header that is only expected to appear once (i.e., doesn't take a #rule as a value), and have a list of all headers that they expect to appear only once. IE use the first header in all cases where it doesn't expect the header to appear more than once (i.e., a header like "X-Foobar" appearing twice returns the value of the first one). I don't know about Chrome, because that only appeared after I last did any work on HTTP parsing (but it normally follows Firefox from the small amount of experimentation I've done with it since). I, on the whole, would be tempted to take the first header, and use a list of headers that you expect to only appear once (i.e., a mix of behaviours). -- Geoffrey Sneddon <http://gsnedders.com/>
[whatwg] Naming of "Self-closing start tag state"
I think this is a bit of a misnomer, as the current token can be an end tag token (although it will throw a parse error whatever happens once it reaches this state). I suggest renaming it to "self-closing tag state". -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Link rot is not dangerous
On 16 May 2009, at 07:08, Leif Halvard Silli wrote: Geoffrey Sneddon Fri May 15 14:27:03 PDT 2009 On 15 May 2009, at 18:25, Shelley Powers wrote: > One of the very first uses of RDF, in RSS 1.0, for feeds, is still > in existence, still viable. You don't have to take my word, check it > out yourselves: > > http://purl.org/rss/1.0/ Who actually treats RSS 1.0 as RDF? Every major feed reader just uses a generic XML parser for it (quite frequently a non-namespace aware one) and just totally ignores any RDF-ness of it. What does it mean to "treat as RDF"? An "RSS 1.0" feed is essentially a stream of "items" that has been lifted from the page(s) and placed in an RDF/XML feed. When I read e.g. http://www.w3.org/2000/08/w3c-synd/home.rss in Safari, I can sort the news items according to date, source, title. Which means - I think - that Safari sees the feed as "machine readable". It is certainly possible to do more - I guess, and Safari does the same to non-RDF feeds, but still. And search engines should have the same opportunities w.r.t. creating indexes based on "RSS 1.0" as on RDFa. (Though here perhaps comes in between the fact that search engines prefers to help us locate HTML pages rather than feeds.) I mean using an RDF processor, and treating it as an RDF graph. Everything just creates from an XML stream (or object model) a bunch of items with a certain title, date, and description, and acts on that (and parses it out in a format specific manner, so it creates the same sort of item for, e.g., Atom) — it doesn't actually use an RDF graph for it. If you can find any widely used software that actually treats it as an RDF graph I'd be interested to know. -- Geoffrey Sneddon <http://gsnedders.com/> <http://simplepie.org/>
Re: [whatwg] Link rot is not dangerous
On 15 May 2009, at 18:25, Shelley Powers wrote: One of the very first uses of RDF, in RSS 1.0, for feeds, is still in existence, still viable. You don't have to take my word, check it out yourselves: http://purl.org/rss/1.0/ Who actually treats RSS 1.0 as RDF? Every major feed reader just uses a generic XML parser for it (quite frequently a non-namespace aware one) and just totally ignores any RDF-ness of it. -- Geoffrey Sneddon <http://gsnedders.com/> <http://simplepie.org/>
Re: [whatwg] Incorrect declaration of the default namespace in user agent CSS
On 19 Apr 2009, at 21:01, Sergey Ilinsky wrote: In the "10.2 The CSS user agent style sheet and presentational hints" The declaration of the default namespace (to be applied to names that have no explicit namespace component) is incorrect: @namespace url(http://www.w3.org/1999/xhtml); Correct one should look like [1]: @namespace "http://www.w3.org/1999/xhtml";; [1] http://www.w3.org/TR/css3-namespace/#declaration According to that document both are correct. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg]
On 10 Mar 2009, at 17:03, David Singer wrote: At 3:22 +0100 10/03/09, Charles McCathieNevile wrote: That format has some serious limitations for heavy metadata users. In particular for those who are producing information about historical objects, from British Parliamentary records to histories of pre-communist Russia or China to museum collections, the fact that it doesn't handle Julian dates is a big problem - albeit one that could be solved relatively simply in a couple of different ways. The trouble is, that opens a large can of worms. Once we step out of the Gregorian calendar, we'll get questions about various other calendar systems (e.g. Roman ab urbe condita <http://en.wikipedia.org/wiki/Ab_urbe_condita>, Byzantine Indiction cycles <http://en.wikipedia.org/wiki/Indiction>, and any number of other calendar systems from history and in current use). Then, of course, are the systems with a different 'year' (e.g. lunar rather than solar). And if we were to introduce a 'calendar system designator', we'd have to talk about how one converted/normalized. Ultimately, why is the Gregorian calendar good enough for the ISO but not us? I'm sure plenty of arguments were made to the ISO before ISO 8601 was published, yet that still supports only the Gregorian calendar, having been revised twice since its original publication in 1988. Is there really any need to go beyond what ISO 8601 supports? -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] content models clarity
On 6 Mar 2009, at 11:53, Rikkert Koppes wrote: The content models [1] section is pretty clear, however, whenever one wants to know which elements fall into a specific content model, one has to look at the "categories" definition of each element. A (possibly non normative) overview would be helpful. I understand it is repeating information, hence possibly leading to conflicts when not paying attention while editing, but I think it would clarify a lot. Consider a similar overview to the one found in [2] listing the applicable attributes to various input types. [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/dom.html#content-models [2] http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#the-input-element It is intended that such a thing be part of the as-of-yet unwritten index. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] HTML5 DOCTYPE suggestion
On 21 Feb 2009, at 12:37, mikemi...@verizon.net wrote: If the doctype is instead Then Gecko-based UAs would be in quirks mode. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Issues relating to the syntax of dates and times
On 2 Jan 2009, at 21:53, Asbjørn Ulsberg wrote: On Wed, 26 Nov 2008 11:09:24 +0100, Ian Hickson wrote: The spec draws the line already -- it says that the date has to be in the proleptic Gregorian calendar, and that the year has to be greater than zero. Reading the spec, I have to wonder: Does HTML5 need to specify as much as it does inline? Can't more of it be referenced to ISO 8601 or even better; RFC 3339? I really fancy how Atom (RFC 4287) has defined date constructs: <http://www.atompub.org/rfc4287.html#date.constructs> Does RFC 3339 not define date and time in a satisfactory manner to use directly in HTML5? If there's prior discussion regarding this, I'd really appreciate a pointer. Thanks! Without looking up prior discussion, the short answer is that content relies upon the parsing currently specified. Also, neither RFC 3339 nor ISO 8601 defines parsing. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Minor error in content‐type sniffing table
On 2 Jan 2009, at 22:39, 111...@gmail.com wrote: In section 2.7.4 of the specification, part of the table reads
FF FF 00 00   FE FF 00 00   text/plain   n/a   UTF-16BE BOM
FF FF 00 00   FF FF 00 00   text/plain   n/a   UTF-16LE BOM
in the 1 January draft. Should this be
FF FF 00 00   FE FF 00 00   text/plain   n/a   UTF-16BE BOM
FF FF 00 00   FF FE 00 00   text/plain   n/a   UTF-16LE BOM
? Yes. -- Geoffrey Sneddon
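As a rough illustration of why the second row matters, here is a minimal sketch in Python. It assumes each row is read as "AND the leading bytes with the mask (first column) and compare the result with the pattern (second column)"; the names `SNIFF_TABLE` and `sniff_bom` are hypothetical and only the two corrected rows are included.

```python
# Hypothetical illustration of the two corrected table rows.
# Each entry: (mask, pattern, sniffed type, description).
SNIFF_TABLE = [
    (bytes.fromhex("FFFF0000"), bytes.fromhex("FEFF0000"), "text/plain", "UTF-16BE BOM"),
    (bytes.fromhex("FFFF0000"), bytes.fromhex("FFFE0000"), "text/plain", "UTF-16LE BOM"),
]


def sniff_bom(data):
    """Return (type, description) for the first matching row, else None."""
    for mask, pattern, sniffed_type, description in SNIFF_TABLE:
        if len(data) < len(mask):
            continue  # not enough bytes to apply this row
        # zip() stops at the mask length, so only the leading bytes are masked.
        masked = bytes(b & m for b, m in zip(data, mask))
        if masked == pattern:
            return sniffed_type, description
    return None
```

With the uncorrected pattern (FF FF 00 00), input beginning with a genuine UTF-16LE BOM (FF FE) would not match that row.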
Re: [whatwg] Spellchecking mark III
On 30 Dec 2008, at 11:38, Ian Hickson wrote: In 2006 I proposed the following spec for a spellcheck="" attribute, based on requests from the Google engineers then working on Firefox: http://www.damowmow.com/playground/spellcheck.txt The same engineers have since implemented this feature in Chrome also, and Google does use this attribute on its sites. However, the attribute has seen very little interest outside of Google, with just a handful of sites using it, primarily in dynamic editor libraries. I have therefore not added this feature to HTML5 for the time being. If there is more interest in this feature, please speak up. This seems stupid. If I want to have spell-checking, let me. Don't force it off. I don't see any reason to have it forced off, ever. -- Geoffrey Sneddon <http://gsnedders.com/>
[whatwg] Resolving a URL
Hey, Time to send some feedback on the "resolve a URL" dfn. Step 3 is (currently) "If encoding is UTF-16, then change it to UTF-8.". Does this mean we literally change just "encoding" to UTF-8, and leave "url" verbatim, or are we meant to convert "url" to UTF-8 too? This is currently ambiguous. Not converting "url" will cause issues later in a UTF-16 document. Step 12 replaces \ with /. IIRC WebKit does this for all URLs, not just those with a "server-based naming authority" (what's that anyway?). Also, earlier in the "Resolving URLs" section, there should probably be a reference to XML Base. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] with omitted tags
On 26 Dec 2008, at 17:02, Calogero Alex Baldacchino wrote: Philip Taylor wrote: I can start with a simple document that's probably conforming and that the validator doesn't complain about: html> Then I can read the "Writing HTML document: Optional tags" section, which says: "A head element's end tag may be omitted if the head element is not immediately followed by a space character or a comment. A body element's start tag may be omitted if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a script or style element. A body element's end tag may be omitted if the body element is not immediately followed by a comment." So I choose to omit the because I think those rules say I can do so. I get: But now I get a parse error, which I think is because the comes in the "in head" insertion mode and is "Any other end tag: Parse error. Ignore the token.", so something seems wrong. AIUI, omitting those closing tags is a parse error anyway, but in certain situations the parser can fix the code automatically because the state to enter/remain in is unambiguous. Thus a validator notifies a parse error, while a browser keeps the error internally and handles it when possible. The writing HTML documents section is meant to define what a conforming HTML document is, and those documents are conforming according to it. However, conformance checkers are meant to follow the parser section (and throw the parse errors it produces), which differs in these cases. Therefore, either the writing section is wrong or the parser is wrong to throw the parse errors. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Byte-wise tokenization algorithm
On 21 Dec 2008, at 16:35, Edward Z. Yang wrote: I suppose the big pivot point is "as if". A byte-wise implementation would replace character globally with byte, and any U+ designation with the UTF-8 encoded byte version. HTML 5 dictates end behavior, not the actual algorithm implementation, no? It states that what is done must be wholly equivalent to the given algorithm. But an HTML5 implementation, according to the spec, must at a minimum support the UTF-8 and Windows-1252 encodings, so the overall implementation might not be, depending on exactly how this is done. The plan is to convert Windows-1252 into UTF-8 before processing; with a reasonably good iconv implementation, support for lots of encodings is possible. The implementation might not be fully conforming if iconv doesn't perform the proper (possibly context-sensitive; I haven't checked) substitution when it doesn't recognize a character, but it should be close. I've never seen any way of getting iconv (at least via PHP) to do what HTML 5 requires (i.e., replacing invalid bytes with U+FFFD). It is, however, possible using mbstring (which also has the advantage of not being system dependent), as well as with PHP6's Unicode support. -- Geoffrey Sneddon <http://gsnedders.com/>
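For comparison, the "replace invalid bytes with U+FFFD" behaviour discussed above can be sketched in Python using the standard codec error handler. This is only an illustration under stated assumptions: the function name `decode_for_html` and the two-entry label table are made up, and whether the number of U+FFFD characters produced for a longer malformed sequence matches what HTML 5 requires has not been checked here.

```python
def decode_for_html(data, label):
    """Decode bytes to str, replacing undecodable bytes with U+FFFD."""
    # Hypothetical label handling: only the two encodings named above.
    codecs_by_label = {"utf-8": "utf-8", "windows-1252": "cp1252"}
    codec = codecs_by_label[label.lower()]
    # errors="replace" substitutes U+FFFD instead of raising an exception.
    return data.decode(codec, errors="replace")
```

Re-encoding the result with `.encode("utf-8")` then gives a byte stream that is valid UTF-8, which is what a byte-wise tokenizer would want as input.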
Re: [whatwg] Byte-wise tokenization algorithm
On 21 Dec 2008, at 05:41, Ian Hickson wrote: 1. Given an input stream that is known to be valid UTF-8, is it possible to implement the tokenization algorithm with byte-wise operations only? I think it's possible, since all of the character matching parts of the algorithm map to characters in ASCII space. Yes. (At least, that's the intent; if you find anything that contradicts that, please let me know.) Indeed it is possible (or at least it certainly was a year and a half ago, but I have seen nothing change that would stop it). 2. Would such an implementation be conforming? Looking just at parsing, yes, probably... But an HTML5 implementation, according to the spec, must at a minimum support the UTF-8 and Windows-1252 encodings, so the overall implementation might not be, depending on exactly how this is done. That should be no problem: just convert Windows-1252 to UTF-8 using strtr() (as it is an SBCS this is simple enough; doing the inverse is not); see the attached file. Then all you need to do is normalize the character set name to match all aliases of Windows-1252 and UTF-8, as well as mapping ISO-8859-1 and US-ASCII (and all their aliases) to Windows-1252. <http://bugs.simplepie.org/repositories/entry/sp1/trunk/create.php> does that (the only dependency is for getting the file via HTTP, which can just be replaced with cURL if you wish to just require that).
-- Geoffrey Sneddon <http://gsnedders.com/> http://www.opensource.org/licenses/bsd-license.php BSD License * @param string $string Windows-1252 encoded string * @return string UTF-8 encoded string */ function windows_1252_to_utf8($string) { static $convert_table = array( "\x80" => "\xE2\x82\xAC", "\x81" => "\xEF\xBF\xBD", "\x82" => "\xE2\x80\x9A", "\x83" => "\xC6\x92", "\x84" => "\xE2\x80\x9E", "\x85" => "\xE2\x80\xA6", "\x86" => "\xE2\x80\xA0", "\x87" => "\xE2\x80\xA1", "\x88" => "\xCB\x86", "\x89" => "\xE2\x80\xB0", "\x8A" => "\xC5\xA0", "\x8B" => "\xE2\x80\xB9", "\x8C" => "\xC5\x92", "\x8D" => "\xEF\xBF\xBD", "\x8E" => "\xC5\xBD", "\x8F" => "\xEF\xBF\xBD", "\x90" => "\xEF\xBF\xBD", "\x91" => "\xE2\x80\x98", "\x92" => "\xE2\x80\x99", "\x93" => "\xE2\x80\x9C", "\x94" => "\xE2\x80\x9D", "\x95" => "\xE2\x80\xA2", "\x96" => "\xE2\x80\x93", "\x97" => "\xE2\x80\x94", "\x98" => "\xCB\x9C", "\x99" => "\xE2\x84\xA2", "\x9A" => "\xC5\xA1", "\x9B" => "\xE2\x80\xBA", "\x9C" => "\xC5\x93", "\x9D" => "\xEF\xBF\xBD", "\x9E" => "\xC5\xBE", "\x9F" => "\xC5\xB8", "\xA0" => "\xC2\xA0", "\xA1" => "\xC2\xA1", "\xA2" => "\xC2\xA2", "\xA3" => "\xC2\xA3", "\xA4" => "\xC2\xA4", "\xA5" => "\xC2\xA5", "\xA6" => "\xC2\xA6", "\xA7" => "\xC2\xA7", "\xA8" => "\xC2\xA8", "\xA9" => "\xC2\xA9", "\xAA" => "\xC2\xAA", "\xAB" => "\xC2\xAB", "\xAC" => "\xC2\xAC", "\xAD" => "\xC2\xAD", "\xAE" => "\xC2\xAE", "\xAF" => "\xC2\xAF", "\xB0" => "\xC2\xB0", "\xB1" => "\xC2\xB1", "\xB2" => "\xC2\xB2", "\xB3" => "\xC2\xB3", "\xB4" => "\xC2\xB4", "\xB5" => "\xC2\xB5", "\xB6" => "\xC2\xB6", "\xB7" => "\xC2\xB7", "\xB8" => "\xC2\xB8", "\xB9" => "\xC2\xB9", "\xBA" => "\xC2\xBA", "\xBB" => "\xC2\xB
Re: [whatwg] Stability of tokenizing/dom algorithms
On 14 Dec 2008, at 21:55, Edward Z. Yang wrote: Are there any specific differences that pose problems? Not that I know of yet, since I haven't started on an implementation yet. Which brings me back to my original question: how stable is section 8? I would rather not be chasing a moving target. It's not really a moving target — what it is is largely constrained by the requirement to parse pre-existing documents (which rely on almost every possible bit of behaviour). If you do start work on a PHP implementation, please do seriously consider adding it to the html5lib project (which currently contains Python and Ruby implementations) as MIT licensed — there are also a fair number of test cases there. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Question regarding accessibility for img
On 30 Nov 2008, at 16:40, Pentasis wrote: I notice that it says in the spec under the img-section: "There has been some suggestion that the longdesc attribute from HTML4, or some other mechanism that is more powerful than alt="", should be included. This has not yet been considered." May I ask why it has not been considered (yet)? Because there's an issues list of several thousand issues, and as such not all issues have been considered. If we could do everything at once we'd have a spec instantly. :) -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Multi-block dicta within DIALOG
On 23 Nov 2008, at 20:11, Benjamin Hawkes-Lewis wrote: I'm wondering whether: Jack White foobarbazquux is equivalent to or different to: Jack White foobarbazquux Semantically equivalent, though different in the trees they produce (in the former "foobar" is a text node child of the dd element, in the latter it is a text node child of the first p element child of the dd element). and whether Jack White foobar is equivalent to or different to: Jack White foobar Same — semantically equivalent, though different in the trees they produce (in the former "foobar" is a text node child of the dd element, in the latter it is a text node child of the p element child of the dd element). Does DD have an implicit P, much as it has an implicit Q/BLOCKQUOTE? No: a run of text nodes (and phrasing content elements) is a paragraph, much like an explicit one created by the p element. <http://www.whatwg.org/specs/web-apps/current-work/#paragraphs > details this in-depth. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] [rest-discuss] HTML5 and RESTful HTTP in browsers
On 18 Nov 2008, at 16:41, Joshua Cranmer wrote: (and if you retort XMLHTTPRequest, let me point out that I personally would have objected to injecting HTTP specifics into that interface, had I been around during the design phases) XMLHttpRequest doesn't need to be XML, it doesn't need to be HTTP (FTP should work fine too in browsers IIRC), so all it really is is a generic request object. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] image element
On 30 Jul 2008, at 08:17, Nicholas Shanks wrote: So again, I ask for an element to replace . Benefits include: - As would cater for video/* MIME types, would cater for image/* I don't see how this is a benefit over . In order of importance to me: 1. It's spelt correctly. 2. It's not an empty element. 3. It's spelt correctly. Re: 2) — it is, and it has to be for backwards compatibility (it is changed to an img element in the tokenizer, though). In terms of 1 and 3, how about starting with something that is completely wrong, not just an abbreviation, such as the Referer header in HTTP? Not that that can actually be changed, because things rely upon it… -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Pre, code and semantics in HTML5: Wishful thinking?
On 22 Jun 2008, at 21:22, Edward Z. Yang wrote: To represent a block of computer code, the pre element can be used with a code element; to represent a block of computer output the pre element can be used with a samp element. Similarly, the kbd element can be used within a pre element to indicate text that the user is to enter. The implication is that document authors are recommended to use to wrap all of their programming code instead of a lone , if they wish to be fully semantic. This feels needlessly verbose and abusive of , which traditionally has been used to mark single-liners. Well, that tradition is wrong under HTML 4.01 (pre "tells visual user agents that the enclosed text is 'preformatted'", whereas code 'designates a fragment of computer code'). It also makes it extremely difficult to style pre as a block for code, as the only semantic indication that the contents of the pre block are computer code is its child. You'd end up having to say if you wanted to style pre as well. There are lots of things that are semantically desirable in HTML that can't be fulfilled using pre-existing CSS selectors. Continuing to style "pre" is no less ambiguous and risky than it was under the traditional behaviour. At the same time, I still think the semantics of whether or not a tag indicates a plaintext file, or a piece of ASCII art, or computer code, is somewhat important. However, I think this information would be more appropriately given as an attribute. Why go against what HTML 4.01 does? It seems needless to change. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Creating An Outline oddity
On 15 Jun 2008, at 04:06, Ian Hickson wrote: On Sun, 15 Jun 2008, Geoffrey Sneddon wrote: Having implemented the creating an outline algorithm (see <http://pastebin.ca/1048202>), I'm getting some odd results (the only TODO won't affect HTML 4.01 documents such as the following issues). Using `FooBarLol`, and looking at the final "current section" (this is the root sectioning element, body), it seems I correctly get the heading of it ("Foo"), but I only get one subsection: "Bar". As far as I can see, my implementation follows what the spec says, so it looks as if this is an issue with the spec. With HTML 5, the current_outlinee at the end is a td element, when it should be the body element. That really is rather odd. I don't understand the markup you mean. Could you draw the DOM or provide unambiguous markup for what you're describing? (I don't understand how "Foo" is a heading but "Bar" is a section in your markup.) The first issue is identical to <http://lists.w3.org/Archives/Public/public-html/2008Mar/0032.html>, which I bullied (sorry, asked) you into fixing yesterday and is now fixed. The second issue was an implementation bug. -- Geoffrey Sneddon <http://gsnedders.com/>
[whatwg] Creating An Outline oddity
Having implemented the creating an outline algorithm (see <http://pastebin.ca/1048202>), I'm getting some odd results (the only TODO won't affect HTML 4.01 documents such as the following issues). Using `FooBarLol`, and looking at the final "current section" (this is the root sectioning element, body), it seems I correctly get the heading of it ("Foo"), but I only get one subsection: "Bar". As far as I can see, my implementation follows what the spec says, so it looks as if this is an issue with the spec. With HTML 5, the current_outlinee at the end is a td element, when it should be the body element. That really is rather odd. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Video
On 2 Apr 2008, at 16:55, Robert J Crisler wrote: It will be very, very difficult to develop critical mass for content encoded in Theora (or Dirac), much less ubiquity. I'm not saying there's no point in trying. I applaud the effort, though I have misgivings about the W3C setting itself up as a video/audio standards organization when we already have the Moving Picture Experts Group. I don't think anyone whatsoever is suggesting creating a new codec: we'd gain nothing by doing so. But ... why not recommend that web developers encode in MPEG-4 AVC or Theora? MPEG-4 has patent fees to be paid, making it impossible for Firefox or Konqueror (for example) to comply with that. Theora has unknown patent status, and big companies are unwilling to implement it (as it has little pre-existing content, and it is no better than what they already have) lest they get sued due to some submarine patent. At least that would give some direction out of the current morass. ISO/IEC standards, like AVC/H.264, are vastly preferable to single-vendor (non)standards from Adobe, MS and Real. All the codecs that have publicly been looked at already have glaring issues with actually getting them interoperably used. We need something everyone is willing to implement. If people don't implement what we say, what we say is irrelevant. Why should the W3C choose not to create a better situation than the current one (which is a mess for developers and a mess for users), while continuing to work on the ideal? There's a reason why the status quo is the status quo: different people are willing to implement different things. One standard cannot force people to implement something they don't want to. We cannot just create a better situation: people have to actually do what we say for us to be in any better situation than we already are. One group can't implement specifications with known patents, and the other is unwilling to implement specifications with no known patents, due to submarine patent risks. 
-- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] several messages about content sniffing in HTML
On 29 Feb 2008, at 16:33, Julian Reschke wrote: Geoffrey Sneddon wrote: It seems like the HTTP spec should define how to handle that, but the HTTP working group has indicated a desire to not specify error handling behaviour, so I guess it's up to us. IE and Safari use the first one, Firefox and Opera use the last one. I guess we'll use the first one. Isn't the fact that FF and IE disagree here an indication that this doesn't need to be specified? Things aren't specified well enough until I can write an HTTP UA that can work in the real world (which, as someone dealing with feeds, I can tell you without question needs support for content-type sniffing) from reading specifications without having to reverse-engineer anything. ... Doesn't seem to apply to this case. A duplicate Content-Type header response indicates that the response is invalid. And guess what? Users don't like error messages. I want to know how to deal with it without having to look elsewhere (from the spec). Apparently, most browsers accept the response anyway, some of them picking the first value, others the second. Both behaviors seem to be acceptable to users. So there's nothing you *need* to reverse engineer in this case. A page (<http://www.toledoblade.com/apps/pbcs.dll/section?Category=RSS01&mime=XML>) that I came across recently had: Content-Type: XML Content-Type: text/XML Using the first would break badly. I guess it seems to work because of content-type sniffing on an unknown (and invalid) header (or, as many feed readers do, totally ignoring it, with the exception of any charset parameter). Without content-type sniffing, which HTML 5 now allows, you need the last. But as James says: how do I know that the behaviour I choose doesn't matter until I reverse-engineer browsers to discover that? -- Geoffrey Sneddon <http://gsnedders.com/>
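The two behaviours under discussion (use the first Content-Type, or use the last) can be sketched as follows in Python. This is only an illustration: the function name `pick_content_type` and the `prefer` parameter are hypothetical, and the header list simply mirrors the two values quoted from the feed above.

```python
def pick_content_type(headers, prefer="first"):
    """Choose one Content-Type value from a list of (name, value) pairs.

    headers are given in the order they were received; prefer is either
    "first" or "last", modelling the two browser behaviours described above.
    """
    values = [value.strip() for name, value in headers
              if name.lower() == "content-type"]
    if not values:
        return None
    return values[0] if prefer == "first" else values[-1]


# The duplicate headers quoted in the message above.
response_headers = [("Content-Type", "XML"), ("Content-Type", "text/XML")]
```

For this response, preferring the first value yields the invalid type "XML", while preferring the last yields "text/XML", which is why the choice matters once sniffing is taken out of the picture.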
Re: [whatwg] Proposal for a link attribute to replace
On 29 Feb 2008, at 01:29, Shannon wrote: Geoffrey Sneddon wrote: > While yes, you could rely on something like that, it totally breaks in any user agent without scripting support. Nothing else, to my knowledge, in HTML 5 leads to total loss of functionality without JavaScript whatsoever. By total loss of functionality I meant something that is functionality provided by HTML itself (and not through CSS or some DOM API) which leads to the page being totally unusable. Well nothing except global/session/database storage, You already have the fallback for people without ECMAScript, so that works fine. the "irrelevant" attribute, So you can edit something which you otherwise couldn't. Oh well. Nothing breaks. contenteditable, Oh come on. Even IE supports this. This most certainly is backwards compatible. contextmenu, Again, this is a DOM API and can be recreated in ECMAScript (which, if you're trying to use it at all, you know is enabled). draggable, Both IE and Safari have partial support for this already. the video and audio elements, canvas All three of these have fallback content, which is needed sometimes when a browser does support HTML 5 anyway. and the connection interface. Again, you know you have ECMAScript enabled already to be able to use this at all. Something similar could be done using XMLHttpRequest, if I am not mistaken. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] several messages about handling encodings in HTML
On 29 Feb 2008, at 01:21, Ian Hickson wrote: - Again there, shouldn't we be given unicode codepoints for that (as it'll be a unicode string)? Not sure what you mean. This is just me being incredibly dumb. Ignore it. On Sat, 26 May 2007, Henri Sivonen wrote: The draft says: "A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present." That's reasonable for UTF-8 when the encoding has been established by other means. However, when the encoding is UTF-16LE or UTF-16BE (i.e. supposed to be signatureless), do we really want to drop the BOM silently? Shouldn't it count as a character that is in error? Do the UTF-16LE and UTF-16BE specs make a leading BOM an error? If yes, then we don't have to say anything, it's already an error. If not, what's the advantage of complaining about the BOM in this case? I don't see anything making a BOM illegal in UTF-16LE/UTF-16BE; in fact, the only mention I find of it with regard to either in Unicode 5.0 is "In UTF-16(BE|LE), an initial byte sequence <(FE FF|FF FE)> is interpreted as U+FEFF zero width no-break space." I suppose the rationale given for removing it is the section that follows D101 (e.g., "When converting between different encoding schemes…UTF-8 byte sequences is not recommended by the Unicode Standard."). -- Geoffrey Sneddon <http://gsnedders.com/>
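A small Python sketch of the case being discussed: when bytes are decoded as signatureless UTF-16BE, an initial FE FF comes through as an ordinary U+FEFF character, and the draft's rule then drops it. The helper name `drop_leading_bom` is made up for illustration, and the sketch says nothing about whether dropping should also count as an error.

```python
def drop_leading_bom(text):
    """Drop a single leading U+FEFF, as the quoted draft text requires."""
    return text[1:] if text.startswith("\ufeff") else text


# Python's "utf-16-be" codec is signatureless: it keeps FE FF as U+FEFF.
decoded = b"\xfe\xff\x00h\x00i".decode("utf-16-be")
cleaned = drop_leading_bom(decoded)
```

Only a leading U+FEFF is removed; one that appears later in the string is left alone.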
Re: [whatwg] Proposal for a link attribute to replace
On 28 Feb 2008, at 12:18, Shannon wrote: So 'backwards-compatibility', as defined by the same document, can be achieved by using javascript to walk the DOM and add 'window.location(node.getAttribute('link'))' to the onclick handler of any nodes with a link attribute. I have done a very similar thing before to implement :hover on non-anchor elements in IE. Of course an author wouldn't have to use this new attribute at all so backwards-compatibility is the designers choice, not an issue with the proposed attribute. While yes, you could rely on something like that, it totally breaks in any user agent without scripting support. Nothing else, to my knowledge, in HTML 5 leads to total loss of functionality without JavaScript whatsoever. Nothing else reinvents the wheel for something with which we already have a perfectly fine solution already. You mention in the first email in this thread that this would allow nested hyperlinks: altering the content model of the a element would allow this too. I don't see it as being a very sensible thing to allow, though. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Clarification on hashed id reference
On 20 Feb 2008, at 19:47, Adele Peterson wrote: I was looking at the definition of a valid hashed id reference, and I noticed some inconsistency. The first sentence says the string must match the id attribute, but then the last parsing rule says that the string can match the id or name attributes of the element. If the parsing rule is correct, then should there be some rule for determining which attribute should get checked first? It already says "[r]eturn the first element" — which attribute gets checked first is irrelevant. If you search by attribute, I guess you need to carry out both searches, then combine the results, order by tree order, and return the first. And if the parsing rule is correct, maybe the initial description should mention the name attribute too. It means exactly what it says: conformant documents cannot use @name, but parsers must look in @name. They serve identical purposes, so there's no reason to allow both in a document, but parsers must support both for compatibility. -- Geoffrey Sneddon <http://gsnedders.com/>
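The "return the first element" reading above can be sketched in Python with a single walk in tree order that checks both attributes at each element, so neither attribute needs to be searched first. This is a simplified illustration: the function name `find_by_hashed_id` is hypothetical, and any restriction to a particular element type or any case-insensitive matching the spec may require is deliberately left out.

```python
import xml.etree.ElementTree as ET


def find_by_hashed_id(root, fragment):
    """Return the first element, in tree order, whose id or name equals fragment."""
    # Element.iter() visits the root and its descendants in document order.
    for element in root.iter():
        if element.get("id") == fragment or element.get("name") == fragment:
            return element
    return None
```

Because the walk is in tree order, an element matched through name wins over a later element matched through id, and vice versa, which is equivalent to running both searches, merging, sorting by tree order, and taking the first result.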
Re: [whatwg] Some questions
On 31 Jan 2008, at 17:50, Charles wrote: If it's that the SWF references a FLV, QuickTime Movies have been able to reference media pretty much forever, and when you embed an ASX with references with Windows Media content, you're still embedding video even though the metafile happens to be a text file. Whereas it is possible to get the video from a QuickTime container, it is not possible to get a FLV from a SWF, making it impossible to directly control the video. The video element exists to contain container formats (of which Flash is not one, though FLV is), and nothing else. Inserting a Flash file into a video element is similar to inserting an HTML file that happens to have a link to video: sure, it links to a video, but it does a billion other things too — it isn't in itself the video. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Some questions
On 28 Jan 2008, at 23:32, Charles wrote: The element offers an interface to the native media playback capabilities of the platform. The browser platform (e.g. WebKit), the multimedia platform (e.g. QuickTime) or the OS platform (e.g. Mac OS X)? Whatever the browser chooses to use. In WebKit's case, this is the OS (so QT on OS X, DirectShow on Windows, and GStreamer on GTK). Presto (in Opera) provides its own decoder (for Ogg/Vorbis/Theora), as does Gecko. It is not a plug-in mechanism and it is not suitable for embedding things like Flash or Silverlight. So for Safari on both Macintosh and Windows, is Apple's intent that will only work for formats supported by QuickTime? Apple's intent, as far as I'm aware, is to use the native multimedia support of a given environment (as WebKit isn't for multimedia). Also, as Henri has already said, QuickTime supports plugins itself. And given that little internet content targets QuickTime, who exactly will be using the tag? There is a _huge_ amount of content on the web that uses MPEG-4, which QuickTime supports (note that on Windows DirectShow doesn't support MPEG-4 out of the box, and AFAIK only supports MPEG-1 and WMV (for video)). There's also still a large amount of content that relies on the QuickTime container format (.mov), even if the content is MPEG-4 (whose own container is based on the QT one). -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] XHTML subtitle (was: [html5] r1156 - /)
On 14 Jan 2008, at 05:45, ianh wrote: Add a subtitle to clarify the scope of the document for people who don't read the spec. (W3C version only.) Is there any reason for this not to be in the WHATWG version as well? -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] HTML5 and URI Templates
On 16 Dec 2007, at 14:12, Julian Reschke wrote: Henri Sivonen wrote: On Dec 16, 2007, at 05:28, James M Snell wrote: The gist of the idea (which I believe may have been brought up before but I'm not certain) is to allow the use of a URI Template in place of the form element action attribute, and to use form elements to provide the replacement values, e.g. http://example.org{-prefix|/|foo}?bar={bar}" method="POST"> Foo: Bar: What's the backward-compatibility story of this feature? (Both behavior of URI templates in legacy browsers and ensuring that existing content doesn't use braces.) Braces are not allowed in URIs (in case somebody forgot :-). That's exactly why URI Templates can use them. There are sites that rely on braces in URIs. You can't just go and change their meaning, breaking the sites, specs be damned. If RFC 3986 defined what to do with non-conformant URIs, we wouldn't have this issue. -- Geoffrey Sneddon <http://gsnedders.com/>
[whatwg] +/- in SGML DOCTYPE (was: Re: The truth about Nokias claims)
On 15 Dec 2007, at 12:52, Benjamin Hawkes-Lewis wrote: Krzysztof Żelechowski wrote: On Fri 14-12-2007 at 19:47 +0100, Maik Merten wrote: Krzysztof Żelechowski wrote: Remember the "-" in DOCTYPE HTML? Feel free to be more specific. That prefix means that HTML DOCTYPE is not issued by an officially recognised standards body. If W3C were such an organisation, we would have a "+" there instead. I haven't bought the SGML specification to double-check, so feel free to quote from it if it says otherwise. But from everything else I've read it simply means W3C has not registered a Public Text Owner Identifier with ISO. See also: http://msdn2.microsoft.com/en-us/library/ms535242.aspx http://www.is-thought.co.uk/book/sgml-6.htm#FPI http://www.freebsd.org/doc/en_US.ISO8859-1/books/fdp-primer/sgml-primer-doctype-declaration.html http://xml.coverpages.org/gca-pubidrls.html http://xml.coverpages.org/fpiResolverFlynn.html Any old organization can register as Public Text Owners, not just officially recognized standards bodies. The - has nothing to do with W3C being (or not being) recognized as a standards body. ISO 8879:1989 states that SGML public text owner identifier registration (i.e., those that start with a + instead of the unregistered -) is defined in ISO 9070, which I don't have a copy of. I can, however, quote the summary from ISO 8879:1989: "These [registered owner identifiers] include standards body identifiers for national or industry standards organisations (similar to the ISO owner identifier), and unique codes that may have been assigned to organisations by other standards". -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] The truth about Nokias claims
On 14 Dec 2007, at 07:15, Shannon wrote: Ian, as editor, was asked to do this. It was a reasonable request to reflect work in progress. He did not take unilateral action. Ok, not unilateral. How about 'behind closed doors?'. Why no open discussion BEFORE the change? Please look back on the mailing list archives. There's been plenty of discussion about this before, and it's always ended up in the same loop: A group of people wanting nothing but Ogg/Theora/Vorbis, and another wanting one standard that all major implementers will support. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Xiph.Org Statement Regarding the HTML5 Draft and the Ogg Codec Set
On 13 Dec 2007, at 21:29, Manuel Amador (Rudd-O) wrote: It would make no sense to require a different baseline audio codec for the non-sfx use case for than was required for .) Of course it would make sense. People use their Web pages to put up music all the time, much more often than for sound FX. They aren't gonna upload files tens of megabytes in size for each song! Huh? I think you've misunderstood what you're quoting, i.e., there will be two audio codecs supported for : - PCM in WAVE. - Whatever is chosen for . -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Ogg content on the Web
On 12 Dec 2007, at 19:30, Maik Merten wrote: Geoffrey Sneddon wrote: Apart from those two, the others I can think of are those that are in excess of twenty years old (and therefore their patents have expired), such as H.260. I couldn't find anything insightful about "H.260". Sure you don't mean H.120, which is a 1982 video codec I couldn't find a current implementation of? Yeah. I always miscall it H.260 (as it is the precursor to H.261). H.261, OTOH, is a 1990 standard and thus still a bit away from getting absolutely free. Though, by the time we reach LC, it may not be. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Ogg content on the Web
On 12 Dec 2007, at 17:44, David Gerard wrote: On 12/12/2007, Geoffrey Sneddon <[EMAIL PROTECTED]> wrote: On 12 Dec 2007, at 14:23, David Gerard wrote: FWIW, Wikipedia and Wikimedia Commons only allow unencumbered formats on the site. Video MUST be Ogg Theora. Compressed audio better be Ogg. Why must video be just one of many unencumbered formats? Er, what are the others? Technically speaking, Theora is actually unencumbered (it just has a RF license covering the patents from On2). Dirac is in a similar situation. Apart from those two, the others I can think of are those that are in excess of twenty years old (and therefore their patents have expired), such as H.260. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Ogg content on the Web
On 12 Dec 2007, at 14:23, David Gerard wrote: FWIW, Wikipedia and Wikimedia Commons only allow unencumbered formats on the site. Video MUST be Ogg Theora. Compressed audio better be Ogg. Why must video be just one of many unencumbered formats? So far we have had zero patent trolls come calling. I wonder why that is. Do you have enough money to pay a fine of a similar size to what MS got last year? If you don't have enough money, they won't sue you. It isn't worth their time. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Video codec requirements changed
On 12 Dec 2007, at 01:41, Maciej Stachowiak wrote: 1) maybe (I've heard game vendors cited, not sure which ones) I know someone already posted a list, but it is used within all Unreal Engine 2.5 (i.e., UT 2004) and Unreal Engine 3 (i.e., UT 3) games (which I'm sure you can find a long list of games that use them on Wikipedia or elsewhere). -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Video/Audio specs
On 12 Dec 2007, at 00:02, Fabien Meghazi wrote: Who is against this ? Audio ogg/vorbis must be supported Audio mp3 should be supported Video container ogg with H.264 codec & vorbis audio must be supported Video container ogg with H.264 codec & mp3 audio should be supported Anyone who distributes free (as in beer) software. You need to pay patent charges for the MPEG standards. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] OGG in HTML5
On 11 Dec 2007, at 20:40, Manuel Amador (Rudd-O) wrote: Yup, refreshing perspective. How about we go ENFORCE our free format instead of having proprietary companies ENFORCE their proprietary formats this time? It'd certainly be a change of zeitgeist Where is this enforcement of any proprietary format? The note in the spec absolutely forbids such a thing. And when has any version of HTML ever enforced a proprietary format (which is what I assume you mean by "this time")? -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal of Ogg is *preposterous*
On 11 Dec 2007, at 20:12, Manuel Amador (Rudd-O) wrote: It was intended as meaning "recognized" in the sense of browsers recognising them. No currently shipping browser recognises either Ogg Vorbis or FLAC. If I use EMBED on Konqueror pointing to an Ogg Vorbis file, I get a nice player with streaming and everything. Konqueror's shipping, isn't it? There is at least *one* browser that already supports, through GStreamer, Ogg in tags. I'd give you the link but it apparently fell off the end of Planet GNOME so I can't find it... Now hold on, it's not shipping, but that doesn't mean it won't be shipping tomorrow. What you actually wanted to say (but couldn't/didn't/were unwilling to) is: "No currently shipping browser by any of the major proprietary software vendors support Ogg Vorbis or FLAC". Nor any of the minor ones, nor most open source ones. Also, I assume through Konqueror relying on GStreamer that Konqueror doesn't support it itself (or through a required dependency, which is needed to actually conform to such a clause that existed). WebKit trunk also supports Ogg in if you have the needed QT component (which is supporting it as much as Konqueror supports it). Opera 9.5 beta has built in support for Ogg/etc. and supports nothing else. There are still large questions about when Fx will support (which I assume from your later post is what you were referring to) natively, though it may well be in Fx 3.0 in early '08. It's just dollars. Apple does not license Apple Lossless to anyone else AFAIK, OK. So they sell fewer iPods because iPods don't play Ogg Vorbis without Rockbox. Same outcome. Oh, look, they are already losing custom through not supporting WMA. It doesn't look like they particularly care about that, does it? and the only standards that MPEG-LA collects money for that Apple receives any share of whatsoever is "MPEG-4 Systems" and IEEE 1394 (Firewire). Neither of these have anything to do with audio/video codecs. 
Saying that Apple has a financial interest in wanting MPEG codecs mandated in HTML 5 is totally untrue. I didn't say Apple wanted MPEG codecs mandated in HTML 5, so don't put words in my mouth or attempt to smoke-and-mirrors us with straw men. This is either a fumble on your part or an attempt to derail the discussion into wreckland. No, it is me trying to understand what you're meaning. I said Apple doesn't want Ogg Vorbis because they don't control the tech, and because they would very much rather have consumers "prefer" (in the sense of being screwed with no choice) DRM-encumbered AAC (note it's not the codec, but the controlling of the consumer that matters here). AAC doesn't support DRM natively. It's a proprietary extension. iTunes has always ripped CDs by default into non-DRM-encumbered AAC (i.e., an open standard, and compatible with numerous players). Apple has never, anywhere where it has a choice, favoured DRM-encumbered standards. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] OGG in HTML5
On 11 Dec 2007, at 20:21, SA Alfonso Baqueiro wrote: well, I think that ENFORCING is the way the real life works, is the case of windows, it is used because the hardware vendors enforce us to use it installing it by default, is the case of the mp3, is the case of any file format, is the case of swf, and is the case of wma, people use all that crap because they are enforced, I think is very positive that when we create something, if it is pretty useful we have the power to enforce is use, for the good, and is ok, HTML5 should enforce OGG as a supported format, IT IS GOOD FOR EVERYONE, only look at what happened with the PNG, is now wide accepted over GIF. PNG did not succeed by being enforced: it never was enforced. It succeeded purely on its technical merits. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal of Ogg is *preposterous*
On 11 Dec 2007, at 19:04, Manuel Amador (Rudd-O) wrote: You are right. My bad. Why don't we write in the spec? "Examples of widely recognized free-for-use audio formats are Ogg Vorbis and FLAC" It was intended as meaning "recognized" in the sense of browsers recognising them. No currently shipping browser recognises either Ogg Vorbis or FLAC. The answer to that question is that Apple and Nokia don't want us to use Ogg Vorbis because they sell their own, encumbered tech and we would be less likely to license (read: give them monopoly rents) their tech. The very MENTION of Ogg in the spec threatens their monopoly rents, and that's why they had it removed. It's just dollars. Apple does not license Apple Lossless to anyone else AFAIK, and the only standards that MPEG-LA collects money for that Apple receives any share of whatsoever is "MPEG-4 Systems" and IEEE 1394 (Firewire). Neither of these have anything to do with audio/video codecs. Saying that Apple has a financial interest in wanting MPEG codecs mandated in HTML 5 is totally untrue. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal of Ogg is *preposterous*
On 11 Dec 2007, at 18:15, Manuel Amador (Rudd-O) wrote: On Tue 11 Dec 2007, Dave Singer wrote: I'm sure that many people would be happy to see a mandate if someone were willing to offer an indemnity against risk here. You seem quite convinced there is no risk; are you willing to offer the indemnity? No. Unlike Apple, I don't have a huge patent portfolio. My patent count reaches the awesome number of *zero*. Would you be willing to offer patent indemnity to unlicensed users of your Apple AAC audio format? Because I fail to see why leaving users without a free choice for audio *helps* things. I dunno, maybe I'm just dumb as a rock. That's not what Dave is meaning: If Apple gets sued for patent infringement, will you pay however many billion USD they have to? If you truly believe there are no patents covering Ogg/etc. then you can safely agree knowing that you'll never have to give away any of your money. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] OGG in HTML5
On 11 Dec 2007, at 16:20, alex wrote: I am a webdeveloper and a fierce supporter of opensource. I was under the impression the standards were being designed in the same opensource spirit, but I may have been wrong. Standards are developed in line with the policies of the organisations they are developed by. <http://www.w3.org/Consortium/Process/> describes the W3C process document. The issue here is that the chairs think the reasons given for not publishing a working draft are strong enough (i.e., it is the strength of the arguments, not the number in favour of the arguments that is important). Setting OGG as the de facto standard is the best idea i've heard in a long time, How can you set a de-facto standard? By the very meaning of de-facto, you cannot. We can set a de-jure standard, but not a de-facto one. and now it's all coming down because a few companies (some of which are known for their vendor lock-in tactics) want to keep their empire. No, it is coming down because a few companies don't want to take the risk of being sued for submarine patents which might exist for Ogg/Vorbis/Theora. Do you want to pick up the bill for patent infringement? MS has to pay 1.52 billion USD for (submarine) patent infringement covering MP3. Unsurprisingly, major companies don't want to take such a risk on a codec that has few advantages over current standards such as MPEG-4. But why, then, are they happy to support MPEG standards? They already do: it had/has clear technical advantages over prior de-facto formats (the same cannot be said for Theora, which is less efficient than MPEG-4). They have already taken the risk to support it, and people have already had the chance to sue them, and that has not yet happened. In the case of MS and Apple, they already support video formats at the OS level, and don't re-implement them within the browser (and have already therefore paid patent charges). Finally, the risk of supporting both is greater than supporting just one. 
There are already widespread de-facto standards, so that is what they will choose to support, not a container/codec combination that has (comparatively) very little content. I am not saying that ogg should be enforced onto anyone, if nokia wishes to keep using a different format, no problem, but by making it a standard, we at least know that ogg will be supported by all (standards-compatible) browsers, and as such it can be deployed by those who are opposed to vendor lock-in or monopoly positions. It won't be supported by all (currently) standards-compatible browsers. Apple, a major browser vendor, has said they don't intend to implement Ogg/Vorbis/Theora just because the spec requires it (i.e., if you can get a critical mass of web content using it, you may well be able to get them to support it). OGG is the choice of freedom, enabling that freedom for all webdevelopers is a must in my opinion, although in the same spirit, it can not be enforced upon anyone, therefore the original text stating it "should" instead of it "must" is probably the best way to go. If it is a MUST, then the spec is irrelevant: it will be ignored by major companies. We must settle at a compromise between the two POVs to get the spec implemented at all; we otherwise run the risk of major companies not implementing any part of the spec whatsoever, leaving us far worse off than we would be otherwise. Also, if it is a MUST everyone in the WG would be issuing a RF license covering any patents they hold covering Ogg/Vorbis/Theora to everyone else in the WG (as per <http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential>), which companies such as MS and Nokia have said they are unwilling to do. As far as compromises go, there are several viable solutions, including MJPEG and H.261 (the latter is only slightly worse than Theora, and is so old (as of next year, even the revision to it will be 20 years old) that any and all patents have either expired or are invalid). 
This still leaves questions open regarding container format and audio (which I know less about, and won't comment so much on). If you truly do want to make no compromises yourself, you may be able to get the major browser manufacturers that are currently unwilling to implement Ogg/Vorbis/Theora to implement them by getting a critical mass of content out there. Bear in mind, though, that MS still does not support MPEG-4 out of the box (except for Zune), despite the huge amount of MPEG-4 content already out there. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal of Ogg is *preposterous*
On 11 Dec 2007, at 18:09, Manuel Amador (Rudd-O) wrote: Fact: Vorbis is the *only* codec whose patent status has been widely researched, nearly to exhaustion. Repeating the same FUD over and over again (which you just did) may lead the world to believe this to be false, but it's TRUE. You should at least have talked to Monty @ Xiph before jumping to rash conclusions. So undisclosed patents have been looked at? How? -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal off Ogg technology
On 11 Dec 2007, at 15:33, Wilson Michaels wrote: In reference to: http://html5.org/tools/web-apps-tracker?from=1142&to=1143 I am a retired software developer who is outraged that Ogg technology has been removed from HTML5. It must be reinstated as a "should" option so that the world is not held hostage to proprietary implementations of media technologies. Proprietary technologies eventually are used to limit innovation and prevent entry of other technologies that threaten the proprietary company in some way. We don't need another MP3 fiasco. What difference is there between a SHOULD that few, if any, major companies implement, and one that doesn't exist? The spec will never recommend any format that cannot be freely (as in beer) implemented safely by developers (i.e., without risking being sued). Also, MP3 is not a proprietary standard: you can go out and buy a copy of the spec if you wish, and pay any patent charges due. You still, as with anything invented within the last 20 years (including Ogg/Vorbis/Theora), run the risk of submarine patents. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] Removal of Ogg is *preposterous*
On 11 Dec 2007, at 13:36, Maik Merten wrote: The old wording was a SHOULD requirement. No MUST. If the big players don't want to take the perceived risk (their decision) they'd still be 100% within the spec. Thus I fail to see why there was need for action. There's a question within the W3C Process whether patents that are covered by a SHOULD via a reference are granted a RF license similarly to anything that MUST be implemented. Both Nokia and MS raised concerns about this relating to publishing the spec as a FPWD. -- Geoffrey Sneddon <http://gsnedders.com/>
Re: [whatwg] several messages about a way to disable referer headers for links
On 4 Nov 2007, at 12:40, Anne van Kesteren wrote: On Sat, 03 Nov 2007 18:27:50 +0100, Krzysztof Żelechowski <[EMAIL PROTECTED]> wrote: On Sat 03-11-2007 at 08:42 +, Ian Hickson wrote: Ok, I've added a rel value similar to "nofollow" called "noreferer" that does this. While we are unable to correct the spelling of "referer", we certainly need not duplicate it for "noreferrer". There must be some end to this self-humiliation. I think it's way better to stay consistent. Especially as the feature affects the Referer (sic) header. I too think Anne is right here: there are enough things that are inconsistent in the web already. Don't add another thing that requires me to think. I'll just make mistakes. A markup language should not require me to think; it should reflect logical structure. Importantly, outwith the structure, logic dictates contextual consistency (even if that goes against being consistent with other contexts). -- Geoffrey Sneddon <http://gsnedders.com/>
[whatwg] RFC 2732 reference unneeded
#terminology: For readability, the term URI is used to refer to both ASCII URIs and Unicode IRIs, as those terms are defined by RFC 3986 and RFC 3987 respectively, and as modified by RFC 2732. RFC 2732 is irrelevant, as URIs as of RFC 3986 and IRIs as of RFC 3987 define how to deal with IPv6 addresses. RFC 2732 is noted as obsoleted by RFC 3986. - Geoffrey Sneddon
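To illustrate the point about RFC 2732 being redundant: the bracketed IPv6 literal syntax it introduced is part of the generic host syntax in RFC 3986 (section 3.2.2), so a parser written against RFC 3986 needs no separate rule for it. A minimal sketch using Python's standard library follows; the helper name `host_and_port` and the example addresses are illustrative assumptions, not anything from the spec.

```python
from urllib.parse import urlsplit


def host_and_port(uri: str):
    """Return (host, port) for a URI; the helper name is hypothetical."""
    parts = urlsplit(uri)
    # .hostname strips the square brackets around an IPv6 literal;
    # .port is None when the URI carries no explicit port.
    return parts.hostname, parts.port


print(host_and_port("http://[2001:db8::1]:8080/index.html"))
print(host_and_port("http://[::1]/"))
```

The brackets exist only to keep the colons inside the address from being read as the host/port separator.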
Re: [whatwg] Web forms 2, input type suggestions
On 15 Jul 2007, at 03:08, Andrew Fedoniouk wrote: - Original Message - From: "Sander" <[EMAIL PROTECTED]> To: "Martin Atkins" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Saturday, July 14, 2007 12:08 PM Subject: Re: [whatwg] Web forms 2, input type suggestions Martin Atkins wrote: Benjamin Joffe wrote: Have the following possible values for the TYPE attribute been considered for the INPUT element? type="color" The user agent would display an appropriate colour picker and would send a hexadecimal string representing that colour to the server. I like this idea. It's simple and it's something I've implemented (and seen implemented) dozens of times. I like this one too. It should have a pallet attribute that defines the color pallet. I'm not sure how though, cause on one hand I'd like to be able to choose easily from standard pallets, but on the other hand I'd like the option to create custom pallets. Perhaps pallet="custom" combined with a datalist could be an option here. ... There are many possible implementations for different purposes. Here is one of color selectors we use in HTML: http://www.terrainformatica.com/sciter/screenshots/color-chooser.png I think it is not realistic to define all of them in single specification - too many different use cases. I would define some generic extensible mechanism for inputs rather than defining particular input type=foo. I expect the majority of UAs would use their OS's colour picker (or, failing that, the de-facto standard on the OS). In the case of OS X, this allows entry in many diverse ways: a colour wheel + brightness, grey scale, RGB, CMYK, HSB, spectrum, as well as a selection of predefined colours. Why do you only need to allow specific colours anyway? - Geoffrey Sneddon
[whatwg] Implementation + Test Cases Available For Numbers Subsection of Common Microsyntaxes
Now that my review of this subsection is complete, it is worthwhile publicising my 1:1 PHP implementation of the HTML 5 algorithms, including the numbers subsection, at <http://geoffers.no-ip.com/svn/php-html-5-direct/src/trunk/numbers.php>. There are also test cases (that follow the spec even when there are issues with it) at <http://geoffers.no-ip.com/svn/php-html-5-direct/tests/numbersTest>. Results for currently shipping UAs (esp. browsers) would be greatly welcomed. - Geoffrey Sneddon
Re: [whatwg] The issue of interoperability of the element
On 26 Jun 2007, at 17:46, Maik Merten wrote: * The spec can be practical about implementing the tag and specify H.263 or MPEG4 as a baseline. Existing multimedia toolkits can be reused in implementation and thus all browsers can support the standard. Users will use the format thanks to ubiquitous support. The "tax" will be a non-issue in most cases despite leaving a bad taste in the standard committee's mouth. Up and coming browsers can choose not to implement that part of the standard if they so choose or piggyback on an existing media player's licensing. Free Software like Mozilla cannot implement MPEG4 or H.263 and still stay free. The "tax" *is* an issue because you can't buy a "community license" that is valid for all uses. Plus even if you implement H.263 or MPEG4 video - what audio codec should be used with that? Creating valid MPEG streams would mean using a MPEG audio codec - that'd be e.g. MP3 or AAC. Additional licensing costs and additional un-freeness. Don't get me wrong: MPEG technology is nice and well performing - but the licensing makes implementations in free software impossible (or at least prevents distribution in e.g. Europe or North America). Under the current spec it is merely a "SHOULD": you can have an implementation of the spec that omits it. MPEG4 and WMV are the current de-facto standards. We should really just pave the cowpaths here, meaning those are the real two options. WMV has absolutely no publicly available documentation, so it makes no sense to reference that. MPEG4 has publicly available documentation, but is patent-encumbered. MPEG4 looks better on grounds that it is at least implementable by people outside of MS without reverse engineering it themselves. - Geoffrey Sneddon
Re: [whatwg] The issue of interoperability of the element
On 26 Jun 2007, at 00:57, Silvia Pfeiffer wrote: So a company which owns a patent on a standard that can be bought and read at freedom is just as bad as a company which owns a patent on a standard that has absolutely no public documentation? If you're talking about Ogg Theora, then you've got your facts wrong. First of all, Ogg Theora is not owned by a company. "So a company [Apple] which owns a patent on a standard that can be bought and read at freedom [MPEG4] is just as bad as a company [Microsoft] which owns a patent on a standard that has absolutely no public documentation [WMA/WMV]?" - Geoffrey Sneddon
Re: [whatwg] The issue of interoperability of the element
On 25 Jun 2007, at 13:21, Ivo Emanuel Gonçalves wrote: According to Wikipedia, "AT&T is trying to sue companies such as Apple Inc. over alleged MPEG-4 patent infringement.[1][2][3]" I would be fascinated to see a statement from Apple, Inc. regarding this. Seeing as they are already at risk from what they already support, what advantage does Apple get by supporting more codecs, thereby opening itself up to further risks? It's also quite interesting that different portions of MPEG-4, including different sections of video and audio are licensed separately, so what this means is that any vendor willing to support MPEG-4 for and has to locate every patent holder and pay them. No, they don't, it all goes through MPEG-LA. Oh, and will you look at this, Apple, Inc. holds one of the patents! US 6,134,243 [4]. So Apple gets money for every single license sold. How nice. They are attempting to lock vendors into MPEG-4 and get money from licenses in the process. Apple, Inc. is no better than Microsoft. So a company which owns a patent on a standard that can be bought and read at freedom is just as bad as a company which owns a patent on a standard that has absolutely no public documentation? Also, a large part of this topic has been around H.264, and Apple holds no known patents affecting H.264. - Geoffrey Sneddon
Re: [whatwg] Drop UTF-32
On 15 May 2007, at 10:35, Michael Day wrote: Please, drop UTF-32 and save implementors from worrying about it when no one uses it and no one should use it. Including it in a few encoding detection algorithms is no big deal on us implementers: as the spec stands we aren't required to support it anyway. All the spec requires is that we include it within our encoding detections (so, if we don't support it, we can then reject it). - Geoffrey Sneddon
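The "detect it so we can reject it" point above can be sketched with a byte-order-mark check. This is only an illustration under stated assumptions, not the spec's detection algorithm: the function names and the `SUPPORTED` set are hypothetical, and only the BOM constants come from Python's `codecs` module.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 entries must be tried first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Hypothetical: encodings this imaginary implementation can decode.
SUPPORTED = {"utf-8", "utf-16-le", "utf-16-be"}


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def choose_encoding(data: bytes):
    """Detect the encoding, rejecting anything detected but unsupported."""
    name = sniff_bom(data)
    if name is not None and name not in SUPPORTED:
        raise ValueError(f"detected {name}, which is not supported")
    return name
```

Without the UTF-32 entries, a UTF-32 LE stream would be misread as UTF-16 LE, which is why recognising the encoding is useful even to an implementation that never decodes it.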
Re: [whatwg] additional empty elements
On 1 May 2007, at 20:21, Brenton Strine wrote: However, if I then wanted to add additional special styling to the first and third div, (e.g.. a border and background color) it is less graceful. I could add style attributes, but that would be wasteful if I want to do this on a large scale. Multiple classes would be confusing. A nice solution would be the addition of a few div tags. (e.g. , , and .) Then you could do something like this: div1 {text-indent:0px;} div2 {text-indent:10px;} div3 {text-indent:20px;} Why not: .first { color: red; } .first + div { text-indent: 10px; } .first + div + div { text-indent: 20px; color: blue; } Indent 0 Indent 1 Indent 2 Indent 0 Indent 1 Indent 2
[whatwg] IE/Win treats backslashes in path as forward slashes
Looking through the spec again, there is nothing about backslashes in a URI's path being treated as forward slashes, behaviour that is needed for compatibility with quite a few websites. - Geoffrey Sneddon
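As a rough sketch of the behaviour described above (an assumption about what a compatible UA might do, not text from the spec and not a claim about IE's exact rules), the path component alone can have its backslashes rewritten while the query and fragment are left untouched. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_backslashes(url: str) -> str:
    """Treat backslashes in the path as forward slashes (hypothetical helper)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Only the path is rewritten; a backslash in the query stays as it is.
    return urlunsplit((scheme, netloc, path.replace("\\", "/"), query, fragment))


print(normalise_backslashes("http://example.com/a\\b\\c.html?q=x\\y"))
```

Whether backslashes elsewhere (for instance directly after the host) should be handled the same way is left open here.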
Re: [whatwg] element feedback
On 23 Mar 2007, at 03:15, liorean wrote: On 23/03/07, Sander Tekelenburg <[EMAIL PROTECTED]> wrote: While that might be useful, it's not at all obvious to me that it is a *requirement*. What is so wrong with fetching the entire file, and start playing it at the point referenced by the fragment identifier? That's how fragment identifiers work for textual resources (and they fetch the usual truckload of images along with the HTML file). Well, it would be nice to not have to download an hour long lecture to see the 30 second interval of interest starting at 47:26... However, as I understand the Ogg Theora format, it contains essential data for decoding in the start of the file, so unless the server has some format specific knowledge and handling the client must either have already gotten that information somehow, or must request the entire file. I have no idea whether the other codecs I've heard discussed (Dirac and H.264) have a similar issue or not. That sort of info is held within the container, so everything within Ogg (so both Theora and Dirac) will suffer from it. H.264 being part of the MPEG-4 standard follows what Kevin Marks said: On 24 Mar 2007, at 08:57, Kevin Marks wrote: 2. define a chunk/offset table that maps media to time, and look this up ahead of any seeking. (this is the QT approach, and that of MPEG4 - Geoffrey Sneddon
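The chunk/offset table approach quoted above can be sketched as a sorted index searched before any seek. Everything concrete here is an assumption for illustration: the index values are made up, the function names are hypothetical, and real containers store this table in their own binary formats.

```python
from bisect import bisect_right

# Hypothetical index: (start time in seconds, byte offset of the chunk),
# sorted by start time.
INDEX = [(0.0, 0), (10.0, 150_000), (20.0, 310_000), (30.0, 480_000)]


def byte_offset_for(seconds: float, index=INDEX) -> int:
    """Return the byte offset of the chunk that contains the given time."""
    times = [start for start, _ in index]
    # bisect_right finds the first chunk starting after `seconds`;
    # the chunk before it is the one that contains `seconds`.
    i = bisect_right(times, seconds) - 1
    if i < 0:
        raise ValueError("time precedes the first chunk")
    return index[i][1]


def range_header(seconds: float) -> dict:
    """Build an HTTP Range request header starting at the containing chunk."""
    return {"Range": f"bytes={byte_offset_for(seconds)}-"}


print(range_header(25.0))
```

With such a table available up front, a client could request only the bytes from the relevant chunk onwards instead of the whole file.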
Re: [whatwg] Video proposals
On 16 Mar 2007, at 23:58, Håkon Wium Lie wrote: Also sprach Robert Brodrecht: I'd rather make and optional so that those who cannot support these Ogg on these elements (for whatever reason) can still comply with the spec. They can also support proprietary codecs through . Do you mean make the elements themselves optional to support? Yes. If a vendor, for some reason, is unable to support the Ogg codecs, I think it's better that they (a) do not support , than (b) they support with proprietary codecs only. Interoperability has more value than conformance. I think forcing browsers to support a codec when it is outdated is wrong. I don't want WA 1.0 to end up like RSS 2.0, having multiple versions incompatible with one another (in WA1.0's case different versions requiring different codecs). - Geoffrey Sneddon
Re: [whatwg] Versioning (was: Re: Using the HTML5 DOCTYPE as a new quirksmode switch)
On 14 Mar 2007, at 15:16, liorean wrote: This is a switch out of backwards-compatibility-hell for a single specific browser they are asking for, not something any other browser vendor should have to worry about. Other browsers introduced quirks mode to match buggy behaviour of others – what's to say that won't happen here, so that other browsers have an IE/Win DOM mode, which would therefore require a switch? - Geoffrey Sneddon
[whatwg] The input stream issues
From implementing parts of the input stream (section 8.2.2 as of writing) yesterday, I found several issues (some of which will show the asshole[1] within me):
- Within the step one of the get an attribute sub-algorithm it says "start over" – is this start over the sub-algorithm or the whole algorithm?
- Again in step one, why do we need to skip whitespace in both the sub-algorithm and at section one of the inner step for tags?
- In step 11, when we have anything apart from a double/single quote or less/greater than sign, we add it to the value, but don't move the position forward, so when we move onto step 12 we add it again.
- In step 3 of the very inner set of steps for a content attribute in a tag, is charset case-sensitive?
- Again there, shouldn't we be given unicode codepoints for that (as it'll be a unicode string)?

- Geoffrey Sneddon
[1]: http://diveintomark.org/archives/2004/08/16/specs
Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch
On 10 Mar 2007, at 13:43, Elliotte Harold wrote: Alexey Feldgendler wrote: The tutorials will just say "Use ". What are those of us who wish to use XML tools on our documents supposed to use? We will need a real DTD at some point, to declare the entities if nothing else. We will not be able to use <!DOCTYPE html>. Then you're still relying on the UA reading the DTD, which it doesn't have to. What use is a DTD if it doesn't need to be read and has no nominative value? - Geoffrey Sneddon
Re: [whatwg] Configure Apache to send the right MIME type for XHTML
On 7 Mar 2007, at 17:07, Anne van Kesteren wrote: If you're after the fact that browsers don't sniff for XML in text/html that's because the old HTML WG said so (there's a pointer somewhere out there) and changing that now is impossible given how many authors got XML as text/html "completely wrong". http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html – that's the post Anne is referring to (I know of no other time that the HTML WG have said anything on this issue). - Geoffrey Sneddon
Re: [whatwg] versus xml:base
On 5 Mar 2007, at 21:07, Keryx Web wrote: Geoffrey Sneddon wrote: > XHTML 1.0/1.1 doesn't allow xml:base, though, so is the only way to set a base URL within the document. In what way would the XHTML 1.0/1.1 spec **disallow** the use of this element from the xml namespace? It's not *part of* the spec, but that's a different matter, right? xml:lang and xml:base are the actual attribute names – the XML namespace exists so they work within namespace-aware parsers (as XML-Names is a separate spec that extends XML) – therefore, it must be explicitly allowed within the DTD (like xml:lang is). - Geoffrey Sneddon
Re: [whatwg] element proposal
On 4 Mar 2007, at 14:31, Geoffrey Sneddon wrote: On 4 Mar 2007, at 14:08, Maik Merten wrote: - MPEG4: This is most common in forms of DivX and XviD. Predecessor of H.264. As usual there's patent pool licensing involved. This means that albeit XviD is open sourced it's not really free due to patent licensing issues. That's wrong – H.264 is MPEG4 Part 11 – it's part of the MPEG4 spec. Slight correction of myself, that should read: "H.264 is MPEG4 Part 10". - Geoffrey Sneddon
Re: [whatwg] element proposal
On 4 Mar 2007, at 14:08, Maik Merten wrote: - MPEG4: This is most common in forms of DivX and XviD. Predecessor of H.264. As usual there's patent pool licensing involved. This means that albeit XviD is open sourced it's not really free due to patent licensing issues. That's wrong – H.264 is MPEG4 Part 11 – it's part of the MPEG4 spec. I think we need to look at why the MPEG standards see near universal support and use: as you say, parts of MPEG4 are highly efficient (such as H.264 and AAC), whereas alternatives such as Theora aren't anywhere near as efficient. Also note that patents haven't stopped the web in the past (see: GIF). I really believe that this is too political, as history has shown people will use whatever formats can be created easily and are well supported. It is perfectly possible that anyone wanting to implement the spec will be put off by needing to support a single format that (almost) nobody uses. - Geoffrey Sneddon
Re: [whatwg] element proposal
On 1 Mar 2007, at 05:27, Shadow2531 wrote: On 2/28/07, Anne van Kesteren wrote: Hi, Opera has some internal experimental builds with an implementation of a element. The element exposes a simple API (for the moment) much like the Audio() object: I think it'd be cool if the video element *just* supported theora. Why limit yourself to one format? Very, very, very few specifications limit the format so strictly, and with good reason: WA1 won't actually be a standard until the next decade, if not the one after that, and in that time, there will easily be _far_ more efficient codecs created. We could just say we only support theora, then add on others later, but why, when there are multiple options, not allow them from the start? - Geoffrey Sneddon
Re: [whatwg] versus xml:base
On 2 Mar 2007, at 19:25, Keryx Web wrote: Anne van Kesteren skrev: I think should also be allowed in XML documents. It simplifies the language, it already needs to be supported and is able to set Document.baseURI where xml:base can at most set Document.documentElement.baseURI. (Document.baseURI influences how XMLHttpRequest works for instance.) The element section should probably also talk about what happens when you modify the .href attribute. And today the base element already works in at least FFox and Opera also when content is sent as true XHTML 1.0, so this would not really change anything but the spec. XHTML 1.0/1.1 doesn't allow xml:base, though, so is the only way to set a base URL within the document. - Geoffrey Sneddon
Re: [whatwg] Expected behaviour when a is within an innerHTML fragment
On 11 Feb 2007, at 15:11, Geoffrey Sneddon wrote: The point is whether it: a) Gets inserted into the , and changes all the links in the document. b) Appears in some magic place, and changes the links in the HTML fragment. c) Gets ignored. I'm personally in favour of b), as using the normal parsing rules (placing it in ) may well end up changing more than what is wanted. I'll do some testing of current implementations later.

So… the testing:

For reference, I'll note the behaviour as such:
1: Changes all links in the document.
2: Changes links in HTML fragment.
3: Changes nothing.

Test 1: http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Cscript%20type%3D%22text/javascript%22%3E%0Afunction%20insert_base%28%29%0A%7B%0A%09document.getElementById%28%22insert%22%29.innerHTML%3D%22%3Cbase%20href%3D%27http%3A//example.org/%27%3E%3Ca%20href%3D%27test%27%3Etest2%3C/a%3E%22%3B%0A%7D%0A%3C/script%3E%0A%3Cbase%20href%3D%22http%3A//example.com/%22%3E%0A%3Cp%3E%3Ca%20href%3D%22test%22%3ETest%3C/a%3E%3C/p%3E%0A%3Cp%20id%3D%22insert%22%3E%3Ca%20href%3D%22javascript%3Ainsert_base%28%29%22%3Einsert%3C/a%3E%3C/p%3E%0A%3Cp%3E%3Ca%20href%3D%22test%22%3ETest%3C/a%3E%3C/p%3E

Safari 2.0.4/419.3: (1) Inserted in DOM (in the innerHTML location).
Firefox 2.0.0.1: (3) Inserted in DOM (in the innerHTML location).
IE/Mac 5.2.3: (2) (any way to view the DOM tree?)
Opera 9.10: (1) DOM Snapshot for some reason isn't working.
IE6/Win: (2) The new never appears in DOM, but the full absolute URLs are in the DOM.
IE7/Win: (3) The new never appears in DOM, but the full absolute URLs are in the DOM.
Test 2 (this uses onclick to avoid escaping it as a URI): http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Cbase%20href%3D%22http%3A//example.com/%22%3E%0A%3Cp%3E%3Ca%20href%3D%22test%22%3ETest%3C/a%3E%3C/p%3E%0A%3Cp%20id%3D%22insert%22%3E%3Ca%20onclick%3D%22document.getElementById%28%26quot%3Binsert%26quot%3B%29.innerHTML%3D%26quot%3B%3Cbase%20href%3D%27http%3A//example.org/%27%3E%3Ca%20href%3D%27test%27%3Etest2%3C/a%3E%26quot%3B%22%3Einsert%3C/a%3E%3C/p%3E%0A%3Cp%3E%3Ca%20href%3D%22test%22%3ETest%3C/a%3E%3C/p%3E

Results the same as above.

In conclusion, Safari and Opera change all the links, IE5/Mac and IE6/Win both change links within the fragment, and Firefox and IE7/Win don't change any links. - Geoffrey Sneddon
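To make the three behaviours concrete, here is a minimal sketch of what each one means for the relative link "test" used in the test pages. The two base URLs are the ones in the test URLs (http://example.com/ in the page, http://example.org/ in the inserted fragment); the function name and the returned dictionary are assumptions for illustration only, and the sketch models the outcomes, not any browser's implementation.

```python
from urllib.parse import urljoin

DOCUMENT_BASE = "http://example.com/"  # base URL already set in the test page
FRAGMENT_BASE = "http://example.org/"  # base URL inserted through innerHTML


def resolve_links(behaviour: int) -> dict:
    """Resolve the relative link "test" under behaviour 1, 2 or 3."""
    # Behaviour 1: the inserted base applies to every link in the document.
    document_base = FRAGMENT_BASE if behaviour == 1 else DOCUMENT_BASE
    # Behaviours 1 and 2: the inserted base applies to links in the fragment.
    fragment_base = FRAGMENT_BASE if behaviour in (1, 2) else DOCUMENT_BASE
    return {
        "document": urljoin(document_base, "test"),
        "fragment": urljoin(fragment_base, "test"),
    }


if __name__ == "__main__":
    for behaviour in (1, 2, 3):
        print(behaviour, resolve_links(behaviour))
```

Under behaviour 2 only the fragment's link points at example.org, which is the outcome option b) argues for.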
Re: [whatwg] Expected behaviour when a is within an innerHTML fragment
On 11 Feb 2007, at 11:37, Jorgen Horstink wrote: On Feb 11, 2007, at 12:01 PM, Geoffrey Sneddon wrote: To take this from a discussion last month on atom-syntax: What is meant to happen if you set innerHTML of a where the set value has both a and an ? first of all the element can only be inserted in HTML documents. That's perfectly fine… If you have control over the content being inserted. The spec states that there can only be one element. The element must be used before any elements that use relative URIs. Sure, there MUST only be one, and in , but as the parsing section dictates, if there is one in it gets moved into . It also, as it stands, leaves it possible for the parser to place multiple elements in . If the insertion mode is "in body" handle the token as follows: A start tag token whose tag name is one of: "base", "link", "meta", "title" Parse error. Process the token as if the insertion mode had been "in head". [1] So inserting a element in the body results in a parse error. [1] http://www.whatwg.org/specs/web-apps/current-work/#how-to0 As Anne has already said, the spec says how to deal with parse errors (they aren't fatal errors, as parsing continues as normal). Also, as what you quote says, the element gets inserted "in head". The point is whether it: a) Gets inserted into the , and changes all the links in the document. b) Appears in some magic place, and changes the links in the HTML fragment. c) Gets ignored. I'm personally in favour of b), as using the normal parsing rules (placing it in ) may well end up changing more than what is wanted. I'll do some testing of current implementations later. - Geoffrey Sneddon (I accidentally sent this to just Jorgen! Sorry!)
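The quoted "in body" rule can be sketched roughly as follows. This covers only the four tag names quoted above; the function name and the tuple it returns are assumptions for illustration, and every other start tag is lumped together as "in body" even though the spec has many more cases.

```python
# Tag names taken from the quoted spec text.
HEAD_ONLY_TAGS = {"base", "link", "meta", "title"}


def handle_start_tag_in_body(tag_name: str) -> tuple:
    """Return (is_parse_error, insertion_mode_used) for a start tag seen "in body"."""
    if tag_name in HEAD_ONLY_TAGS:
        # Parse error, but not fatal: the token is processed as if "in head".
        return True, "in head"
    # Simplification: all other start tags are treated as ordinary body content.
    return False, "in body"


if __name__ == "__main__":
    for name in ("base", "p"):
        print(name, handle_start_tag_in_body(name))
```

The point made in the message is visible here: the parse error is reported, yet processing continues and the token is still handled.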
[whatwg] Expected behaviour when a is within an innerHTML fragment
To take this from a discussion last month on atom-syntax: What is meant to happen if you set innerHTML of a where the set value has both a and an ? - Geoffrey Sneddon
Re: [whatwg] The m element
On 8 Feb 2007, at 15:23, Leons Petrazickis wrote: In the Western world, the standard for highlighting is a neon yellow background. I submit that a much better name for is (, , ). People don't necessarily mark text much -- if anything, "mark" implies underlining, circling, and drawing arrows -- but they do highlight. In university, I often saw students perched with their notes and a highlighter, marking important sections. The semantic meaning is to draw attention for later review. In my eyes such an element is presentational – a more generic element, but one with semantic meaning, like is far more relevant (although it may well be a good idea to suggest it be rendered as highlighted). - Geoffrey Sneddon
Re: [whatwg] Comparison of XForms-Tiny and WF2
On 28 Jan 2007, at 14:31, Elliotte Harold wrote: Geoffrey Sneddon wrote: It's not replacing it, as XForms 1.0 MUST be in an XML document, whereas WF2 can be put in an HTML document. Both, IMO, have very different use-cases. FUD. FUD, FUD. Which part of that is spreading either fear, uncertainty, or doubt, or are you just misusing the acronym? The W3C is trying to drive the Web to XHTML. XForms is part of this vision. Some people disagree with this and have formed the WhatWG to support classic HTML and a different kind of forms tech. The two technologies are in active competition. Maybe one will win. Maybe both will. Maybe neither. I don't know, and I'm not sure which I prefer to happen. Some days I prefer one. Some days the other. But don't kid yourself. They are absolutely competing with each other for market and mindshare, and that competition is only going to grow over the next year. No, HTML and XHTML are competing – XForms MUST be in XHTML, thereby preventing anyone using HTML from using it. Within text/html data (so as to include XHTML 1.0 App. C) at least there is no competition whatsoever. - Geoffrey Sneddon
Re: [whatwg] Comparison of XForms-Tiny and WF2
On 27 Jan 2007, at 02:17, Elliotte Harold wrote: Matthew Raymond wrote: "This specification is in no way aimed at replacing XForms 1.0 [XForms], nor is it a subset of XForms 1.0." I agree that it's not a subset of XForms 1.0, but the first claim is pure FUD. Web Forms 2.0 happened precisely because some people didn't like XForms 1.0 and wanted to replace it with something they liked better. I'm not saying they're wrong, or that their spec is worse, but don't kid yourself about what's going on here. It's not replacing it, as XForms 1.0 MUST be in an XML document, whereas WF2 can be put in an HTML document. Both, IMO, have very different use-cases. - Geoffrey Sneddon