[whatwg] WebVTT feedback (was Re: Video feedback)
Hi Ian, all,

I am very excited by the possibilities that Ian outlined for WebVTT and how we can add V2 features. I have some comments on the discussion below, but first I'd like to point people to a piece of work that Ronny Mennerich from LeanbackPlayer has recently undertaken (with a little of my help). Ronny has created this Web page: http://leanbackplayer.com/other/webvtt.html . It summarizes the WebVTT file format and provides visual clarifications of how the cue settings work. I would like to point out that Ronny has done the drawings according to how we understand the WebVTT / HTML spec, so I would appreciate somebody checking whether they are correct. I would also like to point out the issues that Ronny lists at the bottom of that page, which we need to resolve. I've copied them here for discussion and added some more detail:

* A:[start|middle|end] -- If the [subtitle box] and also the [subtitle text] are aligned by the designer within a CSS (file), which setting dominates: CSS or cue setting, for both [subtitle box] and [subtitle text]? -- As it is text alignment, for me it is alignment of text within the [subtitle text] element only, and not also alignment/positioning of the [subtitle text] element in relation to the [subtitle box]! However, Silvia reckons the anchoring of the box changes with the alignment, so that it is possible to actually middle-align the [subtitle box] with A:middle. We wonder which understanding is correct.

* T:[number]% -- If the [subtitle box] and also the [subtitle text] are aligned by the designer within a CSS (file), which setting dominates: CSS or cue setting, for both [subtitle box] and [subtitle text]? -- What about it if T is used together with A:[start|middle|end]?

* S:[number] -- If S:[number] is used without % (percentage), it is not clear whether px or em is the unit for the text size. -- If em is the unit, it needs to be clarified how to set and calculate the text size value!
Because there is no real value, only an integer, for [number], we cannot write S:1.2, so we need a note for it: e.g. if S:120 is the value, then the text size has to be font-size: (120/100)em. If px is the unit it is easy, no calculation needed: [number] can be the new text size directly. E.g. if S:12 is the value, then the text size has to be font-size: 12px.

* cue voice tag -- Why are we not using a voice name declaration like in the cue class tags, with a dot separation and without spaces, i.e. <v.VoiceName>voice text</v> rather than <v VoiceName>? This could avoid errors by .vtt file writers and would also be much clearer to implement.

Please keep Ronny in the CC when you answer, because he is not subscribed to the list.

Now my feedback on the WebVTT points that Ian's Video feedback email provided:

On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson i...@hixie.ch wrote: On Mon, 3 Jan 2011, Philip Jägenstedt wrote: + I've added a magic string that is required on the format to make it recognisable in environments with no or unreliable type labeling. Is there a reason it's WEBVTT FILE instead of just WEBVTT? FILE seems redundant and like unnecessary typing to me. It seemed more likely that non-WebVTT files would start with a line that said just WEBVTT than a line that said just WEBVTT FILE. But I guess WEBVTT FILE FORMAT is just as likely, and it'll be caught. I've changed it to just WEBVTT; there may be existing implementations that only accept WEBVTT FILE, so for now I recommend that authors still use the longer header. I'll tweet the changes to help spread the news. I like it this short. :-)

On Wed, 8 Sep 2010, Philip Jägenstedt wrote: In the discussion on public-html-a11y, a trackgroup was suggested to group together mutually exclusive tracks, so that enabling one automatically disables the others in the same trackgroup.
I guess it's up to the UA how to enable and disable tracks now, but the only option is making them all mutually exclusive (as existing players do) or a weird kind of context menu where it's possible to enable and disable tracks completely independently. Neither option is great, but as a user I would almost certainly prefer all tracks being mutually exclusive and requiring scripts to enable several at once. It's not clear to me what the use case is for having multiple groups of mutually exclusive tracks. The intent of the spec as written was that a browser would by default just have a list of all the subtitle and caption tracks (the latter with suitable icons next to them, e.g. the [CC] icon in US locales), and the user would pick one (or none) from the list. One could easily imagine a UA allowing the user to enable multiple tracks by having the user ctrl-click a menu item, though, or some similar solution, much like with the commonly seen select box UI. In the vast majority of cases, all tracks are intended to be mutually exclusive, such as English+English HoH or subtitles in different languages. No media
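As an aside, the S:[number] arithmetic from Ronny's list above can be made concrete with a small sketch. This is purely illustrative: the function name, the unit parameter, and the use of CSS font-size are my own assumptions, not anything the WebVTT spec defines.

```python
def cue_size_to_css(number: int, unit: str = "em") -> str:
    """Hypothetical mapping of a WebVTT S:[number] cue setting to a CSS
    font-size declaration, following the note above: with em as the unit,
    the integer is divided by 100 (so S:120 -> 1.2em); with px, the
    integer maps directly (S:12 -> 12px)."""
    if unit == "em":
        return f"font-size: {number / 100}em"
    return f"font-size: {number}px"
```

Under this reading, S:120 would render at 1.2 times the default text size, which sidesteps the integer-only limitation Ronny raises.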
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: If we introduced the scrolling behaviour that I described above where cues that are rendered into the same location as a previous still active cue push that previous cue up, we get this behaviour covered too. I don't think so. This is a scene with two simultaneous conversations; in order to help make the subtitles readable, they were authored to keep one conversation pair always on top, and the other always on the bottom. Having captions move while they're already displayed wouldn't do this. (I think it'd make it unreadable, actually, by adding motion into the mix.) Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. -- Glenn Maynard
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 5:30 PM, Glenn Maynard gl...@zewt.org wrote: On Sat, Jun 4, 2011 at 11:05 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: If we introduced the scrolling behaviour that I described above where cues that are rendered into the same location as a previous still active cue push that previous cue up, we get this behaviour covered too. I don't think so. This is a scene with two simultaneous conversations; in order to help make the subtitles readable, they were authored to keep one conversation pair always on top, and the other always on the bottom. Having captions move while they're already displayed wouldn't do this. (I think it'd make it unreadable, actually, by adding motion into the mix.) If you use explicit L: placement, we could turn off the scrolling behaviour. I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true.
Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow, for example, explicit placement. Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Regards, Silvia.
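The push-up behaviour Silvia describes could be sketched roughly as follows. This is entirely hypothetical (no spec defines this algorithm): cue positions are modeled as line numbers counted from the top of the video, and a new cue displaces any still-active cue on its line upward, roll-up style.

```python
def push_up(active_lines, new_cue_line):
    """Hypothetical sketch of roll-up cue placement: the new cue takes its
    line, and any still-active cue occupying that line (or displaced onto
    another occupied line) is pushed up, i.e. to a smaller line number."""
    occupied = {new_cue_line}
    placed = []
    # Resolve cues bottom-up so each displaced cue cascades upward.
    for line in sorted(active_lines, reverse=True):
        while line in occupied:
            line -= 1
        occupied.add(line)
        placed.append(line)
    return sorted(placed)
```

With this sketch, a new cue at line 10 would push an active cue already at line 10 up to line 9, while a cue at line 5 would stay put; Glenn's objection is precisely that such motion is undesirable for pre-authored, positioned captions.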
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. It's definitely an uncommonly complex scene to caption. I raised it wondering about the quality level the format can manage in the harder cases. (To be clear, this isn't something any of the popular ad hoc formats can do, either--this was achieved with brittle, rendering-specific hacks.) I've never seen the scrolling behavior in subtitles, though. I think they're only common in live captions. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. The scene is jumping all over the place--at one point one pair is directly *above* the other pair in the frame. There's no left/right speaker correspondence. FWIW, I find it intuitive to read. Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons. I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true. Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow, for example, explicit placement.
Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Sure (most of the time), but if the contracts you refer to require that content rendered with WebVTT look the same as it did in the original format, that won't always be possible. -- Glenn Maynard
Re: [whatwg] WebVTT feedback (was Re: Video feedback)
On Mon, Jun 6, 2011 at 6:04 PM, Glenn Maynard gl...@zewt.org wrote: On Mon, Jun 6, 2011 at 3:41 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: I don't think your example is a typical one. In my (unmeasured) experience, the scrolling behaviour is much more typical. It's definitely an uncommonly complex scene to caption. I raised it wondering about the quality level the format can manage in the harder cases. (To be clear, this isn't something any of the popular ad hoc formats can do, either--this was achieved with brittle, rendering-specific hacks.) I've never seen the scrolling behavior in subtitles, though. I think they're only common in live captions. Agreed, mostly live, but also when playing back content that had live captions. Also, we do want to support live captions here, so I think they are relevant. In fact, that example of yours is really really confusing to me. I would much prefer if the text wasn't displayed on top of each other, but at different locations on the screen - one to the right, one to the left, preferably underneath the people that speak. That is a better experience. I believe that example of yours only looks that way because somebody had to work around the problem that the subtitle authoring format didn't allow for such explicit placement. The scene is jumping all over the place--at one point one pair is directly *above* the other pair in the frame. There's no left/right speaker correspondence. FWIW, I find it intuitive to read. Even when they are above each other, you could place one on the left at the top next to the first speaker and one on the right at the bottom next to the other speaker. I guess intuitions can be different. :-) Eventually, we will want to get rid of the legacy format and just deliver WebVTT, but they still need to display as though they came from the original broadcast caption format for contractual reasons.
I don't know what degree of sameness they expect, but as users can always override their font (implying different wrapping results, etc.), you'll never be able to guarantee that it'll look identical to the output of a more fixed format. If captions have editing like the above, it could even result in a visible drop in quality. I think the opposite is true. Right now, people work around some of the ways in which they really would like to render their captions because the formats don't allow for example explicit placement. Therefore we get poor quality captions right now. With the features available, we should see better captions, not worse. Sure (most of the time), but if the contracts you refer to require that their content rendered with WebVTT look the same as they did in their original format, that won't always be possible. Most things are actually possible - I've tried to give this a shot with CEA-608 captions here: http://www.w3.org/WAI/PF/HTML/wiki/Media_608_WebVTT_Conversion . Feedback very welcome! Cheers, Silvia.
[whatwg] Codecs (Re: PeerConnection feedback)
On 05/31/11 23:45, Ian Hickson wrote: On Sat, 9 Apr 2011, James Salsman wrote: Sorry for the top posting, but I would like to reiterate my considered opinion that Speex be supported for recording. It is the standard format available from Adobe Flash recording, low bandwidth, open source and unencumbered, efficient, and it is high quality for its bandwidth. My plan with the codecs issue here is to let the implementors figure out which codecs they want to implement, and then once there's a common set across all the implementations, to require that. So I would recommend petitioning the implementors if you have particular codec desires. :-) I can't find James' original post, so this may be a new subject line - I just want to note that the recent WebRTC code release from Google contains implementations of both iLBC and iSAC codecs; we believe that makes them open-source, unencumbered and implemented (with all the usual disclaimers). Others will have to discuss the properties that make one or the other best for a particular application; part of that discussion will happen on the public-web...@w3.org list and the rtc...@ietf.org list. Harald
Re: [whatwg] Content-Disposition property for a tags
On 6/5/11 3:53 PM, Bjartur Thorlacius wrote: On 6/5/11, Boris Zbarsky bzbar...@mit.edu wrote: Why need they be? This isn't Bittorrent. I think you completely misunderstood my mail... the point is that browsers do NOT all use the last non-empty path component; some try to guess a filename based on the query params, in various ways. No, I understood - my point is that it doesn't matter; browsers need not standardize on variables to deduce filenames from*. My point was that there should be _a_ standardized way that sites can use to get consistent behavior across browsers. Content-Disposition headers seem like that way to me. -Boris
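For reference, the Content-Disposition parsing that Boris's standardized behavior would rest on is readily expressible with existing machinery; here is a minimal sketch using Python's standard library (the helper name is my own, and this models only the header syntax, not any browser's filename-sanitization rules).

```python
from email.message import Message

def parse_disposition(header_value: str):
    """Parse a Content-Disposition header value into a
    (disposition, filename) pair using the stdlib's MIME-style
    parameter handling."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_content_disposition(), msg.get_filename()
```

For example, 'attachment; filename="report.pdf"' yields the disposition "attachment" and the filename hint "report.pdf"; a bare "inline" yields no filename, which is exactly the case where browsers diverge in the thread above.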
Re: [whatwg] Content-Disposition property for a tags
On 2011-06-03 17:46, Bjartur Thorlacius wrote: ... I strongly disagree. I think browsers that use the Content-Disposition filename for attachment but not inline are just buggy and should be fixed. FWIW MSIE9 seems to honor the filename hint with inline (contrary to the test results mentioned earlier in the thread). ... Hint: the test page has a feedback link. That being said: I just tried http://greenbytes.de/tech/tc2231/inlwithasciifilename.asis and IE9 seems to ignore the filename information. Best regards, Julian
Re: [whatwg] Content-Disposition property for a tags
On 03.06.2011 at 15:16, Eduard Pascual herenva...@gmail.com wrote: On Fri, Jun 3, 2011 at 2:23 PM, Dennis Joachimsthaler den...@efjot.de wrote: This grants the ability for any content provider to use an explicit Content-Disposition: inline HTTP header to effectively block download links from arbitrary sources. True. Is it still so that some browsers ignore the filename part of a Content-Disposition if an inline disposition is used? Ok, I have never even thought about using the filename argument with an explicit inline disposition. When I am in control of the headers, I find it easier to fix the filename with 301/302 redirects, and I also have the bonus of some control over how that should be cached... In short, I think that responding with a 2xx code _and_ attempting to change what's essentially part of the URI through other means is a contradiction, and thus a mistake in the best case, or some attempt to fool the browser into doing something it shouldn't do in the worst case. Because of that, I'm ok with whatever way the browser decides to handle the contradiction. You can read my position about error-handling in my earlier post from some minutes ago. Personally, in the case I'm most concerned about (data: URIs used for Save log and similar functionalities), there is never a true disposition header; so my use cases do not push towards any of the options. What I have just written is what I feel is the most reasonable approach (the provider of a resource should have some control over it above an arbitrary third party). Data URIs would very well benefit from this attribute, in my opinion. This would also cater to the canvas lovers. Downloading something drawn on a canvas instantly? No problem! <a href="data:" disposition="attachment" filename="canvas.png">Download me!</a> Yep, these are the cases I am actually concerned about. But in these scenarios there is no HTTP header involved, so it doesn't matter (for them) what takes precedence.
Yes, this was only an example which just came to my mind, nothing special. This is still one thing that has to be settled though. a) What do we call the attribute? Is there any reason to _not_ call it 'content-disposition'? Ok, there is one: verbosity. But, personally, I have no issue with some verbosity if it helps making things blatantly explicit. So many years of browser vendors reverse-engineering the error handling in competing products have convinced me that being explicit is a good thing. Yes, I was trying to refer to the verbosity. There are no HTML attributes with dashes in them as far as I know, except for data-, which are user-defined. This would kind of break the convention a little. I could think about having contentdispo or some short name like this; it would fit better with what we currently have in HTML. b) Do we include the filename part directly in the attribute, or do we create a SECOND attribute just for this? People have been posting several formats now, but I don't think we have actually *agreed* upon one of them. What's wrong with using the same format as HTTP? I am not too strongly attached to that format, but I see no point in making things different from what we already have. As a minor advantage, implementors can reuse (or copy-paste) a few lines of parsing code instead of writing them again, since they already parse the header when they get it in an HTTP response. Again, HTML convention: currently HTML only has one value in every attribute, except for things like events (which are JavaScript) and style (which is also ANOTHER language: CSS). It seems cleaner to me if we stick to the standard and do not change the syntax rules. Please tell me if I missed anything here! Regards, Eduard Pascual
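As an aside, the data: URI behind the hypothetical canvas "Download me!" link above would be built along these lines. This is a sketch only; the attribute names in the thread (disposition, filename) are proposals under discussion, not part of any shipped spec.

```python
import base64

def make_data_uri(payload: bytes, mime: str = "image/png") -> str:
    """Build a base64 data: URI of the kind a canvas download link
    would point at (e.g. PNG bytes exported from a canvas)."""
    b64 = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Since such a URI carries no HTTP headers at all, an attribute on the anchor element is the only place a filename hint could live, which is exactly Eduard's point.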
Re: [whatwg] Content-Disposition property for a tags
On Mon, Jun 6, 2011 at 6:59 PM, Dennis Joachimsthaler den...@efjot.de wrote: Yes, I was trying to refer to the verbosity. There are no HTML attributes with dashes in them as far as I know, except for data-, which are user-defined. This would kind of break the convention a little. I could think about having contentdispo or some short name like this; it would fit better with what we currently have in HTML. Maybe disposition could work? For the HTTP header, the content part indeed refers to the content of the response; but in the case of a link, the attribute would be referring to the linked resource, rather than the actual content of the element. So it's more accurate, we reduce verbosity, and we get rid of the dash, all of this without having to make the name less explicit nor relying on an arbitrary abbreviation (i.e. why dispo and not disp or dispos? Since there isn't a clear boundary, it could be harder to remember; but dropping the content- part seems more straightforward). Again, HTML convention: currently HTML only has one value in every attribute, except for things like events (which are JavaScript) and style (which is also ANOTHER language: CSS). Well, meta elements with an http-equiv attribute normally have a full HTTP header (including parameters if needed) in their content attribute, so I see no issue in taking a similar approach. After all, HTTP _is_ another language (or protocol, to be more precise, but protocols are still a kind of language). It seems cleaner to me if we stick to the standard and do not change the syntax rules. HTTP is also a standard, so we could stick to it. It all boils down to a choice of which standard we honor above the other. Seeing that HTTP is an actual standard, rather than a mere convention, and we are actually borrowing a feature from it, it looks like the winner to me. Please tell me if I missed anything here! Off the top of my head, @class is defined to be a space-separated list of class names.
Sure, it is a simpler syntax, but it's still a multiple content attribute. I think there are some more cases, but I can't recall any right now. Regards, Eduard Pascual
Re: [whatwg] The choice of script global object to use when the script element is moved
On Wed, 2 Feb 2011, Henri Sivonen wrote: On Feb 2, 2011, at 03:07, Ian Hickson wrote: I suppose we could make it so that scripts get neutered when the document that they were first associated with gets unloaded. Would that work? We did something different. Proposal #4 (what Gecko now does): * If at the time when the parser triggers the 'run' algorithm, the owner document of the script is not the same document whose active parser the parser is, set the 'already executed' flag and abort the steps. * If at the time of a script becoming available for evaluation the owner document of the script is not the same document that was the owner document at the time of the 'run' algorithm, don't evaluate the script. See https://bugzilla.mozilla.org/show_bug.cgi?id=592366 I believe this is what the spec now says (this was fixed in response to a bug that was filed, IIRC). Please let me know if it's still broken. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Content-Disposition property for a tags
On 6/6/11, Boris Zbarsky bzbar...@mit.edu wrote: My point was that there should be _a_ standardized way that sites can use to get consistent behavior across browsers. Content-Disposition headers seem like that way to me. More importantly, there should be an implementation-defined convention so users get consistent behavior across sites. I don't see why authors should care so much how users identify documents internally, when the original point (type information) is nullified by the filename sanitization crucial to security. Browsers are able to deduce descriptive and rather unique names from the title, terser ones from the URI, and to allow the user to choose his own. You'll never be able to assume your suggestion will be used, anyway.
Re: [whatwg] Javascript: URLs as element attributes
On Wed, 9 Feb 2011, Boris Zbarsky wrote: On 2/9/11 10:12 PM, Ian Hickson wrote: On Mon, 15 Nov 2010, Boris Zbarsky wrote: On 11/15/10 8:15 PM, Ian Hickson wrote: Gecko's currently-intended behavior is to do what [the spec] describes in all cases except: <iframe src="javascript:">, <object data="javascript:">, <embed src="javascript:">, <applet code="javascript:"> (Note that the spec has since changed on this; javascript: either runs in a browsing context that it is navigating, or it doesn't run at all.) What does it do for those cases if it doesn't match the spec? For iframe the behavior in Gecko currently is different in terms of what the URI of the result document of javascript: is set to. Try this: data:text/html,<body onload="alert(window[0].location)"><iframe src="javascript:''"> Woah, funky. (Gecko thinks the location is javascript:''.) Note that there is some confusion here in terms of browsing contexts and object, since object does expose a Document object sometimes (but not others) and does participate in session history sometimes, I believe... So I'm not quite sure what behavior the spec calls for for object. It's defined; see the section on the object element. I've read that section, in fact. I couldn't make sense of what behavior it actually called for. Has it changed recently (last few months) to become clearer such that rereading would be worthwhile? Not as far as I'm aware. Could you elaborate on how it is confusing? I'm eager to make this understandable! At least in Gecko, the return value string is examined to see whether all the char code values are <= 255. If they are, then the string is converted to a byte array by just dropping the high byte of every char. So you can pretty easily generate image data this way. If any of the char codes are > 255, then the string is encoded as UTF-8 instead. Hm. This currently isn't specced; the spec just assumes the return value is text/html string data and doesn't say what encoding to use.
Is there a good way to test this in the context of an iframe, where all the browsers do something with javascript:? <body onload="alert(window[0].document.characterSet)"><iframe src="javascript:'\u0400'"> http://junkyard.damowmow.com/466 Gecko: UTF-8 if the 'javascript:' URL includes characters > 255, ISO-8859-1 if all the input characters are <= 255. WebKit seems to pick my default encoding regardless. Opera returns an empty string. IE8 returns undefined. Since Gecko seems to be alone in this weird behaviour, I haven't specced it. I couldn't find any other effect (e.g. the input seems to always be treated as Unicode, not converted to bytes and redecoded, regardless of what I make it look like, including UTF-16 and UTF-8). On Thu, 10 Feb 2011, Adam Barth wrote: On Tue, Nov 30, 2010 at 11:37 AM, Darin Adler da...@apple.com wrote: In WebKit, we have treated the javascript URL scheme as a special case, with explicit code in the loader, and not handled by general-purpose resource protocol machinery. Maciej Stachowiak suggested this approach, back in 2002, and one of the reasons he gave me at the time is that he thought WebKit would be more likely to get the security policy right if code paths opted in to JavaScript execution rather than opting out of javascript URL scheme handling. Apologies for not reading the whole thread before replying, but the design Darin describes [above] has worked well in WebKit thus far. I'd be hesitant to make JavaScript URLs work in more contexts due to the risk of introducing security vulnerabilities into the engine. That's black-box equivalent to what the spec currently requires, I believe (though the spec implements it more like what Boris describes). On Thu, 10 Feb 2011, Boris Zbarsky wrote: For what it's worth, Gecko treats javascript: URLs as a general protocol, but with tracking of where the URL came from required for the script to actually execute, and explicit opt-in on the caller's part required to execute outside a sandbox.
This too has worked well in terms of security, for what it's worth, while offering a lot more flexibility in terms of how and where javascript: URIs can work. I don't think we should gate the spec here on WebKit's implementation details if we think a certain behavior is correct but hard to support in WebKit. On Mon, 16 May 2011, Philip Jägenstedt wrote: On Sat, 14 May 2011 00:34:36 +0200, Ian Hickson i...@hixie.ch wrote: On Mon, 14 Feb 2011, Philip Jägenstedt wrote: For the record, I removed Opera's support (I assume it was an unintended side-effect) for <object data="javascript:..."> along with the rest at the time when I wrote my previous mail in this thread. This intentionally doesn't match what the spec says. (Disclaimer: this is only my opinion on something that isn't really my area of expertise, so others
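The Gecko string-to-bytes heuristic Boris describes a few messages up can be sketched as follows. The helper name is hypothetical; this only models the described behavior, not Gecko's actual implementation.

```python
def result_to_bytes(s: str) -> bytes:
    """Sketch of the described Gecko behaviour for a javascript: URL's
    result string: if every char code is <= 255, drop the high byte of
    each char (equivalent to Latin-1 encoding, which is how raw image
    data can be produced); otherwise encode the whole string as UTF-8."""
    if all(ord(c) <= 255 for c in s):
        return s.encode("latin-1")
    return s.encode("utf-8")
```

This also mirrors Ian's iframe test above: a result containing U+0400 (char code > 255) takes the UTF-8 path, while an all-Latin-1 result takes the byte-dropping path.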
Re: [whatwg] Should script run if it comes from a HTML fragment?
On Thu, 10 Feb 2011, Henri Sivonen wrote: * innerHTML doesn't run scripts and they are inserted disabled. FWIW, here's a counter-example: http://www.oele.net/innerhtmljs2.html The above runs the script in all browsers except Firefox 4 (which follows the spec). The pattern is reportedly from the http://crossbrowserajax.com/ library. However, http://www.oele.net/innerhtmljs.html doesn't run in IE9 PP7. I'm not thrilled about adding code to support this quirky pattern at this point of the release cycle (evang is being attempted), but I thought I'd mention this finding even though I'm not asking for a spec change (at least not yet). Wow. That difference is absurd. Since Gecko and WebKit both do what the spec says now, I've left the spec as is. I don't even know where I'd begin in terms of doing what IE does here. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Javascript: URLs as element attributes
On 6/6/11 4:45 PM, Ian Hickson wrote: data:text/html,<body onload="alert(window[0].location)"><iframe src="javascript:''"> Woah, funky. (Gecko thinks the location is javascript:''.) Well... it sort of is. ;) It's defined; see the section on the object element. I've read that section, in fact. I couldn't make sense of what behavior it actually called for. Has it changed recently (last few months) to become clearer such that rereading would be worthwhile? Not as far as I'm aware. Could you elaborate on how it is confusing? I'm eager to make this understandable! I'll try reading it again and taking notes, I guess. When I can find time to. :( The latency is killing us here. Since Gecko seems to be alone in this weird behaviour, I haven't specced it. I couldn't find any other effect (e.g. the input seems to always be treated as Unicode, not converted to bytes and redecoded, regardless of what I make it look like, including UTF-16 and UTF-8). You can detect other effects by seeing what unescape() does in the resulting document, iirc. As well as URIs including %-encoded bytes and so forth. Also, you can detect what charset is used for stylesheets included by the document that don't declare their own charset. There are probably other places that use the document encoding. Worth testing some of this stuff. -Boris