Re: [whatwg] Please always use utf-8 for Web Workers

2009-10-14 Thread Ian Hickson
On Fri, 25 Sep 2009, Simon Pieters wrote:

 Workers are new and seem very likely to be incompatible with existing 
 scripts. So they are not subject to legacy content with legacy encodings. 
 Therefore, we should be able to always use utf-8 for workers. Always 
 using utf-8 is simpler to implement and test and encourages people to 
 switch to utf-8 elsewhere.

On Fri, 25 Sep 2009, Jonathan Cook wrote:

 The importScripts portion of the Web Workers API is compatible with 
 existing scripts, but I'm all for more UTF-8 :)  If the restriction is 
 added to the spec, I'd want to know that a very clear error was going to 
 be thrown explaining the problem.

On Fri, 25 Sep 2009, Simon Pieters wrote:
 
 I'm not sure that throwing an error is a good idea. Would you throw an 
 error when there's no declared encoding? That seems to be annoying for 
 the common case of just using ASCII characters. Throwing an error when 
 there is a declared encoding that is not utf-8 might work, but are there 
 many scripts that have a declared encoding and are not utf-8?
 
 I think it is better to just ignore any declared encoding and assume utf-8. If 
 people are using non-ascii in another encoding, then they would notice 
 by seeing that their text looks like garbage. Browsers could also log 
 messages to their error consoles about encoding declarations declaring 
 non-utf-8 and/or sequences of bytes that are not valid utf-8.

On Fri, 25 Sep 2009, Drew Wilson wrote:

 Are you saying that if I load a script via a script tag in a web page, 
 then load it via importScripts() in a worker, that the result of loading 
 that script in those two cases should/could be different because of 
 different decoding mechanisms?

 If that's what's being proposed, that seems bad.

On Fri, 25 Sep 2009, Anne van Kesteren wrote:

 That could happen already if the script loaded via <script> did not have 
 an encoding set and got it from <script charset>.

On Fri, 25 Sep 2009, Drew Wilson wrote:

 Certainly. If I explicitly override the charset, then that seems like 
 reasonable behavior. Having the default decoding vary between 
 importScripts() and <script> seems bad, especially since you can't 
 override charsets with importScripts().

On Fri, 25 Sep 2009, Anne van Kesteren wrote:
 
 It does not need to be overridden per se. If the document character 
 encoding is different from UTF-8 then a script loaded through <script> 
 will be decoded differently from a script loaded through importScripts() 
 as well.

On Mon, 28 Sep 2009, Michael Nordman wrote:

 Leaving legacy encodings behind would be a good thing if we can get away 
 with it... jmho.

Ok, I've made workers assume UTF-8 always.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-28 Thread Anne van Kesteren
On Fri, 25 Sep 2009 19:34:18 +0200, Drew Wilson atwil...@google.com  
wrote:

Again, apologies if I'm misunderstanding the suggestion.


I thought that by default encoding you meant the encoding that would be  
used if other means of getting the encoding failed. If there is only one  
encoding it is not exactly the default, since it cannot be changed.



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-28 Thread Michael Nordman
Leaving legacy encodings behind would be a good thing if we can get away
with it... jmho.

On Mon, Sep 28, 2009 at 9:59 AM, Drew Wilson atwil...@google.com wrote:

 Ah, sorry for the confusion - my use of default was indeed sloppy. I'm
 saying that if the server is explicitly specifying the charset either via a
 header or via BOMs, it seems bad to ignore it since there's no other way to
 override the charset.
 I understand your point, though - since workers don't inherit the document
 encoding from their parent, they may indeed decode a given resource
 differently if the server isn't specifying a charset in some way.

 -atw


 On Mon, Sep 28, 2009 at 4:47 AM, Anne van Kesteren ann...@opera.com wrote:

 On Fri, 25 Sep 2009 19:34:18 +0200, Drew Wilson atwil...@google.com
 wrote:

 Again, apologies if I'm misunderstanding the suggestion.


 I thought that by default encoding you meant the encoding that would be
 used if other means of getting the encoding failed. If there is only one
 encoding it is not exactly the default, since it cannot be changed.



 --
 Anne van Kesteren
 http://annevankesteren.nl/





[whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Simon Pieters
Workers are new and seem very likely to be incompatible with existing  
scripts. So they are not subject to legacy content with legacy encodings.  
Therefore, we should be able to always use utf-8 for workers. Always using  
utf-8 is simpler to implement and test and encourages people to switch to  
utf-8 elsewhere.


--
Simon Pieters
Opera Software


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Jonathan Cook
The importScripts portion of the Web Workers API is compatible with 
existing scripts, but I'm all for more UTF-8 :)  If the restriction is 
added to the spec, I'd want to know that a very clear error was going to 
be thrown explaining the problem.


Regards,
Jonathan 'J5' Cook

Simon Pieters wrote:
Workers are new and seem very likely to be incompatible with existing 
scripts. So they are not subject to legacy content with legacy encodings. 
Therefore, we should be able to always use utf-8 for workers. Always 
using utf-8 is simpler to implement and test and encourages people to 
switch to utf-8 elsewhere.






Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Simon Pieters
On Fri, 25 Sep 2009 15:31:41 +0200, Jonathan Cook  
jonathan.j5.c...@gmail.com wrote:


The importScripts portion of the Web Workers API is compatible with  
existing scripts,


Only if those scripts don't use any of the banned interfaces and  
constructors, right?



but I'm all for more UTF-8 :)  If the restriction is added to the spec,  
I'd want to know that a very clear error was going to be thrown  
explaining the problem.


I'm not sure that throwing an error is a good idea. Would you throw an  
error when there's no declared encoding? That seems to be annoying for the  
common case of just using ASCII characters. Throwing an error when there  
is a declared encoding that is not utf-8 might work, but are there many  
scripts that have a declared encoding and are not utf-8?


I think it is better to just ignore any declared encoding and assume utf-8. If  
people are using non-ascii in another encoding, then they would notice by  
seeing that their text looks like garbage. Browsers could also log  
messages to their error consoles about encoding declarations declaring  
non-utf-8 and/or sequences of bytes that are not valid utf-8.
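
A minimal sketch of the two console diagnostics suggested above. The helper name diagnoseWorkerScript and its parameters are hypothetical (not part of any spec or browser API); the only real interface used is the standard TextDecoder, whose fatal option makes it throw on bytes that are not valid UTF-8.

```javascript
// Hypothetical helper, for illustration only.
// Returns the warnings a browser might log for a worker script.
function diagnoseWorkerScript(bytes, contentType) {
  const warnings = [];

  // Look for a charset parameter in the Content-Type header, if any.
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  if (match && !/^utf-?8$/i.test(match[1])) {
    warnings.push(`declared charset "${match[1]}" ignored; decoding as utf-8`);
  }

  // A fatal decoder throws a TypeError on invalid UTF-8 byte sequences.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    warnings.push("script contains byte sequences that are not valid utf-8");
  }
  return warnings;
}
```

A pure-ASCII script with no declared encoding produces no warnings here, which matches the common case described above.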


--
Simon Pieters
Opera Software


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Drew Wilson
Are you saying that if I load a script via a script tag in a web page,
then load it via importScripts() in a worker, that the result of loading
that script in those two cases should/could be different because of
different decoding mechanisms?
If that's what's being proposed, that seems bad.

-atw

On Fri, Sep 25, 2009 at 6:45 AM, Simon Pieters sim...@opera.com wrote:

 On Fri, 25 Sep 2009 15:31:41 +0200, Jonathan Cook 
 jonathan.j5.c...@gmail.com wrote:

  The importScripts portion of the Web Workers API is compatible with
 existing scripts,


 Only if those scripts don't use any of the banned interfaces and
 constructors, right?


  but I'm all for more UTF-8 :)  If the restriction is added to the spec,
 I'd want to know that a very clear error was going to be thrown explaining
 the problem.


 I'm not sure that throwing an error is a good idea. Would you throw an
 error when there's no declared encoding? That seems to be annoying for the
 common case of just using ASCII characters. Throwing an error when there is
 a declared encoding that is not utf-8 might work, but are there many scripts
 that have a declared encoding and are not utf-8?

 I think it is better to just ignore any declared encoding and assume utf-8. If
 people are using non-ascii in another encoding, then they would notice by
 seeing that their text looks like garbage. Browsers could also log messages
 to their error consoles about encoding declarations declaring non-utf-8
 and/or sequences of bytes that are not valid utf-8.

 --
 Simon Pieters
 Opera Software



Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Anne van Kesteren
On Fri, 25 Sep 2009 18:39:48 +0200, Drew Wilson atwil...@google.com  
wrote:

Are you saying that if I load a script via a script tag in a web page,
then load it via importScripts() in a worker, that the result of loading
that script in those two cases should/could be different because of
different decoding mechanisms?
If that's what's being proposed, that seems bad.


That could happen already if the script loaded via <script> did not have  
an encoding set and got it from <script charset>.



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Drew Wilson
Certainly. If I explicitly override the charset, then that seems like
reasonable behavior.
Having the default decoding vary between importScripts() and <script> seems
bad, especially since you can't override charsets with importScripts().

-atw

On Fri, Sep 25, 2009 at 10:08 AM, Anne van Kesteren ann...@opera.com wrote:

 On Fri, 25 Sep 2009 18:39:48 +0200, Drew Wilson atwil...@google.com
 wrote:

 Are you saying that if I load a script via a script tag in a web page,
 then load it via importScripts() in a worker, that the result of loading
 that script in those two cases should/could be different because of
 different decoding mechanisms?
 If that's what's being proposed, that seems bad.


 That could happen already if the script loaded via <script> did not have an
 encoding set and got it from <script charset>.


 --
 Anne van Kesteren
 http://annevankesteren.nl/



Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Anne van Kesteren
On Fri, 25 Sep 2009 19:16:47 +0200, Drew Wilson atwil...@google.com  
wrote:

Certainly. If I explicitly override the charset, then that seems like
reasonable behavior.


It does not need to be overridden per se. If the document character  
encoding is different from UTF-8 then a script loaded through <script>  
will be decoded differently from a script loaded through importScripts()  
as well.



Having the default decoding vary between importScripts() and <script>  
seems bad, especially since you can't override charsets with  
importScripts().


This is already the case. The suggestion was not about changing the  
default.
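
A simplified sketch of the difference described here. It assumes a page script takes its encoding from the first available source (server header, then charset attribute, then document encoding), and that importScripts() falls back to UTF-8 when the server declares nothing. All function and property names are hypothetical.

```javascript
// Simplified model for illustration; names are hypothetical.
function encodingForScriptElement({ httpCharset, charsetAttribute, documentEncoding }) {
  // Use the first source that supplies an encoding.
  return httpCharset || charsetAttribute || documentEncoding;
}

// Assumed worker behavior when the server charset is still honored.
function encodingForImportScriptsHonoringServer({ httpCharset }) {
  return httpCharset || "utf-8";
}

// Worker behavior under the "always utf-8" suggestion.
function encodingForImportScriptsAlwaysUtf8() {
  return "utf-8";
}

// A script served without a charset, included from a windows-1252 page.
const page = { httpCharset: null, charsetAttribute: null, documentEncoding: "windows-1252" };
const inPage = encodingForScriptElement(page);
const inWorker = encodingForImportScriptsHonoringServer(page);
```

In this model the same undeclared script is decoded as windows-1252 in the page and as utf-8 in the worker, with or without the suggested change.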



--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] Please always use utf-8 for Web Workers

2009-09-25 Thread Drew Wilson
Then I'm misunderstanding the suggestion. My reading of:
Therefore, we should be able to always use utf-8 for workers. Always using
utf-8 is simpler to implement and test and encourages people to switch to
utf-8 elsewhere.

...was that we should ignore charset headers coming from the server and always
treat script data imported via importScripts() as if it were encoded as
utf-8 (i.e. skip step 3 of section 4.3 of the web workers spec), which
seems like it's effectively changing the default decoding.

Which means that someone naively serving up an existing Big5-encoded script
(containing, say, string resources) with the appropriate charset header will
find it fails when loaded into workers.
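
A small sketch of that failure mode, using the standard TextDecoder. The four bytes are assumed to be the Big5 encoding of a two-character string resource; the variable names are hypothetical.

```javascript
// Bytes assumed to be the Big5 encoding of the two-character string "中文".
const big5Bytes = new Uint8Array([0xa4, 0xa4, 0xa4, 0xe5]);

// Decode as UTF-8 regardless of the declared charset, as suggested for workers.
// A non-fatal decoder substitutes U+FFFD for each invalid sequence.
const asUtf8 = new TextDecoder("utf-8").decode(big5Bytes);

// True when the string resource did not survive the decoding.
const garbled = asUtf8.includes("\uFFFD");
```

The script may still parse, since the damage is confined to the string literal, but the text shown to users would be replacement characters.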

Again, apologies if I'm misunderstanding the suggestion.

-atw

On Fri, Sep 25, 2009 at 10:21 AM, Anne van Kesteren ann...@opera.com wrote:

 On Fri, 25 Sep 2009 19:16:47 +0200, Drew Wilson atwil...@google.com
 wrote:

 Certainly. If I explicitly override the charset, then that seems like
 reasonable behavior.


 It does not need to be overridden per se. If the document character
 encoding is different from UTF-8 then a script loaded through <script> will
 be decoded differently from a script loaded through importScripts() as well.


  Having the default decoding vary between importScripts() and <script>
 seems bad, especially since you can't override charsets with
 importScripts().


 This is already the case. The suggestion was not about changing the
 default.



 --
 Anne van Kesteren
 http://annevankesteren.nl/