Re: New full Unicode for ES6 idea

2012-03-02 Thread Glenn Adams
On Fri, Mar 2, 2012 at 12:58 AM, Erik Corry erik.co...@gmail.com wrote: 2012/3/1 Glenn Adams gl...@skynav.com: I'd like to plead for a solution rather like the one Java has, where strings are sequences of UTF-16 codes and there are specialized ways to iterate over them. Looking at this

Re: New full Unicode for ES6 idea

2012-03-02 Thread Erik Corry
2012/3/2 Glenn Adams gl...@skynav.com: On Fri, Mar 2, 2012 at 12:58 AM, Erik Corry erik.co...@gmail.com wrote: 2012/3/1 Glenn Adams gl...@skynav.com: I'd like to plead for a solution rather like the one Java has, where strings are sequences of UTF-16 codes and there are specialized ways

Re: New full Unicode for ES6 idea

2012-03-02 Thread Allen Wirfs-Brock
On Mar 1, 2012, at 11:09 PM, Norbert Lindenberg wrote: Comments: 1) In terms of the prioritization I suggested a few days ago https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html it seems you're considering item 6 essential, item 1 a side effect (whose consequences

Re: New full Unicode for ES6 idea

2012-03-02 Thread Glenn Adams
On Fri, Mar 2, 2012 at 2:13 AM, Erik Corry erik.co...@gmail.com wrote: level 3 is useful for higher level, language/locale sensitive text No, the Unicode grapheme clustering algorithm is not locale or language sensitive http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries one

Re: New full Unicode for ES6 idea

2012-03-02 Thread Brendan Eich
Glenn Adams wrote: On Fri, Mar 2, 2012 at 2:13 AM, Erik Corry erik.co...@gmail.com wrote: level 3 is useful for higher level, language/locale sensitive text No, the Unicode grapheme clustering algorithm is not locale or language sensitive

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry
I'm not in favour of big red switches, and I don't think the compartment based solution is going to be workable. I'd like to plead for a solution rather like the one Java has, where strings are sequences of UTF-16 codes and there are specialized ways to iterate over them. Looking at this entry
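The Java-style approach mentioned here (strings stay sequences of UTF-16 code units, with separate helpers for walking them by code point) can be sketched roughly as follows. This is only an illustration: `forEachCodePoint` is a hypothetical helper name, not something proposed in the thread, and unpaired surrogates are simply passed through as-is.

```javascript
// Hypothetical helper (not from the thread): walk a string of UTF-16 code
// units and call `visit` once per code point, pairing surrogates on the fly.
function forEachCodePoint(str, visit) {
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        visit(((unit - 0xD800) << 10) + (next - 0xDC00) + 0x10000, i);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    // BMP code unit, or an unpaired surrogate passed through unchanged.
    visit(unit, i);
  }
}

var seen = [];
forEachCodePoint("a\uD834\uDD1Eb", function (cp) {
  seen.push(cp.toString(16));
});
console.log(seen.join(" ")); // 61 1d11e 62
```

Indexing and length remain in code units under this scheme; only the iteration helper is aware of surrogate pairs.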

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry
2012/2/22 Norbert Lindenberg ecmascr...@norbertlindenberg.com: I'll reply to Brendan's proposal in two parts: first about the goals for supplementary character support, second about the BRS. Full 21-bit Unicode support means all of: * indexing by characters, not uint16 storage units; *

Re: New full Unicode for ES6 idea

2012-03-01 Thread Norbert Lindenberg
Comments: 1) In terms of the prioritization I suggested a few days ago https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html it seems you're considering item 6 essential, item 1 a side effect (whose consequences are not mentioned - see below), items 2-5 nice to have. Do I

Re: New full Unicode for ES6 idea

2012-03-01 Thread Erik Corry
2012/3/1 Glenn Adams gl...@skynav.com: 2012/3/1 Erik Corry erik.co...@gmail.com I'm not in favour of big red switches, and I don't think the compartment based solution is going to be workable. I'd like to plead for a solution rather like the one Java has, where strings are sequences of

Re: New full Unicode for ES6 idea

2012-02-29 Thread Allen Wirfs-Brock
I posted a new strawman that describes what I think is the most minimal support that we must provide for full Unicode in ES.next: http://wiki.ecmascript.org/doku.php?id=strawman:full_unicode_source_code I'm not suggesting that we must stop at this level of support, but I think not

Re: New full Unicode for ES6 idea

2012-02-28 Thread Brendan Eich
Wes Garland wrote: If four-byte escapes are statically rejected in BRS-on, we have a problem -- we should be able to use old code that runs in either mode unchanged when said code only uses characters in the BMP. We've been over this and I conceded to Allen that four-byte escapes (I'll use

Re: New full Unicode for ES6 idea

2012-02-24 Thread Brendan Eich
Norbert Lindenberg wrote: OK - migrations are hard. But so far most participants have only seen additional work, no benefits. How long will this take? When will it end? When will browsers make BRS-on the default, let alone eliminate the switch? When can Roozbeh abandon his original version?

Re: New full Unicode for ES6 idea

2012-02-22 Thread Wes Garland
Erratum: var a = [0]; should read var a = [];

Re: New full Unicode for ES6 idea

2012-02-21 Thread Andrew Oakley
On 02/20/12 16:47, Brendan Eich wrote: Andrew Oakley wrote: Issues only arise in code that tries to treat a string as an array of 16-bit integers, and I don't think we should be particularly bothered by performance of code which misuses strings in this fashion (but clearly this should still

Re: New full Unicode for ES6 idea

2012-02-21 Thread Wes Garland
On 21 February 2012 00:03, Brendan Eich bren...@mozilla.com wrote: These are byte-based encodings, no? What is the problem inflating them by zero extension to 16 bits now (or 21 bits in the future)? You can't make an invalid Unicode character from a byte value. One of my examples, GB 18030,

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
Andrew Oakley wrote: On 02/20/12 16:47, Brendan Eich wrote: Andrew Oakley wrote: Issues only arise in code that tries to treat a string as an array of 16-bit integers, and I don't think we should be particularly bothered by performance of code which misuses strings in this fashion (but

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
Brendan Eich wrote: in open-source browsers and JS engines that use uint16 vectors internally Sorry, that reads badly. All I meant is that I can't tell what closed-source engines do, not that they do not comply with ECMA-262 combined with other web standards to have the same observable

RE: New full Unicode for ES6 idea

2012-02-21 Thread Phillips, Addison
Normalization happens to source upstream of the JS engine. Here I'll call on a designated Unicode hitter. ;-) I agree that Unicode Normalization shouldn't happen automagically in the JS engine. I rather doubt that normalization happens to source upstream of the JS engine, unless by

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
Phillips, Addison wrote: Normalization happens to source upstream of the JS engine. Here I'll call on a designated Unicode hitter. ;-) I agree that Unicode Normalization shouldn't happen automagically in the JS engine. I rather doubt that normalization happens to source upstream of the JS

RE: New full Unicode for ES6 idea

2012-02-21 Thread Phillips, Addison
I meant ECMA-262 punts source normalization upstream in the spec pipeline that runs parallel to the browser's loading-the-URL | processing-what-was-loaded pipeline. ECMA-262 is concerned only with its little slice of processing heaven. Yep. One of the problems is that the source script

RE: New full Unicode for ES6 idea

2012-02-21 Thread Phillips, Addison
Because it has always been possible, it’s difficult to say how many scripts have transported byte-oriented data by “punning” the data into strings. Actually, I think this is more likely to be truly binary data rather than text in some non-Unicode character encoding, but anything is possible, I

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
Phillips, Addison wrote: Because it has always been possible, it’s difficult to say how many scripts have transported byte-oriented data by “punning” the data into strings. Actually, I think this is more likely to be truly binary data rather than text in some non-Unicode character encoding,

Re: New full Unicode for ES6 idea

2012-02-21 Thread Allen Wirfs-Brock
On Feb 21, 2012, at 7:37 AM, Brendan Eich wrote: Brendan Eich wrote: in open-source browsers and JS engines that use uint16 vectors internally Sorry, that reads badly. All I meant is that I can't tell what closed-source engines do, not that they do not comply with ECMA-262 combined with

Re: New full Unicode for ES6 idea

2012-02-21 Thread Tab Atkins Jr.
On Tue, Feb 21, 2012 at 3:11 PM, Brendan Eich bren...@mozilla.com wrote: Hi Mark, thanks for this post. Mark Davis ☕ wrote: UTF-8 represents a code point as 1-4 8-bit code units 1-6. ... Lock up your encoders, I am so not a Unicode guru but this is what my reptile coder brain remembers.

RE: New full Unicode for ES6 idea

2012-02-21 Thread Phillips, Addison
Hi Mark, thanks for this post. Mark Davis ☕ wrote: UTF-8 represents a code point as 1-4 8-bit code units 1-6. No. 1 to *4*. Five and six byte UTF-8 sequences are illegal and invalid. UTF-16 represents a code point as 2 or 4 16-bit code units 1 or 2. Yes, 1 or 2 16-bit code
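The code-unit counts stated in this correction (1 to 4 units in UTF-8, 1 or 2 in UTF-16) follow directly from the code point ranges. A minimal sketch, with hypothetical function names and no attempt to reject surrogate code points:

```javascript
// Hypothetical helpers (names are illustrative only).
// Number of 8-bit code units needed to encode a code point in UTF-8.
function utf8Units(cp) {
  if (cp < 0 || cp > 0x10FFFF) throw new RangeError("not a Unicode code point");
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4; // U+10000..U+10FFFF; there is no valid 5- or 6-byte form
}

// Number of 16-bit code units needed to encode a code point in UTF-16.
function utf16Units(cp) {
  if (cp < 0 || cp > 0x10FFFF) throw new RangeError("not a Unicode code point");
  return cp < 0x10000 ? 1 : 2; // supplementary code points need a surrogate pair
}

console.log(utf8Units(0x41), utf16Units(0x41));       // 1 1
console.log(utf8Units(0x20AC), utf16Units(0x20AC));   // 3 1
console.log(utf8Units(0x1D11E), utf16Units(0x1D11E)); // 4 2
```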

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
Thanks, all! That's a relief to know, six bytes always seemed too long but my reptile coder brain was also reptile-coder-lazy and I never dug into it. /be Phillips, Addison wrote: Hi Mark, thanks for this post. Mark Davis ☕ wrote: UTF-8 represents a code point as 1-4 8-bit code units 1-6.

Re: New full Unicode for ES6 idea

2012-02-21 Thread Norbert Lindenberg
I'll reply to Brendan's proposal in two parts: first about the goals for supplementary character support, second about the BRS. Full 21-bit Unicode support means all of: * indexing by characters, not uint16 storage units; * counting length as one greater than the last index; and *

Re: New full Unicode for ES6 idea

2012-02-21 Thread Brendan Eich
On Feb 21, 2012, at 6:05 PM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote: I'll reply to Brendan's proposal in two parts: first about the goals for supplementary character support, second about the BRS. Full 21-bit Unicode support means all of: * indexing by characters, not

Re: New full Unicode for ES6 idea

2012-02-21 Thread Norbert Lindenberg
Second part: the BRS. I'm wondering how development and deployment of existing full-Unicode software will play out in the presence of a Big Red Switch. Maybe I'm blind and there are ways to simplify the process, but this is how I imagine it. Let's start with a bit of code that currently

Re: New full Unicode for ES6 idea

2012-02-20 Thread Wes Garland
On 20 February 2012 00:45, Allen Wirfs-Brock al...@wirfs-brock.com wrote: 2) Allow invalid unicode characters in strings, and preserve them over concatenation – (\uD800 + \uDC00).length == 2. I think 2) is the only reasonable alternative. I think so, too -- especially as any sequence of
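For reference, option 2 matches how strings made of 16-bit code units behave when surrogate halves are concatenated. The sketch below runs on any engine with ES2015 support; `codePointAt` comes from that later standard and is used here only to show how the pair is read back, not as part of the proposal under discussion.

```javascript
// Each half is a perfectly storable, if unpaired, surrogate code unit.
var hi = "\uD800";
var lo = "\uDC00";

// Concatenation preserves both units; nothing is rejected or merged.
var joined = hi + lo;
console.log(joined.length);                // 2
console.log(joined === "\uD800\uDC00");    // true

// Read as UTF-16, the two units together denote the code point U+10000.
console.log(joined.codePointAt(0) === 0x10000); // true
```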

Re: New full Unicode for ES6 idea

2012-02-20 Thread Wes Garland
On 19 February 2012 16:34, Brendan Eich bren...@mozilla.com wrote: Wes Garland wrote: Is there a proposal for interaction with JSON? From http://www.ietf.org/rfc/rfc4627, 2.5 *snip* - so the proposal is to keep encoding JSON in UTF-16. What happens if the BRS is set to Unicode and we

Re: New full Unicode for ES6 idea

2012-02-20 Thread Wes Garland
On 20 February 2012 09:56, Andrew Oakley and...@ado.is-a-geek.net wrote: While this is being discussed, for any new string handling I think we should make any invalid strings (according to the rules in Unicode) cause some kind of exception on creation. Can you clarify which definition in

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: Last year we dispensed with the binary data hacking in strings use-case. I don't see the hardship. But rather than throw exceptions on concatenation I would simply eliminate the ability to spell code units with \u escapes. Who's with me? I think we need to be

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Gavin Barraclough wrote: What it might do, however, is eliminate the ambiguity about the intended meaning of \uD800\uDc00 in legacy code. If full unicode string mode only supported \u{} escapes then existing code that uses \u would have to be updated before it could be used in that

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Andrew Oakley wrote: Issues only arise in code that tries to treat a string as an array of 16-bit integers, and I don't think we should be particularly bothered by performance of code which misuses strings in this fashion (but clearly this should still work without opt-in to new string

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: For the moment, I'll simply take Wes' word for the above, as it logically makes sense. For some uses, you want to process all possible code points (for example, when validating data from an external source). At this lowest level you don't want to impose higher level

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: I really don't think any Unicode semantics should be built into the basic string representation. We need to decide on a max element size and Unicode motivates 21 bits, but it could be 32-bits. Personally, I've lived through enough address space exhaustion episodes in

Re: New full Unicode for ES6 idea

2012-02-20 Thread Allen Wirfs-Brock
On Feb 20, 2012, at 10:52 AM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... Another way to express what I see as the problem with what you are proposing about imposing such string semantics: Could the revised ECMAScript be used to implement a language that had similar but not

Re: New full Unicode for ES6 idea

2012-02-20 Thread Gavin Barraclough
On Feb 20, 2012, at 8:37 AM, Brendan Eich wrote: BRS makes 21-bit chars, so just as String.prototype.charCodeAt returns a code point, String.fromCharCode takes actual code point arguments. Again I'd reject (dynamically in the case of String.fromCharCode) any in [0xd800, 0xdfff]. Other code

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 20, 2012, at 10:52 AM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... Another way to express what I see as the problem with what you are proposing about imposing such string semantics: Could the revised ECMAScript be used to implement a language that had

Re: New full Unicode for ES6 idea

2012-02-20 Thread Allen Wirfs-Brock
On Feb 20, 2012, at 12:32 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... You are essentially saying that a compiler targeting ES for a language X that includes a string data type that does not conform to your rules (for example, by allowing occurrences of surrogate code points

Re: New full Unicode for ES6 idea

2012-02-20 Thread Wes Garland
On 20 February 2012 16:00, Allen Wirfs-Brock al...@wirfs-brock.com wrote: My sense is that there is a fairly large variety of string data types that could use the existing ES5 string type as a target type and for which many of the String.prototype.* methods would function just fine The

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 20, 2012, at 12:32 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... You are essentially saying that a compiler targeting ES for a language X that includes a string data type that does not conform to your rules (for example, by allowing occurrences of

Re: New full Unicode for ES6 idea

2012-02-20 Thread Norbert Lindenberg
As Brendan's link indicates, JSON is specified by RFC 4627, not by the ECMAScript Language Specification. JSON is widely used for data exchange with and between systems that have nothing to do with ECMAScript and the proposed BRS - see the middle section of http://www.json.org/ So the only

Re: New full Unicode for ES6 idea

2012-02-20 Thread Allen Wirfs-Brock
On Feb 20, 2012, at 3:14 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: On Feb 20, 2012, at 12:32 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... You are essentially saying that a compiler targeting ES for a language X that includes a string data type that does not conform to

Re: New full Unicode for ES6 idea

2012-02-20 Thread Allen Wirfs-Brock
On Feb 20, 2012, at 1:42 PM, Wes Garland wrote: On 20 February 2012 16:00, Allen Wirfs-Brock al...@wirfs-brock.com wrote: ... Observation -- disallowing otherwise legal Unicode strings because they contain code points d800-dfff has very concrete implementation benefits: it's possible to

Re: New full Unicode for ES6 idea

2012-02-20 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 20, 2012, at 3:14 PM, Brendan Eich wrote: Note that the above says invalid Unicode code point. 0xd800 is a valid Unicode code point. It isn't a valid Unicode character. See

Re: New full Unicode for ES6 idea

2012-02-19 Thread Jussi Kalliokoski
I'm not sure what to think about this, being a big fan of the UTF-8 simplicity. :) But anyhow, I like the idea of opt-in, actually so much that I started thinking, why not make JS be encoding-agnostic? What I mean here is that maybe we could have multi-charset Strings in JS? This would be useful

Re: New full Unicode for ES6 idea

2012-02-19 Thread Axel Rauschmayer
On Feb 19, 2012, at 9:33 , Brendan Eich wrote: Instead of any such *big* new observables, I propose a so-called Big Red [opt-in] Switch (BRS) on the side of a unit of VM isolation: specifically the global object. es-discuss-only idea: could that BRS be made to carry more weight? Could it

Re: New full Unicode for ES6 idea

2012-02-19 Thread Peter van der Zee
Do we know how many scripts actually rely on \u15 to produce a string length of 3? Might it make more sense to put the new unicode escape under a different escape? Something like \e for extended unicode for example. Or is this acceptable migration tax... On a side note, if we're going to do

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mathias Bynens
On a side note, if we're going to do this, can we also have aliases in regex to parse certain unicode categories? For instance, the es spec defines the Uppercase Letter (Lu), Lowercase Letter (Ll), Titlecase letter (Lt), Modifier letter (Lm), Other letter (Lo), Letter number (Nl),
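As an illustration of what category aliases in a regex can look like, the sketch below uses the `\p{...}` property escapes that a much later edition of the language (ES2018) added under the `u` flag. That syntax was not part of this thread's proposal, and the sample strings are made up for the example.

```javascript
// General_Category aliases such as Lu and Ll are accepted by \p{...}
// when the regular expression carries the `u` flag (ES2018 and later).
var startsUpper = /^\p{Lu}/u;
var hasLower = /\p{Ll}/u;

console.log(startsUpper.test("Éa")); // true  (U+00C9 is an uppercase letter)
console.log(startsUpper.test("éa")); // false
console.log(hasLower.test("ABC"));   // false
console.log(hasLower.test("ABc"));   // true
```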

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mark S. Miller
On Sun, Feb 19, 2012 at 12:33 AM, Brendan Eich bren...@mozilla.com wrote: [...] Why the global object? Because for many VMs, each global has its own heap or sub-heap (compartment), and all references outside that heap are to local proxies that copy from, or in the case of immutable data,

Re: New full Unicode for ES6 idea

2012-02-19 Thread Lasse Reichstein
On Sun, Feb 19, 2012 at 12:12 PM, Mark S. Miller erig...@google.com wrote: On Sun, Feb 19, 2012 at 12:33 AM, Brendan Eich bren...@mozilla.com wrote: [...] Why the global object? Because for many VMs, each global has its own heap or sub-heap (compartment), and all references outside that heap

Re: New full Unicode for ES6 idea

2012-02-19 Thread Wes Garland
On 19 February 2012 03:33, Brendan Eich bren...@mozilla.com wrote: S1 dates from when Unicode fit in 16 bits, and in those days, nickels had pictures of bumblebees on 'em (Gimme five bees for a quarter, you'd say ;-). Say, is that an onion on your belt? * indexing by characters, not

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Jussi Kalliokoski wrote: I'm not sure what to think about this, being a big fan of the UTF-8 simplicity. :) UTF-8 is great, but it's a transfer format, perfect for C and other such systems languages (especially ones that use byte-wide char from old days). It is not appropriate for JS, which

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Axel Rauschmayer wrote: es-discuss-only idea: could that BRS be made to carry more weight? Could it be a switch for all breaking ES.next changes? What do you have in mind? It had better be important. We *just* had the breakthrough championed by dherman for One JavaScript. Why make trouble by

Re: New full Unicode for ES6 idea

2012-02-19 Thread Axel Rauschmayer
es-discuss-only idea: could that BRS be made to carry more weight? Could it be a switch for all breaking ES.next changes? What do you have in mind? It had better be important. We *just* had the breakthrough championed by dherman for One JavaScript. Why make trouble by adding runtime

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Axel Rauschmayer wrote: es-discuss-only idea: could that BRS be made to carry more weight? Could it be a switch for all breaking ES.next changes? What do you have in mind? It had better be important. We *just* had the breakthrough championed by dherman for One JavaScript. Why make trouble by

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Brendan Eich wrote: My R2 resolution is not specific to any engine, but I have hopes it can be accepted. It is concrete enough to help overcome large-yet-vague doubts about implementation impact (at least IMHO). Recall that document.domain setting may have to split a merged same-origin

Re: New full Unicode for ES6 idea

2012-02-19 Thread David Bruant
Le 19/02/2012 09:33, Brendan Eich a écrit : (...) How is the BRS configured? Again, not via a pragma, and not by imperative state update inside the language (mutating hidden BRS state at a given program point could leave strings created before mutation observably different from those created

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mark S. Miller
On Sun, Feb 19, 2012 at 11:49 AM, Brendan Eich bren...@mozilla.com wrote: [...] Not all engines mediate cross-same-origin-window accesses. I hear IE9+ may, indeed rumor is it remotes to another process sometimes (breaking run-to-completion a bit; something we should explore breaking in the

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Wes Garland wrote: Is there a proposal for interaction with JSON? From http://www.ietf.org/rfc/rfc4627, 2.5: To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair.
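The twelve-character escape described in RFC 4627 section 2.5 can be produced mechanically from a code point. A minimal sketch, assuming a hypothetical helper named `jsonEscapeCodePoint` (not an API from any library or from the thread), using the G clef character U+1D11E that the RFC itself uses as its example:

```javascript
// Hypothetical helper: escape one code point the way RFC 4627 section 2.5
// describes, as \uXXXX for the BMP or a \uXXXX\uXXXX surrogate pair otherwise.
function jsonEscapeCodePoint(cp) {
  function hex4(n) {
    return "\\u" + ("0000" + n.toString(16)).slice(-4);
  }
  if (cp < 0x10000) return hex4(cp);
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return hex4(high) + hex4(low);
}

var escaped = jsonEscapeCodePoint(0x1D11E);
console.log(escaped);        // \ud834\udd1e
console.log(escaped.length); // 12

// Parsing the escape yields a two-code-unit string in today's engines.
var parsed = JSON.parse('"' + escaped + '"');
console.log(parsed.length);  // 2
```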

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Mark S. Miller wrote: On Sun, Feb 19, 2012 at 11:49 AM, Brendan Eich bren...@mozilla.com wrote: [...] Not all engines mediate cross-same-origin-window accesses. I hear IE9+ may, indeed rumor is it remotes to another process sometimes (breaking

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
David Bruant wrote: Le 19/02/2012 09:33, Brendan Eich a écrit : (...) How is the BRS configured? Again, not via a pragma, and not by imperative state update inside the language (mutating hidden BRS state at a given program point could leave strings created before mutation observably

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Brendan Eich wrote: the big red button was the button Elmer Fudd warned Daffy Duck never to press in Design for Leaving: http://www.youtube.com/watch?v=gms_NKzNLUs Got Elmer and Daffy reversed there --getting old! /be

RE: New full Unicode for ES6 idea

2012-02-19 Thread Phillips, Addison
/JavaScriptInternationalization -Original Message- From: Brendan Eich [mailto:bren...@mozilla.com] Sent: Sunday, February 19, 2012 1:34 PM To: Wes Garland Cc: es-discuss; public-script-co...@w3.org; mran...@voxer.com Subject: Re: New full Unicode for ES6 idea Wes Garland wrote: Is there a proposal

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Anne van Kesteren wrote: On Sun, 19 Feb 2012 21:29:48 +0100, David Bruant bruan...@gmail.com wrote: I think a CSP-like solution should be explored. FWIW, the feedback on CORS (CSP-like) thus far has been that it's quite hard to set up custom headers. I've heard this for years, can believe

Re: New full Unicode for ES6 idea

2012-02-19 Thread David Bruant
Le 19/02/2012 22:57, Anne van Kesteren a écrit : On Sun, 19 Feb 2012 21:29:48 +0100, David Bruant bruan...@gmail.com wrote: I think a CSP-like solution should be explored. FWIW, the feedback on CORS (CSP-like) thus far has been that it's quite hard to set up custom headers. Do you have

Re: New full Unicode for ES6 idea

2012-02-19 Thread Boris Zbarsky
On 2/19/12 3:31 PM, Mark S. Miller wrote: Other than the origin truncation issue that I am still confused about, what other benefits are there to mediating interframe access within the same origin? In Gecko's case, at least, there are certain benefits to garbage collection, memory locality,

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Phillips, Addison wrote: Why would converting the existing UCS-2 support to be UTF-16 not be a good idea? There is nothing intrinsically wrong that I can see with that approach and it would be the most compatible with existing scripts, with no special modes, flags, or interactions. Allen

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 2:15 PM, Brendan Eich wrote: Anne van Kesteren wrote: ... As far as the DOM and Web IDL are concerned, I think we would need two definitions for code unit. One that means 16-bit code unit and one that means Unicode code unit I'm not a Unicode expert but I believe

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 19, 2012, at 2:15 PM, Brendan Eich wrote: I'm not a Unicode expert but I believe the latter is called character. Me neither, but I believe the correct term is code point which refers to the full 21-bit code while Unicode character is the logical entity

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Brendan Eich wrote: Mark S. Miller wrote: On Sun, Feb 19, 2012 at 12:33 AM, Brendan Eich bren...@mozilla.com wrote: [...] Why the global object? Because for many VMs, each global has its own heap or sub-heap (compartment), and all references outside that

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 2:44 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: On Feb 19, 2012, at 2:15 PM, Brendan Eich wrote: I'm not a Unicode expert but I believe the latter is called character. Me neither, but I believe the correct term is code point which refers to the full 21-bit code

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 19, 2012, at 2:44 PM, Brendan Eich wrote: Thanks. We have a confusing transposition of terms between Unicode and ECMA-262, it seems. Should we fix? The ES5.1 spec is OK because it always uses (as defined in section 6) the term Unicode character when it

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Trimming to es-discuss. Brendan Eich wrote: How about character element? Element to capture indexing as the means of accessing the thing in question. Or avoid the c-word altogether via string element or string indexed property? Latter's too long but you see what I mean. /be

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 3:18 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... Your proposal would change that equivalence. In one sense, the BRS would be a switch that controls whether an ES character corresponds to a code unit or a code point Yes, and we might rather have a different

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mark S. Miller
On Sun, Feb 19, 2012 at 1:52 PM, Brendan Eich bren...@mozilla.com wrote: [...] How? By doing a full walk of the object graph and doing surgery on it? This sounds more painful than imposing mediation up front. No, by indirection, of course ;-). The details vary among browsers. I think just

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Mark S. Miller wrote: On Sun, Feb 19, 2012 at 1:52 PM, Brendan Eich bren...@mozilla.com wrote: [...] How? By doing a full walk of the object graph and doing surgery on it? This sounds more painful than imposing mediation up front. No, by

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Allen Wirfs-Brock wrote: On Feb 19, 2012, at 3:18 PM, Brendan Eich wrote: Allen Wirfs-Brock wrote: ... Your proposal would change that equivalence. In one sense, the BRS would be a switch that controls whether an ES character corresponds to a code unit or a code point Yes, and we might rather

Re: New full Unicode for ES6 idea

2012-02-19 Thread Cameron McCormack
Brendan Eich: To hope to make this sideshow beneficial to all the cc: list, what do DOM specs use to talk about uint16 units vs. code points? I say code unit as a shorter way of saying 16 bit unsigned integer code unit http://dev.w3.org/2006/webapi/WebIDL/#dfn-code-unit (which DOM4 also

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 1:34 PM, Brendan Eich wrote: Wes Garland wrote: Is there a proposal for interaction with JSON? From http://www.ietf.org/rfc/rfc4627, 2.5: To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a

Re: New full Unicode for ES6 idea

2012-02-19 Thread Mark Davis ☕
First, it would be great to get full Unicode support in JS. I know that's been a problem for us at Google. Secondly, while I agree with Addison that the approach that Java took is workable, it does cause problems. Ideally someone would be able to loop (a very common construct) with: for

RE: New full Unicode for ES6 idea

2012-02-19 Thread Phillips, Addison
Mark wrote: First, it would be great to get full Unicode support in JS. I know that's been a problem for us at Google. AP +1: I think we’ve waited for supplementary character support long enough! Secondly, while I agree with Addison that the approach that Java took is workable, it does cause

Re: New full Unicode for ES6 idea

2012-02-19 Thread Gavin Barraclough
On Feb 19, 2012, at 3:13 PM, Allen Wirfs-Brock wrote: My implementor's bias is showing, because I expect many engines would use UTF-16 internally and have non-O(1) indexing for strings with the contains-non-BMP-and-BRS-set-to-full-Unicode flag bit. A fine implementation, but not

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Cameron McCormack wrote: Brendan Eich: To hope to make this sideshow beneficial to all the cc: list, what do DOM specs use to talk about uint16 units vs. code points? I say code unit as a shorter way of saying 16 bit unsigned integer code unit

Re: New full Unicode for ES6 idea

2012-02-19 Thread Brendan Eich
Gavin Barraclough wrote: One way in which the proposal under discussion seems to differ from the previous strawman is in the behavior arising from concatenation of strings ending/beginning with a surrogate hi and lo element. How do we want to handle unpaired UTF-16

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 6:54 PM, Gavin Barraclough wrote: On Feb 19, 2012, at 3:13 PM, Allen Wirfs-Brock wrote: My implementor's bias is showing, because I expect many engines would use UTF-16 internally and have non-O(1) indexing for strings with the

Re: New full Unicode for ES6 idea

2012-02-19 Thread Bill Frantz
On 2/19/12 at 21:45, al...@wirfs-brock.com (Allen Wirfs-Brock) wrote: I really don't think any Unicode semantics should be built into the basic string representation. We need to decide on a max element size and Unicode motivates 21 bits, but it could be 32-bits. Personally, I've lived

Re: New full Unicode for ES6 idea

2012-02-19 Thread Allen Wirfs-Brock
On Feb 19, 2012, at 7:52 PM, Brendan Eich wrote: Gavin Barraclough wrote: One way in which the proposal under discussion seems to differ from the previous strawman is in the behavior arising from concatenation of strings ending/beginning with a surrogate hi and lo element. How do we want

Re: New full Unicode for ES6 idea

2012-02-19 Thread Gavin Barraclough
On Feb 19, 2012, at 10:05 PM, Allen Wirfs-Brock wrote: Great post. I agree 3 is not good. I was thinking based on today's exchanges that the BRS being set to full Unicode *could* mean that \u is illegal and you *must* use \u{...} to write Unicode *code points* (not code units). Last