On May 18, 2011, at 3:46 PM, Waldemar Horwat wrote:
On 05/16/11 11:11, Allen Wirfs-Brock wrote:
I tried to post a pointer to this strawman on this list a few weeks ago, but
apparently it didn't reach the list for some reason.
Feedback would be appreciated:
- Java kept their strings encoded exactly as they were (a sequence of
16-bit code units) and provided extra APIs for the cases where you want to
extract a code point.
Bloaty.
? defining UTF-16 instead of UCS-2 introduces zero bloat. In fact, it pretty
much works anyway, it's just not
On May 19, 2011, at 10:27 AM, Shawn Steele wrote:
The crucial win of Allen's proposal comes down the road, when someone in a
certain locale *can* do s.indexOf(nonBMPChar) and win.
s.indexOf(\U+10000),
Ok, but \U+... does not work today.
who cares that it ends up as UTF-16? You can
On Thu, May 19, 2011 at 9:50 AM, Brendan Eich bren...@mozilla.com wrote:
[...]
That seems worth considering, rather than s.wideIndexOf(nonBMPChar).
Not jumping into the Unicode debate yet. But I did want to nip this
terminological possibility in the bud. PLEASE do not refer to non-BMP
The crucial win of Allen's proposal comes down the road, when someone in a
certain locale *can* do s.indexOf(nonBMPChar) and win.
s.indexOf(\U+10000),
Ok, but \U+... does not work today.
Yes, that would be worth adding (IMO) as a convenience, regardless of whether
the backend were UTF-16
On 11:59 AM, Brendan Eich wrote:
But hey, if JS does not need to change then we can avoid trouble and
keep on using 16-bit indexing and length. Is this really the best outcome?
It may well be. The problem is largely theoretical, and the many offered
cures seem to be much worse than the disease.
On May 19, 2011, at 11:18 AM, Douglas Crockford wrote:
A more critical need is some form of string.format or quasiliterals.
Yes, these are important. On the agenda for next week? Which strawmen? I've had
trouble sorting through the quasi-variations on the wiki, and I know I'm not
alone.
/be
On Thu, May 19, 2011 at 12:05 PM, Brendan Eich bren...@mozilla.com wrote:
On May 19, 2011, at 11:18 AM, Douglas Crockford wrote:
A more critical need is some form of string.format or quasiliterals.
Yes, these are important. On the agenda for next week? Which strawmen? I've
had trouble
On May 18, 2011, at 3:46 PM, Waldemar Horwat wrote:
2. Widening characters to 21 bits doesn't really help much. As stated
earlier in this thread, you still want to treat clumps of combining
characters together with the character to which they combine, worry about
various normalized
will be security problems.
-Shawn
-Original Message-
From: es-discuss-boun...@mozilla.org [mailto:es-discuss-boun...@mozilla.org] On
Behalf Of Allen Wirfs-Brock
Sent: Thursday, May 19, 2011 12:19 PM
To: Waldemar Horwat
Cc: es-discuss@mozilla.org
Subject: Re: Full Unicode strings strawman
On May 18
On May 19, 2011, at 2:06 PM, Shawn Steele wrote:
There are several sequences in Unicode which are meaningless if you have only
one character and not the other. Eg: any of the variation selectors by
themselves are meaningless. So if you break a modified character from its
variation
, May 19, 2011 3:00 PM
To: Shawn Steele
Cc: Waldemar Horwat; es-discuss@mozilla.org; Peter Constable
Subject: Re: Full Unicode strings strawman
On May 19, 2011, at 2:06 PM, Shawn Steele wrote:
There are several sequences in Unicode which are meaningless if you have only
one character
On May 19, 2011, at 3:35 PM, Shawn Steele wrote:
I'm still not at all convinced :) I don't buy that the linguistic case isn't
interesting
Just to be clear, I'm not saying the linguistic case isn't interesting. It's
obviously very interesting for a lot of applications. I was trying to say
On Tue, May 17, 2011 at 11:09 AM, Shawn Steele
shawn.ste...@microsoft.com wrote:
I would much prefer changing UCS-2 to UTF-16, thus formalizing that
surrogate pairs are permitted. That'd be very difficult to break any
existing code and would still allow representation of everything reasonable
On Tue, May 17, 2011 at 20:01, Wes Garland w...@page.ca wrote:
Mark;
Are you Dr. *Mark E. Davis* (born September 13, 1952 (age 58)), co-founder
of the Unicode http://en.wikipedia.org/wiki/Unicode project and the
president of the Unicode
Hmm... I proposed break iterators for 'character/grapheme', word, line
and sentence as part of the i18n API, but they were shot down (at least for
version 0.5). Are you open to adding them now? Once this discussion is settled and
the proposal to support the full unicode range is in place, we can
Yes, one of the options for the internal storage of the string class is to
use different arrays depending on the contents.
1. uint8's if all the code point values are <= FF
2. uint16's if all the code point values are <= FFFF
3. uint32's otherwise
That way the internal storage always corresponds
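
A rough sketch of the storage selection listed above, for illustration only. It is not how any particular engine works; `packCodePoints` is a hypothetical name, the FF and FFFF thresholds are assumed from the element widths, and the code relies on ES2015 features (`Array.from`, `codePointAt`) that postdate this thread.

```javascript
// Hypothetical helper: pick the narrowest typed array that can hold
// every code point in the string.
function packCodePoints(str) {
  // Array.from iterates a string by code point, so a surrogate pair
  // becomes a single entry.
  const codePoints = Array.from(str, (ch) => ch.codePointAt(0));
  let max = 0;
  for (const cp of codePoints) {
    if (cp > max) max = cp;
  }
  if (max <= 0xff) return Uint8Array.from(codePoints);
  if (max <= 0xffff) return Uint16Array.from(codePoints);
  return Uint32Array.from(codePoints);
}
```

With this scheme an index into the array is always a code point index, at the cost of re-encoding whenever a wider character is added.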
On 05/16/11 11:11, Allen Wirfs-Brock wrote:
I tried to post a pointer to this strawman on this list a few weeks ago, but
apparently it didn't reach the list for some reason.
Feedback would be appreciated:
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings
Allen
I have read the discussion so far, but would like to come back to the
strawman itself because I believe that it starts with a problem
statement that's incorrect and misleading the discussion. Correctly
describing the current situation would help in the discussion of
possible changes, in
On 16 May 2011 17:42, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/16/11 4:38 PM, Wes Garland wrote:
Two great things about strings composed of Unicode code points:
...
Even though this is a breaking change from ES-5, I support it
whole-heartedly but I expect breakage to be very limited.
On 5/17/11 10:40 AM, Wes Garland wrote:
On 16 May 2011 17:42, Boris Zbarsky bzbar...@mit.edu
Those aren't code points at all. They're just not Unicode.
Not quite: code points D800-DFFF are reserved code points which are not
representable with UTF-16.
Nor with any other Unicode encoding,
On May 16, 2011, at 8:13 PM, Allen Wirfs-Brock wrote:
I think it does. In another reply I also mentioned the possibility of tagging
in a JS visible manner strings that have gone through a known encoding
process.
Saw that, seems helpful. Want to spec it?
If the strings you are combining
On 5/17/11 1:05 PM, Brendan Eich wrote:
If the strings you are combining from different sources have not been
canonicalized to a common encoding then you had better be damn careful how you
combine them.
Programmers miss this as you note, so arguably things are not much worse, at
best no worse,
On May 17, 2011, at 10:22 AM, Boris Zbarsky wrote:
Yes. And right now that's how it works and actual JS authors typically don't
have to worry about encoding issues. I don't agree with Allen's claim that
in the long run JS in the browser is going to have to be able to deal with
arbitrary
On 5/17/11 1:27 PM, Brendan Eich wrote:
On May 17, 2011, at 10:22 AM, Boris Zbarsky wrote:
Yes. And right now that's how it works and actual JS authors typically don't have to
worry about encoding issues. I don't agree with Allen's claim that in the long run
JS in the browser is going to
On May 17, 2011, at 10:37 AM, Boris Zbarsky wrote:
On 5/17/11 1:27 PM, Brendan Eich wrote:
On May 17, 2011, at 10:22 AM, Boris Zbarsky wrote:
Yes. And right now that's how it works and actual JS authors typically
don't have to worry about encoding issues. I don't agree with Allen's
On 5/17/11 1:40 PM, Brendan Eich wrote:
On May 17, 2011, at 10:37 AM, Boris Zbarsky wrote:
On 5/17/11 1:27 PM, Brendan Eich wrote:
On May 17, 2011, at 10:22 AM, Boris Zbarsky wrote:
Yes. And right now that's how it works and actual JS authors typically don't have to
worry about encoding
On May 17, 2011, at 10:43 AM, Boris Zbarsky wrote:
On 5/17/11 1:40 PM, Brendan Eich wrote:
Where do you read forcing? Not in the words you cited.
In the substance of having strings in different encodings around at the same
time. If that doesn't force developers to worry about encodings,
On May 17, 2011, at 10:47 AM, Brendan Eich wrote:
On May 17, 2011, at 10:43 AM, Boris Zbarsky wrote:
On 5/17/11 1:40 PM, Brendan Eich wrote:
Where do you read forcing? Not in the words you cited.
In the substance of having strings in different encodings around at the same
time. If that
I would much prefer changing UCS-2 to UTF-16, thus formalizing that
surrogate pairs are permitted. That'd be very difficult to break any existing
code and would still allow representation of everything reasonable in Unicode.
That would enable Unicode, and allow extending string literals and
On 5/17/11 1:47 PM, Brendan Eich wrote:
On May 17, 2011, at 10:43 AM, Boris Zbarsky wrote:
On 5/17/11 1:40 PM, Brendan Eich wrote:
Where do you read forcing? Not in the words you cited.
In the substance of having strings in different encodings around at the same
time. If that doesn't
On 17 May 2011 12:36, Boris Zbarsky bzbar...@mit.edu wrote:
Not quite: code points D800-DFFF are reserved code points which are not
representable with UTF-16.
Nor with any other Unicode encoding, really. They don't represent, on
their own, Unicode characters.
Right - but they are still
On 5/17/11 2:12 PM, Wes Garland wrote:
That said, you can encode these code points with utf-8; for example,
0xdc08 becomes 0xed 0xb0 0x88.
By the same argument, you can encode them in UTF-16. The byte sequence
above is not valid UTF-8. See How do I convert an unpaired UTF-16
surrogate to
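
Both halves of this exchange can be checked with a small sketch. The function names are hypothetical, and `TextDecoder` is assumed to be available as a global (browsers and current Node.js). The first function applies the generic three-byte UTF-8 bit layout without any validity check; the second asks a strict decoder whether the result is well-formed.

```javascript
// Apply the generic three-byte UTF-8 bit layout to a 16-bit value,
// without checking whether the value is a surrogate.
function threeByteLayout(value) {
  return [
    0xe0 | (value >> 12),
    0x80 | ((value >> 6) & 0x3f),
    0x80 | (value & 0x3f),
  ];
}

const bytes = threeByteLayout(0xdc08); // [0xed, 0xb0, 0x88]

// A strict decoder rejects malformed input instead of substituting U+FFFD.
function isWellFormedUtf8(byteArray) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(byteArray));
    return true;
  } catch (e) {
    return false;
  }
}
```

The bit layout does produce 0xed 0xb0 0x88, but `isWellFormedUtf8(bytes)` reports false: the bytes can be written down, yet they are not valid UTF-8.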
On 5/17/11 2:24 PM, Allen Wirfs-Brock wrote:
In the substance of having strings in different encodings around at
the same time. If that doesn't force developers to worry about
encodings, what does, exactly?
This already occurs in JS. For example, the encodeURI function produces
a string whose
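
As a small illustration of `encodeURI` handing back a string in a different encoding than its input (the quoted sentence is cut off, so this is only an assumption about where it was heading), the sketch below shows the percent-escaped UTF-8 bytes it produces. `encodeOrNull` is a hypothetical wrapper.

```javascript
// A supplementary character written as a surrogate pair is emitted as
// the percent-escaped UTF-8 bytes of the code point it encodes.
const escaped = encodeURI("\ud800\udc00"); // "%F0%90%80%80"

// An unpaired surrogate cannot be converted, so encodeURI throws a
// URIError; this wrapper turns that into null.
function encodeOrNull(s) {
  try {
    return encodeURI(s);
  } catch (e) {
    return null;
  }
}
```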
...@microsoft.com]
Sent: Tuesday, May 17, 2011 11:09 AM
To: Brendan Eich; Boris Zbarsky
Cc: es-discuss
Subject: RE: Full Unicode strings strawman
I would much prefer changing UCS-2 to UTF-16, thus formalizing that
surrogate pairs are permitted. That'd be very difficult to break any existing
Right - but they are still legitimate code points, and they fill out the
space required to let us treat String as uint16[] when defining the backing
store as something that maps to the set of all Unicode code points.
That said, you can encode these code points with utf-8; for example,
On May 17, 2011, at 12:00 PM, Phillips, Addison wrote:
Note: The W3C Internationalization Core WG published a set of requirements
in this area for consideration by ES some time ago. It lives here:
http://www.w3.org/International/wiki/JavaScriptInternationalization
You might want to
[mailto:al...@wirfs-brock.com]
Sent: Tuesday, May 17, 2011 12:16 PM
To: Phillips, Addison
Cc: Shawn Steele; Brendan Eich; Boris Zbarsky; es-discuss
Subject: Re: Full Unicode strings strawman
On May 17, 2011, at 12:00 PM, Phillips, Addison wrote:
Note: The W3C Internationalization Core WG
On 17 May 2011 14:39, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/17/11 2:12 PM, Wes Garland wrote:
That said, you can encode these code points with utf-8; for example,
0xdc08 becomes 0xed 0xb0 0x88.
By the same argument, you can encode them in UTF-16. The byte sequence
above is not valid
On 17 May 2011 15:00, Phillips, Addison addi...@lab126.com wrote:
2. Allowing unpaired surrogates is a *requirement*. Yes, such a string is
ill-formed, but there are too many cases in which one might wish to have
such broken strings for scripting purposes.
3. We should have escape syntax for
On 5/17/11 3:29 PM, Wes Garland wrote:
But the point remains, the FAQ entry you quote talks about encoding a
lone surrogate, i.e. a code unit, which is not a complete code point.
You can only convert complete code points from one encoding to another.
Just like you can't represent part of a UTF-8
The wrong conclusion is being drawn. I can say definitively that for the
string a\uD800b.
- It is a valid Unicode string, according to the Unicode Standard.
- It cannot be encoded as well-formed in any UTF-x (it is not
'well-formed' in any UTF).
- When it comes to conversion, the bad
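
A minimal sketch of the first and last points, assuming a global `TextEncoder` (browsers and current Node.js). The string is accepted as-is by the language, and the lone surrogate is only dealt with at conversion time, here by substitution with U+FFFD.

```javascript
const s = "a\uD800b";

// The language accepts the string; it has three 16-bit elements.
const unitCount = s.length; // 3

// TextEncoder substitutes U+FFFD (bytes EF BF BD) for the lone surrogate.
const utf8 = Array.from(new TextEncoder().encode(s));
// [0x61, 0xef, 0xbf, 0xbd, 0x62]
```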
On 17 May 2011 16:03, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/17/11 3:29 PM, Wes Garland wrote:
The problem is that UTF-16 cannot represent
all possible code points.
My point is that neither can UTF-8. Can you name an encoding that _can_
represent the surrogate-range codepoints?
On 5/17/11 5:24 PM, Wes Garland wrote:
UTF-8 and UTF-32. I think UTF-7 can, too, but it is not a standard so
it's not really worth discussing. UTF-16 is the odd one out.
That's not what the spec says.
Okay, I think we have to agree to disagree here. I believe my reading of
the spec is
On 17 May 2011 20:09, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/17/11 5:24 PM, Wes Garland wrote:
Okay, I think we have to agree to disagree here. I believe my reading of
the spec is correct.
Sorry, but no... how much more clear can the spec get?
In the past, I have read it thus,
That is incorrect. See below.
Mark
*— Il meglio è l’inimico del bene —*
On Tue, May 17, 2011 at 18:33, Wes Garland w...@page.ca wrote:
On 17 May 2011 20:09, Boris Zbarsky bzbar...@mit.edu wrote:
On 5/17/11 5:24 PM, Wes Garland wrote:
Okay, I think we have to agree to disagree here. I
Mark;
Are you Dr. *Mark E. Davis* (born September 13, 1952 (age 58)), co-founder
of the Unicode http://en.wikipedia.org/wiki/Unicode project and the
president of the Unicode
Consortium http://en.wikipedia.org/wiki/Unicode_Consortium since its
incorporation in 1991?
(If so, uh, thanks for giving me
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
I tried to post a pointer to this strawman on this list a few weeks ago, but
apparently it didn't reach the list for some reason.
Feedback would be appreciated:
http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings
Thanks for making a strawman
Unicode Escape Sequences
Is it possible for U+ to accept either 4, 5, or 6 digit sequences? Typically
when I encounter U+ notation the leading zero is omitted, and I see BMP
characters quite often. Obviously BMP could use the U notation, however it
seems like
On May 16, 2011, at 11:30 AM, Mike Samuel wrote:
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
I tried to post a pointer to this strawman on this list a few weeks ago, but
apparently it didn't reach the list for some reason.
Feedback would be appreciated:
2011/5/16 Shawn Steele shawn.ste...@microsoft.com:
myString.replace( /[\ud800-\udbff](?![\udc00-\u])/g, \ufffd)
.replace( /(^|[^\ud800-\udbff])([\udc00-\ud])/g, \ufffd)
My example code has typos. It should have read
myString.replace( /[\ud800-\udbff](?![\udc00-\udfff])/g,
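
The corrected code is cut off here. A sketch of the same idea, replacing unpaired surrogates with U+FFFD, is given below. It is not the author's code: it swaps the two chained replaces for a single pattern with a lookbehind, which needs a newer regular expression engine than existed in 2011. `replaceLoneSurrogates` is a hypothetical name.

```javascript
// Matches a high surrogate not followed by a low one, or a low
// surrogate not preceded by a high one. No "u" flag, so the pattern
// works on 16-bit code units.
const LONE_SURROGATE =
  /[\ud800-\udbff](?![\udc00-\udfff])|(?<![\ud800-\udbff])[\udc00-\udfff]/g;

function replaceLoneSurrogates(str) {
  return str.replace(LONE_SURROGATE, "\ufffd");
}
```

Well-formed pairs such as `"\ud800\udc00"` pass through unchanged, while `"a\ud800b"` becomes `"a\ufffdb"`.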
On May 16, 2011, at 11:34 AM, Shawn Steele wrote:
Thanks for making a strawman
(see my very last sentence below as it may impact the interpretation of some of
the rest of these responses)
Unicode Escape Sequences
Is it possible for U+ to accept either 4, 5, or 6 digit sequences?
On May 16, 2011, at 12:28 PM, Mike Samuel wrote:
DOMString is defined at
http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578 thus
Type Definition DOMString
A DOMString is a sequence of 16-bit units.
so how would round tripping a JS string through a DOM string work?
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
On May 16, 2011, at 12:28 PM, Mike Samuel wrote:
DOMString is defined at
http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578 thus
Type Definition DOMString
A DOMString is a sequence of 16-bit units.
so how would
Allen;
Thanks for putting this together. We use Unicode data extensively in both
our web and server-side applications, and being forced to deal with UTF-16
surrogate pair directly -- rather than letting the String implementation
deal with them -- is a constant source of mild pain. At first
On 5/16/11 4:37 PM, Mike Samuel wrote:
You might have. If you reject my assertion about option 2 above, then
to clarify,
The UTF-16 representation of codepoint U+10000 is the code-unit pair
U+D8000 U+DC000.
No. The UTF-16 representation of codepoint U+10000 is the code-unit
pair 0xD800
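
For reference, the UTF-16 mapping for a supplementary code point can be sketched as below; for example U+10000 maps to the code units 0xD800 0xDC00. `toSurrogatePair` is a hypothetical helper name, not an API from the strawman.

```javascript
// Split a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000; // 20 significant bits
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}
```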
2011/5/16 Wes Garland w...@page.ca:
Mike Samuel, can you explain why you are en/decoding UTF-16 when
round-tripping through the DOM?
I was UTF-16 encoding it because there will be host objects in
browsers that assume a UTF-16 encoding and so a possibility for
orphaned surrogates in internal
2011/5/16 Boris Zbarsky bzbar...@mit.edu:
On 5/16/11 4:37 PM, Mike Samuel wrote:
You might have. If you reject my assertion about option 2 above, then
to clarify,
The UTF-16 representation of codepoint U+10000 is the code-unit pair
U+D8000 U+DC000.
No. The UTF-16 representation of
I'm quite sympathetic to the goal, but the proposal does represent a
significant breaking change. The problem, as Shawn points out, is with
indexing. Before, the strings were defined as UTF16.
Take a sample string \ud800\udc00\u0061 = \u{10000}\u{61}. Right now,
the 'a' (the \u{61}) is at offset
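
The indexing difference can be seen in a short sketch that compares code unit positions with code point positions for that sample string. It uses `Array.from`, an ES2015 feature that postdates this thread, to iterate by code point; the variable names are illustrative only.

```javascript
const sample = "\ud800\udc00\u0061";

// Indexing by 16-bit code units, as strings work today.
const unitLength = sample.length;      // 3
const unitIndex = sample.indexOf("a"); // 2

// Indexing by code points: the surrogate pair counts as one element.
const codePoints = Array.from(sample);
const codePointLength = codePoints.length;      // 2
const codePointIndex = codePoints.indexOf("a"); // 1
```

Any code that stores or compares offsets would see the second set of numbers instead of the first if strings were reindexed by code point.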
Allen, could you clarify something.
When the strawman says without mentioning codepoint
The String type is the set of all finite ordered sequences of zero or
more 16-bit\b\b\b\b\b\b 21-bit unsigned integer values (“elements”).
does that mean that String.charCodeAt(...) can return any value in
, 2011 12:53 PM
To: Shawn Steele
Cc: es-discuss@mozilla.org
Subject: Re: Full Unicode strings strawman
On May 16, 2011, at 11:34 AM, Shawn Steele wrote:
Thanks for making a strawman
(see my very last sentence below as it may impact the interpretation of some of
the rest of these responses
On Mon, May 16, 2011 at 2:19 PM, Mark Davis ☕ m...@macchiato.com wrote:
I'm quite sympathetic to the goal, but the proposal does represent a
significant breaking change. The problem, as Shawn points out, is with
indexing. Before, the strings were defined as UTF16.
I agree with Mark wrote
: Monday, May 16, 2011 2:24 PM
To: Mark Davis ☕
Cc: Markus Scherer; es-discuss@mozilla.org
Subject: Re: Full Unicode strings strawman
On Mon, May 16, 2011 at 2:19 PM, Mark Davis ☕
m...@macchiato.com wrote:
I'm quite sympathetic to the goal, but the proposal does represent
On 5/16/11 4:38 PM, Wes Garland wrote:
Two great things about strings composed of Unicode code points:
...
Even though this is a breaking change from ES-5, I support it
whole-heartedly but I expect breakage to be very limited. Provided
that the implementation does not restrict the storage of
-discuss-boun...@mozilla.org] *On Behalf Of *Jungshik Shin (신정식, 申政湜)
*Sent:* Monday, May 16, 2011 2:24 PM
*To:* Mark Davis ☕
*Cc:* Markus Scherer; es-discuss@mozilla.org
*Subject:* Re: Full Unicode strings strawman
On Mon, May 16, 2011 at 2:19 PM, Mark Davis ☕ m...@macchiato.com wrote
On 5/16/11 5:16 PM, Mike Samuel wrote:
The strawman says
The String type is the set of all finite ordered sequences of zero or
more 21-bit unsigned integer values (“elements”).
Yeah, that's not the same thing as an actual Unicode string, and
requires handling of all sorts of what if someone
PM
To: Shawn Steele
Cc: Jungshik Shin (신정식, 申政湜); Markus Scherer; es-discuss@mozilla.org
Subject: Re: Full Unicode strings strawman
In terms of implementation capabilities, there isn't really a significant
practical difference between
* a UCS-2 implementation, and
* a UTF-16
On 5/16/11 5:23 PM, Shawn Steele wrote:
I’m having some (ok, a great deal of) confusion between the DOM Encoding
and the JavaScript encoding and whatever. I’d assumed that if I had a
web page in some encoding, that it was converted to UTF-16 (well,
UCS-2), and that’s what the JavaScript engine
On May 16, 2011, at 1:37 PM, Mike Samuel wrote:
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
...
How would
var oneSupplemental = \U00010000;
I don't think I understand your literal notation. \U is a 32-bit character
value? In whose implementation?
Sorry, please read this
On May 16, 2011, at 1:38 PM, Wes Garland wrote:
Allen;
Thanks for putting this together. We use Unicode data extensively in both
our web and server-side applications, and being forced to deal with UTF-16
surrogate pair directly -- rather than letting the String implementation deal
A correction.
U+D800 is indeed a code point: http://www.unicode.org/glossary/#Code_Point. It
is defined for usage in Unicode Strings (see
http://www.unicode.org/glossary/#Unicode_String) because often it is useful
for implementations to be able to allow it in processing.
It does, however, have a
In practice, the supplemental code points don't really cause problems in
Unicode strings. Most implementations just treat them as if they were
unassigned. The only important issue is that *when* they are converted to
UTF-xx for storage or transmission, they need to be handled; typically by
On May 16, 2011, at 2:16 PM, Mike Samuel wrote:
2011/5/16 Boris Zbarsky bzbar...@mit.edu:
On 5/16/11 4:37 PM, Mike Samuel wrote:
There is no Unicode codepoint U+D800 or U+DC00. See
http://www.unicode.org/charts/PDF/UD800.pdf and
http://www.unicode.org/charts/PDF/UDC00.pdf which
...@mozilla.org [mailto:es-discuss-boun...@mozilla.org] On
Behalf Of Allen Wirfs-Brock
Sent: Monday, May 16, 2011 3:18 PM
To: Mark Davis ☕
Cc: Markus Scherer; es-discuss@mozilla.org
Subject: Re: Full Unicode strings strawman
On May 16, 2011, at 2:19 PM, Mark Davis ☕ wrote:
I'm quite sympathetic to the goal
See the section of the proposal about String.prototype.charCodeAt
On May 16, 2011, at 2:20 PM, Mike Samuel wrote:
Allen, could you clarify something.
When the strawman says without mentioning codepoint
The String type is the set of all finite ordered sequences of zero or
more
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
I think you have an extra 0 at a couple of places above...
Yep. Sorry. The 0x10000 really is supposed to be five digits though.
A DOMString is defined by the DOM spec. to consist of 16-bit elements that
are to be interpreted as a UTF-16
On May 16, 2011, at 2:42 PM, Boris Zbarsky wrote:
On 5/16/11 4:38 PM, Wes Garland wrote:
Two great things about strings composed of Unicode code points:
...
Even though this is a breaking change from ES-5, I support it
whole-heartedly but I expect breakage to be very limited. Provided
On May 16, 2011, at 3:22 PM, Shawn Steele wrote:
The problem is that “\UD800\UDC00” === “\U+010000”. And if the internal
representation is UTF-32, then they’d have to continue to be the same. And
it’s really hard for them to have the same length if one’s 2 code points and
the other’s 1
Not in my proposal! \ud800\udc00 === \u+010000 is false in my proposal.
That’s exactly my problem. I think the engines (or at least the applications
written in JavaScript) are still UTF-16-centric and that they’ll have d800,
dc00 === 10000. For example, if they were different, then d800,
On May 16, 2011, at 3:33 PM, Mike Samuel wrote:
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
Really?
There is existing code out there that uses particular implementations
for strings.
Should the cost of migrating existing implementations be taken into
account when considering
On May 16, 2011, at 2:07 PM, Boris Zbarsky wrote:
That said, defining JS strings and DOMString differently seems like a recipe
for serious author confusion (e.g. actually using JS strings as the DOMString
binding in ES might be lossy, assigning from JS strings to DOMString might be
lossy,
On May 16, 2011, at 4:21 PM, Shawn Steele wrote:
Not in my proposal! \ud800\udc00 === \u+010000 is false in my proposal.
That’s exactly my problem. I think the engines (or at least the
applications written in JavaScript) are still UTF-16-centric and that they’ll
have d800, dc00 ===
On May 16, 2011, at 5:06 PM, Brendan Eich wrote:
On May 16, 2011, at 2:07 PM, Boris Zbarsky wrote:
That said, defining JS strings and DOMString differently seems like a recipe
for serious author confusion (e.g. actually using JS strings as the
DOMString binding in ES might be lossy,
I think you'll find that the actual JS engines are currently UCS-2 centric.
The surrounding browser environments are doing the UTF-16 interpretation.
That's why you see instead of �� in browser generated display output.
There’s no difference. I wouldn’t call Windows C++ WCHAR “UCS-2”, however
On 5/16/11 6:18 PM, Allen Wirfs-Brock wrote:
If the string is written as \ud800\udc00\u0061 the 'a' will be at
offset 2, even in the new proposal. It would only be at offset 1 if it
was written as \u+010000\u+000061 (using the literal notation from the
proposal).
Ah, so in the proposal strings
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
If the string is written as \ud800\udc00\u0061 the 'a' will be at offset
2, even in the new proposal. It would only be at offset 1 if it was written
as \u+010000\u+000061 (using the literal notation from the proposal).
Under this scheme,
On 5/16/11 7:21 PM, Shawn Steele wrote:
In other words I don’t think you can get the engine to be completely
UTF-32. At least not without declaring a page as being UTF-32.
For what it's worth, HTML5 does not support declaring a page as UTF-32
at all. We're removing our existing support for
On May 16, 2011, at 5:18 PM, Allen Wirfs-Brock wrote:
On May 16, 2011, at 5:06 PM, Brendan Eich wrote:
On May 16, 2011, at 2:07 PM, Boris Zbarsky wrote:
That said, defining JS strings and DOMString differently seems like a
recipe for serious author confusion (e.g. actually using JS
On 5/16/11 10:20 PM, Allen Wirfs-Brock wrote:
That seems like it'll make it very easy to introduce strings that are a mix of
the two via concatenation
Some implementations already use tree structures to represent strings that are
built via concatenation. It would be straightforward to
It already isn't the case that eval(x)===JSON.parse(x). See
http://timelessrepo.com/json-isnt-a-javascript-subset
On May 16, 2011, at 6:51 PM, Mike Samuel wrote:
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
If the string is written as \ud800\udc00\u0061 the 'a' will be at offset
2,
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
It already isn't the case that eval(x)===JSON.parse(x).
See http://timelessrepo.com/json-isnt-a-javascript-subset
I'm aware of that hole. That doesn't mean that we should break the
relationship for code that doesn't error out in either.
On May 16, 2011, at 7:22 PM, Boris Zbarsky wrote:
On 5/16/11 10:20 PM, Allen Wirfs-Brock wrote:
That seems like it'll make it very easy to introduce strings that are a mix
of the two via concatenation
Some implementations already use tree structures to represent strings that
are
On May 16, 2011, at 7:53 PM, Mike Samuel wrote:
2011/5/16 Allen Wirfs-Brock al...@wirfs-brock.com:
It already isn't the case that eval(x)===JSON.parse(x).
See http://timelessrepo.com/json-isnt-a-javascript-subset
I'm aware of that hole. That doesn't mean that we should break the
On May 16, 2011, at 7:18 PM, Brendan Eich wrote:
On May 16, 2011, at 5:18 PM, Allen Wirfs-Brock wrote:
On May 16, 2011, at 5:06 PM, Brendan Eich wrote:
On May 16, 2011, at 2:07 PM, Boris Zbarsky wrote:
That said, defining JS strings and DOMString differently seems like a
recipe for
94 matches