Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-11-24 Thread Henri Sivonen
On Thu, Oct 9, 2014 at 2:06 PM, Nicholas Nethercote wrote: > On Thu, Oct 9, 2014 at 9:21 PM, Henri Sivonen wrote: >> On Wed, Oct 8, 2014 at 4:13 PM, Jan de Mooij wrote: >> >> Has SpiderMonkey ever been instrumented to find out if most strings >> are even just ASCII? > > There are some measuremen

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-10 Thread Jan de Mooij
On Thu, Oct 9, 2014 at 12:21 PM, Henri Sivonen wrote: > It would be even more tragic to miss the opportunity to use 8-bit code > units for strings in Servo because JS crypto benchmarks use strings. > What chances are there to retire the use of strings-for-crypto in > benchmarking? Such a benchmar

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-09 Thread Nicholas Nethercote
On Thu, Oct 9, 2014 at 9:21 PM, Henri Sivonen wrote: > On Wed, Oct 8, 2014 at 4:13 PM, Jan de Mooij wrote: > > Has SpiderMonkey ever been instrumented to find out if most strings > are even just ASCII? There are some measurements in https://blog.mozilla.org/javascript/2014/07/21/slimmer-and-fast

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-09 Thread Henri Sivonen
On Wed, Oct 8, 2014 at 4:13 PM, Jan de Mooij wrote: > When I added Latin1 to SpiderMonkey, we did consider using UTF8 but it's > complicated. As mentioned, we have to ensure charAt/charCodeAt stay fast > (crypto benchmarks etc rely on this, sadly). It would be even more tragic to miss the opportu
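
To illustrate the charAt/charCodeAt concern raised above: a minimal Rust sketch, assuming a string stored as UTF-8 and a hypothetical helper named `char_code_at`. It is not SpiderMonkey's or Servo's code. It only shows why indexing by UTF-16 code unit is cheap for ASCII-only strings and needs a scan otherwise.

```rust
/// Hypothetical helper: return the UTF-16 code unit at `index`, in the
/// spirit of JavaScript's `charCodeAt`, for a string stored as UTF-8.
fn char_code_at(s: &str, index: usize) -> Option<u16> {
    // For ASCII-only text, byte offsets and UTF-16 code unit offsets
    // coincide, so the lookup is a direct index. (`is_ascii` itself scans
    // the string here; a real engine would presumably cache that flag.)
    if s.is_ascii() {
        return s.as_bytes().get(index).map(|&b| u16::from(b));
    }
    // Otherwise the UTF-16 index has to be found by walking the string
    // from the start, which is linear in `index`.
    s.encode_utf16().nth(index)
}
```

Calling `char_code_at("abc", 1)` takes the direct path, while a string containing any non-ASCII character falls back to the linear scan.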

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-08 Thread Cameron Zwarich
On Oct 8, 2014, at 10:48 PM, Robert O'Callahan wrote: > On Tue, Oct 7, 2014 at 6:57 AM, Henri Sivonen wrote: > >> On Mon, Oct 6, 2014 at 8:27 PM, Cameron Zwarich >> wrote: So you’re suggesting Servo could get away with UTF-8 in the DOM? I >> hadn’t considered it. I remove my proposal at t

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-08 Thread Robert O'Callahan
On Tue, Oct 7, 2014 at 6:57 AM, Henri Sivonen wrote: > On Mon, Oct 6, 2014 at 8:27 PM, Cameron Zwarich > wrote: > >> So you’re suggesting Servo could get away with UTF-8 in the DOM? I > hadn’t considered it. I remove my proposal at the start of this thread, I’d > like us to try this instead. > >

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-08 Thread Jan de Mooij
On Tue, Oct 7, 2014 at 3:57 PM, Henri Sivonen wrote: > > UTF-8 strings will mean that we will have to copy all non-7-bit ASCII > strings between the DOM and JS. > > Not if JS stores strings as WTF-8. I think it would be tragic not to > bother to try to make the JS engine use WTF-8 when having the

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-07 Thread Henri Sivonen
On Mon, Oct 6, 2014 at 7:00 PM, Simon Sapin wrote: > On 06/10/14 07:57, Henri Sivonen wrote: >> On Sun, Oct 5, 2014 at 7:26 PM, Simon Sapin wrote: >>> JavaScript strings, however, can. (They are effectively potentially >>> ill-formed UTF-16.) It’s possible (?) that the Web depends on these >>> su

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Cameron Zwarich
On Oct 6, 2014, at 3:49 PM, Boris Zbarsky wrote: >> Is there any particular place where you feel there is tension between the >> goals of memory usage and performance? > > I don't know yet. I mean, for charAt, sure. ;) JS engines have been using ropes for quite some time now, which means tha

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Boris Zbarsky
On 10/6/14, 3:32 PM, Cameron Zwarich wrote: then it wouldn’t be able to use JS_NewExternalString in the places where Gecko is able to use it. Ah, true. Is there any particular place where you feel there is tension between the goals of memory usage and performance? I don't know yet. I mea

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Cameron Zwarich
On Oct 6, 2014, at 2:49 PM, Boris Zbarsky wrote: > On 10/6/14, 10:27 AM, Cameron Zwarich wrote: >> This is an increase in memory usage over all existing engines > > Is it an increase over Gecko? If Servo used UTF-8 strings in the DOM, then it wouldn’t be able to use JS_NewExternalString in the

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Boris Zbarsky
On 10/6/14, 10:27 AM, Cameron Zwarich wrote: > This is an increase in memory usage over all existing engines Is it an increase over Gecko? Are we trying to optimize for performance or memory usage here, or both at once? -Boris

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Cameron Zwarich
On Oct 6, 2014, at 9:00 AM, Simon Sapin wrote: >> For me, absent evidence, it's much easier to believe that using WTF-8 >> instead of potentially ill-formed UTF-16 would be a win for the JS >> engine than to believe that using WTF-8 instead of UTF-8 in the DOM >> would be a win. > > So you’re su

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Simon Sapin
On 06/10/14 07:57, Henri Sivonen wrote: On Sun, Oct 5, 2014 at 7:26 PM, Simon Sapin wrote: JavaScript strings, however, can. (They are effectively potentially ill-formed UTF-16.) It’s possible (?) that the Web depends on these surrogates being preserved. It's clear that JS programs depend on

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Boris Zbarsky
On 10/6/14, 3:53 AM, Cameron Zwarich wrote: Where do most of the small strings larger than a single character that benefit from the inline small string optimization originate, the DOM or user JS code? That's a good question. When the optimization was added, strings that originated in the DOM

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-06 Thread Cameron Zwarich
On Oct 5, 2014, at 7:49 PM, Boris Zbarsky wrote: > On 10/5/14, 7:51 PM, Cameron Zwarich wrote: >> Are there any plans to eliminate the copies in Gecko? > > No. Measurement showed that in practice the cost of copying short strings, > which most of these are, is very low. For large strings you

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Henri Sivonen
On Sun, Oct 5, 2014 at 7:26 PM, Simon Sapin wrote: > JavaScript strings, however, can. (They are effectively potentially > ill-formed UTF-16.) It’s possible (?) that the Web depends on these > surrogates being preserved. It's clear that JS programs depend on being able to hold unpaired surrogates

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Boris Zbarsky
On 10/5/14, 7:51 PM, Cameron Zwarich wrote: Are there any plans to eliminate the copies in Gecko? No. Measurement showed that in practice the cost of copying short strings, which most of these are, is very low. For large strings you do end up having to copy, but keep in mind that Gecko used

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 3:08 PM, Boris Zbarsky wrote: > On 10/5/14, 2:27 PM, Cameron Zwarich wrote: >> I am opposed to anything that requires string copies between the DOM and JS > > The only way to do that with SpiderMonkey in its current state is to use > JSString for your string type. You cannot

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 3:13 PM, Patrick Walton wrote: > On 10/5/14 3:08 PM, Boris Zbarsky wrote: >> On 10/5/14, 2:27 PM, Cameron Zwarich wrote: >>> I am opposed to anything that requires string copies between the DOM >>> and JS >> >> The only way to do that with SpiderMonkey in its current state is

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Patrick Walton
On 10/5/14 3:08 PM, Boris Zbarsky wrote: On 10/5/14, 2:27 PM, Cameron Zwarich wrote: I am opposed to anything that requires string copies between the DOM and JS The only way to do that with SpiderMonkey in its current state is to use JSString for your string type. You cannot safely grab the c

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Boris Zbarsky
On 10/5/14, 2:27 PM, Cameron Zwarich wrote: I am opposed to anything that requires string copies between the DOM and JS The only way to do that with SpiderMonkey in its current state is to use JSString for your string type. You cannot safely grab the chars from a SpiderMonkey string and hold

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 2:05 PM, Ms2ger wrote: > On 10/05/2014 08:27 PM, Cameron Zwarich wrote: >> If JS can’t handle WTF-8 natively, then what’s the benefit of >> using it? I am opposed to anything that requires string copies >> between the DOM and

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Ms2ger
On 10/05/2014 08:27 PM, Cameron Zwarich wrote: > If JS can’t handle WTF-8 natively, then what’s the benefit of > using it? I am opposed to anything that requires string copies > between the DOM and JS, unless there’s some really great overriding > reas

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
If JS can’t handle WTF-8 natively, then what’s the benefit of using it? I am opposed to anything that requires string copies between the DOM and JS, unless there’s some really great overriding reason. Cameron On Oct 5, 2014, at 9:26 AM, Simon Sapin wrote: > We’ve discussed using UTF-8 interna

[dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Simon Sapin
We’ve discussed using UTF-8 internally for strings in Servo, but well-formed UTF-8 cannot represent surrogate code points. JavaScript strings, however, can. (They are effectively potentially ill-formed UTF-16.) It’s possible (?) that the Web depends on these surrogates being preserved. So i
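
The distinction between well-formed UTF-8 and WTF-8 can be sketched in Rust. This is an illustrative sketch only, assuming a hypothetical function named `encode_wtf8`; it is not the code proposed in this thread. It converts potentially ill-formed UTF-16 to bytes, encoding valid surrogate pairs as ordinary 4-byte UTF-8 sequences and any unpaired surrogate with the generalized 3-byte pattern that strict UTF-8 forbids.

```rust
/// Hypothetical helper: encode potentially ill-formed UTF-16 as WTF-8.
fn encode_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    // `char::decode_utf16` combines valid surrogate pairs into one scalar
    // value and reports each unpaired surrogate as an error.
    for item in char::decode_utf16(units.iter().copied()) {
        match item {
            Ok(c) => {
                // Scalar values are encoded exactly as in UTF-8.
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            Err(e) => {
                // A lone surrogate (U+D800..=U+DFFF) gets the 3-byte
                // pattern 1110xxxx 10xxxxxx 10xxxxxx, which well-formed
                // UTF-8 does not allow for this range.
                let s = u32::from(e.unpaired_surrogate());
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}
```

For input that contains no unpaired surrogates the output is plain UTF-8. For input with a lone surrogate, `String::from_utf8` rejects the resulting bytes, which is why a Rust `String` alone could not hold such JavaScript strings.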