Re: [dev-servo] character encoding in the HTML parser

2014-04-03 Thread Luke Wagner
Another option we've just been discussing is to lazily compute a flag on the string indicating "contents are 7-bit ASCII" that would allow us to use array indexing. I'd expect this to often be true. There are also many cases where we'd eagerly have this flag (atoms produced during parsing, strings
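
A minimal sketch of the lazily computed flag described in this message, assuming a UTF-8 backing store and a JS-style code-unit accessor. The names `FlaggedStr`, `new_known_ascii`, and `char_code_at` are hypothetical and do not come from Servo or SpiderMonkey; the sketch only illustrates why an all-ASCII string permits direct indexing.

```rust
use std::cell::Cell;

/// Hypothetical string wrapper (illustrative only, not a Servo type).
pub struct FlaggedStr {
    /// UTF-8 storage.
    text: String,
    /// `None` until the flag has been computed; then caches the answer.
    is_ascii: Cell<Option<bool>>,
}

impl FlaggedStr {
    /// Flag is left unset and computed on first indexed access.
    pub fn new(s: &str) -> Self {
        FlaggedStr { text: s.to_owned(), is_ascii: Cell::new(None) }
    }

    /// Eager variant for producers that already know the contents are ASCII
    /// (the message mentions atoms produced during parsing as one such case).
    pub fn new_known_ascii(s: &str) -> Self {
        debug_assert!(s.is_ascii());
        FlaggedStr { text: s.to_owned(), is_ascii: Cell::new(Some(true)) }
    }

    /// Returns the cached flag, scanning the string once if needed.
    fn ascii(&self) -> bool {
        match self.is_ascii.get() {
            Some(flag) => flag,
            None => {
                let flag = self.text.is_ascii();
                self.is_ascii.set(Some(flag));
                flag
            }
        }
    }

    /// UTF-16 code unit at `index`, in the spirit of JS `charCodeAt`.
    pub fn char_code_at(&self, index: usize) -> Option<u16> {
        if self.ascii() {
            // Fast path: for ASCII, byte index == code unit index.
            self.text.as_bytes().get(index).map(|&b| u16::from(b))
        } else {
            // Slow path: walk the string, transcoding as we go (O(n)).
            self.text.encode_utf16().nth(index)
        }
    }
}
```

With this shape, `FlaggedStr::new("hello").char_code_at(1)` takes the constant-time path, while a string containing any non-ASCII character falls back to a linear scan.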

Re: [dev-servo] character encoding in the HTML parser

2014-04-03 Thread Boris Zbarsky
On 4/3/14 8:03 AM, Henri Sivonen wrote:
> Have we instrumented Gecko to find out what the access patterns are like?

We have not, but I will bet money the answer is "different for benchmarks and actual content"...

-Boris

Re: [dev-servo] character encoding in the HTML parser

2014-04-03 Thread Henri Sivonen
On Wed, Apr 2, 2014 at 4:25 PM, Robert O'Callahan wrote:
> If we could get the JS engine to use evil-UTF8 with some hack to handle
> charAt and friends efficiently (e.g. tacking on a UCS-2 version of the
> string when necessary)

Have we instrumented Gecko to find out what the access patterns are

Re: [dev-servo] character encoding in the HTML parser

2014-04-03 Thread Henri Sivonen
On Tue, Apr 1, 2014 at 12:50 PM, Simon Sapin wrote:
> On 01/04/2014 03:01, Keegan McAllister wrote:
>>
>> It does seem like replacing truly lone surrogates with U+FFFD would
>> be an acceptable deviation from the spec, but maybe we want to avoid
>> those absolutely.
>
> As much as I’d like this to
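
For readers unfamiliar with the trade-off in the quoted text, the following sketch shows what "replacing truly lone surrogates with U+FFFD" means when converting UTF-16 code units to a UTF-8 string. It uses only the Rust standard library; the wrapper names `decode_lossy` and `decode_strict` are hypothetical, and this is not presented as the approach the thread settled on.

```rust
/// Lossy conversion: each unpaired surrogate becomes U+FFFD,
/// while correctly paired surrogates decode to the intended character.
pub fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Strict conversion: returns `None` if any unpaired surrogate is present,
/// so the caller must decide how to handle the ill-formed input.
pub fn decode_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```

The lossy variant always yields valid UTF-8 but cannot round-trip the original code units, which is the deviation the quoted message is weighing.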