I was spelunking the header just now and ran across some comments which made specific reference to UTF-16, so that's good. It would still be useful to know which de/composition to expect. It might seem needlessly specific, but because others have done it, it's useful to know.
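
To make the de/composition question concrete, here is a minimal sketch in plain C++ with no V8 involved. It assumes the 16-bit integers are UTF-16 code units. The helper names (ComposedEAcute, DecomposedEAcute, SameCodeUnits, CheckExpectations) are made up for illustration and are not part of any V8 header. The same visible character can arrive as one code unit or as two, and a comparison that just walks the array treats the two forms as different strings:

```cpp
#include <cassert>
#include <string>

// Composed form (NFC): "é" as the single code unit U+00E9.
inline std::u16string ComposedEAcute() { return u"\u00E9"; }

// Decomposed form (NFD): 'e' followed by U+0301 COMBINING ACUTE ACCENT.
inline std::u16string DecomposedEAcute() { return u"e\u0301"; }

// Compares code unit by code unit, the way a bare array of 16-bit
// integers would be compared. No normalization is applied.
inline bool SameCodeUnits(const std::u16string& a, const std::u16string& b) {
  return a == b;
}

// Documents what the sketch is meant to show.
inline void CheckExpectations() {
  assert(ComposedEAcute().size() == 1);    // one code unit
  assert(DecomposedEAcute().size() == 2);  // two code units
  assert(!SameCodeUnits(ComposedEAcute(), DecomposedEAcute()));
}
```

So a client that receives 16-bit units from an API and compares or hashes them directly would need to know which form to expect, or normalize on its own.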

Pete Gontier <http://pete.gontier.org/>

On Oct 4, 2008, at 10:53 AM, Pete Gontier wrote:

> It sounds as if I didn't ask my question very well. Let me try
> again. I'm going to explain some things as if you didn't know them
> even though you obviously do, just to make it clear what I'm asking
> about.
>
> Every string has an encoding: UCS-2, ASCII, UTF-8, Shift JIS,
> UTF-16, etc. Unicode strings are also either composed or decomposed
> in one of several ways.
>
> ECMA-262 4.3.16 doesn't specify an encoding for JavaScript strings.
> It specifies that strings are arrays of 16-bit integers. It doesn't
> specify semantics for those integers. It says each of these integers
> is "usually" UTF-16 (without suggesting a de/composition) but
> doesn't specify it.
>
> Obviously, V8 is free to do whatever it likes with strings
> internally in order to get its job done. However, a couple of
> questions remain from an interface standpoint:
>
> What encoding and de/composition can JavaScript programs expect? (I
> expect this will be dictated by the expectations of programs such as
> Gmail.)
>
> What encoding and de/composition can clients of v8::String::Write,
> v8::String::ExternalStringResource, and v8::String::Value expect? (I
> expect this will be dictated by the expectations of programs such as
> Chrome.)
>
> I am not a Unicode expert, so I recognize these questions may seem
> silly on some level.
>
> Pete Gontier <http://pete.gontier.org/>
>
> On Oct 4, 2008, at 6:27 AM, Søren Gjesse wrote:
>
>> Inside V8 there are a number of different string representations.
>> The basic ones are the ASCII representation (AsciiString) and the
>> two-byte representation (TwoByteString); the first is used when all
>> characters are ASCII and therefore only one byte is required to
>> store each character. Besides that, V8 has concatenated strings
>> (ConsString) and string slices (SlicedString). A concatenated string
>> points to two other strings which have been concatenated, but the
>> concatenated string is not materialized, whereas a string slice
>> points to a part of an existing string. V8 tries to make the best
>> choice when making new strings, and there are a number of rules to
>> materialize (flatten) concatenated strings when certain operations
>> are performed. Finally, there are also external strings in ASCII and
>> two-byte variants (ExternalAsciiString and ExternalTwoByteString);
>> these are strings which are not present in the V8 heap but are
>> references to strings in C++ land added through the API. In Chrome,
>> external strings are used when adding the JavaScript source code
>> from web pages to V8 without making an additional copy.
>>
>> Regards,
>> Søren
>>
>> On Sat, Oct 4, 2008 at 3:29 AM, Pete Gontier <[EMAIL PROTECTED]>
>> wrote:
>> ECMA-262 4.3.16 allows a fair amount of encoding flexibility.
>>
>> Has V8 committed to any particular encoding?
>>
>> Pete Gontier <http://pete.gontier.org/>
>>
>> On Oct 2, 2008, at 11:59 PM, Søren Gjesse wrote:
>>
>>> There is only one String type in V8, which is v8::String. You can
>>> create a new String in a number of ways, with v8::String::New most
>>> commonly used. The classes v8::String::Utf8Value and
>>> v8::String::Value (and v8::String::AsciiValue, which is mainly for
>>> testing) are used to pull out the string as a char* or uint16_t*
>>> to be used in C++, e.g.:
>>>
>>> v8::Handle<v8::String> str = v8::String::New("print");
>>> v8::String::Utf8Value s(str);
>>> printf("%s", *s);
>>>
>>> Note that v8::String represents the string value (ECMA-262
>>> 4.3.16). To create a string object (ECMA-262 4.3.18) use
>>> NewInstance on the String function.
>>>
>>> Regards,
>>> Søren
>>>
>>> On Thu, Oct 2, 2008 at 11:00 PM, ondras <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> Hi again,
>>>
>>> I have some trouble understanding all those String types in V8.
>>> What exactly is the purpose and difference between v8::String::New,
>>> v8::String::AsciiValue and v8::String::Utf8Value? How should I use
>>> these and when?
>>>
>>> Thanks for clarification,
>>> Ondrej

--
v8-users mailing list
http://groups.google.com/group/v8-users
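
For readers following the thread, the concatenated-string idea Søren describes above (a node that points at two other strings and is only materialized, or flattened, when needed) can be sketched roughly as below. This is an illustration under assumptions, not V8's implementation: Rope, Leaf, Concat, and Flatten are hypothetical names, and V8's real ConsString layout and flattening rules are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical rope node; the names here are illustrative, not V8 classes.
class Rope {
 public:
  // Leaf node holding flat 16-bit code units.
  explicit Rope(std::u16string flat)
      : flat_(std::move(flat)), is_flat_(true) {}

  // Concatenation node: stores two pointers, copies no characters.
  Rope(std::shared_ptr<Rope> left, std::shared_ptr<Rope> right)
      : left_(std::move(left)), right_(std::move(right)), is_flat_(false) {}

  bool is_flat() const { return is_flat_; }

  std::size_t length() const {
    return is_flat_ ? flat_.size() : left_->length() + right_->length();
  }

  // Materializes the characters on first use and caches the result,
  // after which the node behaves like a leaf.
  const std::u16string& Flatten() {
    if (!is_flat_) {
      flat_ = left_->Flatten() + right_->Flatten();
      left_.reset();
      right_.reset();
      is_flat_ = true;
    }
    return flat_;
  }

 private:
  std::u16string flat_;
  std::shared_ptr<Rope> left_;
  std::shared_ptr<Rope> right_;
  bool is_flat_;
};

inline std::shared_ptr<Rope> Leaf(std::u16string s) {
  return std::make_shared<Rope>(std::move(s));
}

inline std::shared_ptr<Rope> Concat(std::shared_ptr<Rope> a,
                                    std::shared_ptr<Rope> b) {
  return std::make_shared<Rope>(std::move(a), std::move(b));
}
```

With this sketch, Concat(Leaf(u"foo"), Leaf(u"bar")) is cheap and stays unflattened until something calls Flatten(), which is the trade-off the quoted description points at: concatenation costs almost nothing, and the copy is deferred until an operation needs contiguous characters.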
