I was spelunking the header just now and ran across some comments  
which made specific reference to UTF-16, so that's good. It would  
still be useful to know which de/composition to expect. It might seem  
needlessly specific, but because others have done it, it's useful to  
know.
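
For illustration, here is a minimal sketch in standard C++ of why the de/composition question matters to anyone reading 16-bit code units out of a string. It makes no V8 calls; DumpCodeUnits, SameCodeUnits, kComposed and kDecomposed are hypothetical names invented for this example. The same visible character, "é", has two valid UTF-16 spellings, and a raw comparison of 16-bit values treats them as different strings:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of V8): formats each 16-bit code unit as
// four hex digits, separated by spaces.
std::string DumpCodeUnits(const std::u16string& s) {
    std::string out;
    char buf[8];
    for (char16_t unit : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(unit));
        out += buf;
    }
    return out;
}

// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
const std::u16string kComposed = u"\u00E9";
// Decomposed form: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
const std::u16string kDecomposed = u"e\u0301";

// Raw comparison of 16-bit values; performs no Unicode normalization.
bool SameCodeUnits(const std::u16string& a, const std::u16string& b) {
    return a == b;
}
```

DumpCodeUnits(kComposed) yields "00E9", DumpCodeUnits(kDecomposed) yields "0065 0301", and SameCodeUnits(kComposed, kDecomposed) is false. Whether V8 ever normalizes one form into the other is exactly the open question here; the sketch assumes nothing about that.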

Pete Gontier <http://pete.gontier.org/>



On Oct 4, 2008, at 10:53 AM, Pete Gontier wrote:

> It sounds as if I didn't ask my question very well. Let me try  
> again. I'm going to explain some things as if you didn't know them  
> even though you obviously do just to make it clear what I'm asking  
> about.
>
> Every string has an encoding: UCS-2, ASCII, UTF-8, Shift JIS,  
> UTF-16, etc. Unicode strings are also either composed or decomposed  
> in one of several ways.
>
> ECMA-262 4.3.16 doesn't specify an encoding for JavaScript strings.  
> It specifies that strings are arrays of 16-bit integers. It doesn't  
> specify semantics for those integers. It says each of these integers  
> is "usually" UTF-16 (without suggesting a de/composition) but  
> doesn't specify it.
>
> Obviously, V8 is free to do whatever it likes with strings  
> internally in order to get its job done. However, a couple of  
> questions remain from an interface standpoint:
>
> What encoding and de/composition can JavaScript programs expect? (I  
> expect this will be dictated by the expectations of programs such as  
> Gmail.)
>
> What encoding and de/composition can clients of v8::String::Write,  
> v8::String::ExternalStringResource, and v8::String::Value expect? (I  
> expect this will be dictated by the expectations of programs such as  
> Chrome.)
>
> I am not a Unicode expert, so I recognize these questions may seem  
> silly on some level.
>
>
> Pete Gontier <http://pete.gontier.org/>
>
>
>
> On Oct 4, 2008, at 6:27 AM, Søren Gjesse wrote:
>
>> Inside V8 there are a number of different string representations.
>> The basic ones are the ASCII representation (AsciiString) and the
>> two-byte representation (TwoByteString); the first is used when all
>> characters are ASCII and therefore only one byte is required to
>> store each character. Besides those, V8 has concatenated strings
>> (ConsString) and string slices (SlicedString). A concatenated string
>> points to two other strings which have been concatenated, but the
>> concatenated string is not materialized, whereas a string slice
>> points to a part of an existing string. V8 tries to make the best
>> choice when making new strings, and there are a number of rules to
>> materialize (flatten) concatenated strings when certain operations
>> are performed. Finally, there are also external strings in ASCII and
>> two-byte variants (ExternalAsciiString and ExternalTwoByteString);
>> these are strings which are not present in the V8 heap but are
>> references to strings in C++ land added through the API. In Chrome,
>> external strings are used when adding the JavaScript source code
>> from web pages to V8 without making an additional copy.
>>
>> Regards,
>> Søren
>>
>> On Sat, Oct 4, 2008 at 3:29 AM, Pete Gontier <[EMAIL PROTECTED]>  
>> wrote:
>> ECMA-262 4.3.16 allows a fair amount of encoding flexibility.
>>
>> Has V8 committed to any particular encoding?
>>
>>
>> Pete Gontier <http://pete.gontier.org/>
>>
>>
>>
>> On Oct 2, 2008, at 11:59 PM, Søren Gjesse wrote:
>>
>>> There is only one String type in V8, which is v8::String. You can
>>> create a new String in a number of ways, with v8::String::New the
>>> most commonly used. The classes v8::String::Utf8Value and
>>> v8::String::Value (and v8::String::AsciiValue, which is mainly for
>>> testing) are used to pull out the string as a char* or uint16_t*
>>> to be used in C++, e.g.:
>>>
>>>   v8::Handle<v8::String> str = v8::String::New("print");
>>>   v8::String::Utf8Value s(str);
>>>   printf("%s", *s);
>>>
>>> Note that v8::String represents the string value (ECMA-262  
>>> 4.3.16). To create a string object (ECMA-262 4.3.18) use  
>>> NewInstance on the String function.
>>>
>>> Regards,
>>> Søren
>>>
>>>
>>> On Thu, Oct 2, 2008 at 11:00 PM, ondras <[EMAIL PROTECTED]>  
>>> wrote:
>>>
>>> Hi again,
>>>
>>> I have some trouble understanding all those String types in V8.
>>> What exactly is the purpose of, and the difference between,
>>> v8::String::New, v8::String::AsciiValue and v8::String::Utf8Value?
>>> How should I use these, and when?
>>>
>>> Thanks for clarification,
>>> Ondrej


--~--~---------~--~----~------------~-------~--~----~
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
-~----------~----~----~----~------~----~------~--~---
