It sounds as if I didn't ask my question very well. Let me try again.  
I'm going to explain some things as if you didn't know them even  
though you obviously do just to make it clear what I'm asking about.

Every string has an encoding: UCS-2, ASCII, UTF-8, Shift JIS, UTF-16,  
etc. Unicode strings are also either composed or decomposed in one of  
several ways.

ECMA-262 4.3.16 doesn't specify an encoding for JavaScript strings. It
specifies that a string is a finite ordered sequence of 16-bit unsigned
integers, and it doesn't specify semantics for those integers. A note
says each integer "usually" represents a single 16-bit unit of UTF-16
text (without suggesting a de/composition), but the spec doesn't
require it.
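
To make the ambiguity concrete, here is a minimal sketch in plain C++11
(no V8 involved) that treats a string as nothing more than a sequence of
16-bit integers. The names kComposed, kDecomposed, kClef, CountCodePoints,
and CheckExamples are hypothetical, made up for illustration, and the
surrogate-pair handling assumes the units are UTF-16, which is exactly
the thing the spec does not promise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The same visible text ("e" with an acute accent) as two different
// sequences of 16-bit units.
const std::u16string kComposed = u"\u00E9";     // one unit:  0x00E9
const std::u16string kDecomposed = u"e\u0301";  // two units: 0x0065 0x0301

// U+1D11E (musical G clef) lies outside the BMP, so UTF-16 stores it as
// the surrogate pair 0xD834 0xDD1E.
const std::u16string kClef = u"\U0001D11E";

// Hypothetical helper: counts code points, ASSUMING the units are UTF-16.
// A high surrogate (0xD800-0xDBFF) immediately followed by a low
// surrogate (0xDC00-0xDFFF) is counted as a single code point.
std::size_t CountCodePoints(const std::u16string& units) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < units.size(); ++i) {
    const char16_t unit = units[i];
    const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
    if (is_high && i + 1 < units.size()) {
      const char16_t next = units[i + 1];
      if (next >= 0xDC00 && next <= 0xDFFF) ++i;  // consume the low half
    }
    ++count;
  }
  return count;
}

// Hypothetical usage: what a client would observe under that assumption.
void CheckExamples() {
  assert(kComposed != kDecomposed);            // unit-by-unit they differ
  assert(kClef.size() == 2);                   // two 16-bit units...
  assert(CountCodePoints(kClef) == 1);         // ...but one code point
  assert(CountCodePoints(kDecomposed) == 2);   // base letter + combining mark
}
```

A client that receives only the 16-bit units cannot tell which of these
interpretations the producer intended, which is why I'm asking what the
interface guarantees.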

Obviously, V8 is free to do whatever it likes with strings internally  
in order to get its job done. However, a couple of questions remain  
from an interface standpoint:

What encoding and de/composition can JavaScript programs expect? (I  
expect this will be dictated by the expectations of programs such as  
Gmail.)

What encoding and de/composition can clients of v8::String::Write,  
v8::String::ExternalStringResource, and v8::String::Value expect? (I  
expect this will be dictated by the expectations of programs such as  
Chrome.)

I am not a Unicode expert, so I recognize these questions may seem  
silly on some level.


Pete Gontier <http://pete.gontier.org/>



On Oct 4, 2008, at 6:27 AM, Søren Gjesse wrote:

> Inside V8 there are a number of different string representations. The
> basic ones are the ASCII representation (AsciiString) and the two-byte
> representation (TwoByteString); the first is used when all
> characters are ASCII, so only one byte is required to
> store each character. Besides those, V8 has concatenated strings
> (ConsString) and string slices (SlicedString). A concatenated string
> points to the two strings that were concatenated, without
> materializing the result, whereas a string slice
> points to a part of an existing string. V8 tries to make the best
> choice when making new strings, and there are a number of rules for
> materializing (flattening) concatenated strings when certain operations
> are performed. Finally, there are also external strings in ASCII and
> two-byte variants (ExternalAsciiString and ExternalTwoByteString).
> These are strings which are not present in the V8 heap but are
> references to strings in C++ land added through the API. In Chrome,
> external strings are used when adding the JavaScript source code
> from web pages to V8 without making an additional copy.
>
> Regards,
> Søren
>
> On Sat, Oct 4, 2008 at 3:29 AM, Pete Gontier <[EMAIL PROTECTED]> wrote:
> ECMA-262 4.3.16 allows a fair amount of encoding flexibility.
>
> Has V8 committed to any particular encoding?
>
>
> Pete Gontier <http://pete.gontier.org/>
>
>
>
> On Oct 2, 2008, at 11:59 PM, Søren Gjesse wrote:
>
>> There is only one String type in V8, which is v8::String. You can
>> create a new String in a number of ways, with v8::String::New the most
>> commonly used. The classes v8::String::Utf8Value and
>> v8::String::Value (and v8::String::AsciiValue, which is mainly for
>> testing) are used to pull out the string as a char* or uint16_t* to
>> be used in C++, e.g.:
>>
>>   v8::Handle<v8::String> str = v8::String::New("print");
>>   v8::String::Utf8Value s(str);  // copies the string out as UTF-8
>>   printf("%s", *s);              // operator* yields the char* buffer
>>
>> Note that v8::String represents the string value (ECMA-262 4.3.16).  
>> To create a string object (ECMA-262 4.3.18) use NewInstance on the  
>> String function.
>>
>> Regards,
>> Søren
>>
>>
>> On Thu, Oct 2, 2008 at 11:00 PM, ondras <[EMAIL PROTECTED]>  
>> wrote:
>>
>> Hi again,
>>
>> I have some trouble understanding all those String types in V8. What
>> exactly is the purpose and difference between v8::String::New,
>> v8::String::AsciiValue and v8::String::Utf8Value? How should I use
>> these and when?
>>
>> Thanks for clarification,
>> Ondrej


--~--~---------~--~----~------------~-------~--~----~
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
-~----------~----~----~----~------~----~------~--~---
