2009/3/9 Stephan Beal <[email protected]>:
>
> On Mon, Mar 9, 2009 at 7:20 PM, Erik Corry <[email protected]> wrote:
>>> String::New("....", length_of_data);
>>
>> If you make an ASCII string then the top bits must be zero, so it's
>> not 8-bit clean.
>
> Why must it be 8-bit clean? Is that a limitation/feature of the
> String() class? Will the String treat it as UTF automatically if it
> sees a high bit? (Some API docs would be nice! hint, hint!)
Here's the text from v8.h:

  /**
   * Allocates a new string from either utf-8 encoded or ascii data.
   * The second parameter 'length' gives the buffer length.
   * If the data is utf-8 encoded, the caller must
   * be careful to supply the length parameter.
   * If it is not given, the function calls
   * 'strlen' to determine the buffer length, it might be
   * wrong if 'data' contains a null character.
   */
  static Local<String> New(const char* data, int length = -1);

So it will assume the data is UTF-8 if it is not ASCII. Not all byte
sequences are valid UTF-8, so you can't use this for binary data.
Internally, V8 does not use UTF-8, so this data will be converted to
UC16.

  /** Allocates a new string from utf16 data. */
  static Local<String> New(const uint16_t* data, int length = -1);

This one takes 16-bit characters and can represent binary data with no
corruption, but the length is in characters, so you can't use it for an
odd number of bytes.

> In my case i'm working on an i/o library which of course treats the
> data as opaque (void*). If i understand you correctly, if it happens
> to read something with a high bit set then the data it passes back to
> the caller (via a String instance) is effectively undefined (or, at
> least, not guaranteed to be the same bits that the input device read)?
> Do i need to document that handling data with non-ASCII chars
> essentially leads to undefined results? (Not a huge deal, IMO, for JS
> code, as i can't imagine people doing much binary i/o with it, but i'd
> like to document it if it's not going to work as expected.)

Giving binary data to the above New method will result in undefined
behaviour.

>> If you make it a UC16 string then it has to have an even byte length.
>
> Are there any docs on handling 2-byte strings in v8, or is this a
> "must be done by implementations using ExternalStringRepresentation"
> feature?
> Could/Should i potentially use ExternalStringRepresentation
> as an internal buffer for the data, rather than an External-to-void*
> (which can't be dereferenced by the caller)?

External strings must have their data in either ASCII or UC16. There is
no Latin-1 variant, and undefined behaviour will result if you try. In
the case of an external string the actual string data is not on the V8
heap. It is also assumed to be immutable, of course, since all JS
strings are immutable.

>> So the status is that there isn't any good way to store binary data in
>> JS at the moment. Of course it is possible to put the data in an
>> external object instead.
>
> That's an idea. Didn't think of that. It'd mean (in my case) buffering
> arbitrarily large read buffers, and since v8 doesn't guarantee GC will
> ever be called, i don't want to risk it causing an arbitrarily-sized
> leak.

If the data is on the V8 heap then it won't be collected without a GC
either. :)

> :)
>
> --
> ----- stephan beal
> http://wanderinghorse.net/home/stephan/

--
Erik Corry, Software Engineer
Google Denmark ApS. CVR nr. 28 86 69 84.
c/o Philip & Partners, 7 Vognmagergade, P.O. Box 2227,
DK-1018 Copenhagen K, Denmark.

--~--~---------~--~----~------------~-------~--~----~
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
-~----------~----~----~----~------~----~------~--~---
