console.log() and document.write() are not part of V8. They are host functions with different implementations in Chrome and in NodeJS. Chrome's implementations of both appear quite robust: they are aware of surrogate pairs and of the target encoding. NodeJS, on the other hand, fails to respect surrogate pairs.
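The surrogate-pair awareness mentioned above amounts to a little arithmetic on the two 16-bit code units. A minimal sketch (the helper name codePointAt is my own; a built-in String.prototype.codePointAt did not exist in JavaScript at the time):

```javascript
// Recover the full code point from a string at index i, joining a
// high surrogate (0xD800-0xDBFF) with a following low surrogate
// (0xDC00-0xDFFF) when both are present.
function codePointAt(s, i) {
  var hi = s.charCodeAt(i);
  if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
    var lo = s.charCodeAt(i + 1);
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      // 10 payload bits from each half, plus the 0x10000 offset
      return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
    }
  }
  return hi; // BMP character or unpaired surrogate
}

console.log(codePointAt('\ud864\udd0e', 0).toString(16)); // "2910e"
```

This is the join that a surrogate-pair-aware host function has to perform before it can emit correct UTF-8.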
Your examples don't show much beyond the fact that String.fromCharCode() will not generate surrogate pairs and can therefore only produce characters whose code points fit in 16 bits. You'll see the same results in NodeJS:

'𩄎' === String.fromCharCode(0xd864, 0xdd0e)
true

On Thu, Aug 11, 2011 at 12:05 PM, ~flow <[email protected]> wrote:

> so i went and put the same javascript into an HTML page to be displayed by
> chrome and into a standalone js snippet to be run using nodejs:
>
> var f = function( text ) {
>   document.write( '<h1>', text, '</h1>' );
>   document.write( '<div>', text.length, '</div>' );
>   document.write( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
>   document.write( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
>   console.log( '<h1>', text, '</h1>' );
>   console.log( '<div>', text.length, '</div>' );
>   console.log( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
>   console.log( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
> };
>
> f( '𩄎' );
> f( String.fromCharCode( 0x2910e ) );
> f( String.fromCharCode( 0xd864, 0xdd0e ) );
>
> in function f(), those document.write() calls are only present in the HTML
> document, not in the standalone snippet.
>
> i want to show here that something more fundamental must be different
> between javascript running inside google chrome and javascript running
> inside nodejs. because, you see, the output i get inside chrome looks like
> this:
>
> 𩄎
> 2
> 0xd864
> 0xdd0e
> 鄎
> 1
> 0x910e
> 0xNaN
> 𩄎
> 2
> 0xd864
> 0xdd0e
>
> the second character is silently truncated (notice how the char code is
> reported as 0x910e where it should be 0x2910e), which is sad, but both
> using a string literal and using a numerical surrogate pair work---both in
> the HTML page and in chrome's console output!
> conversely, in nodejs, this is what i get:
>
> <h1> � </h1>
> <div> 1 </div>
> <div>0x fffd </div>
> <div>0x NaN </div>
> <h1> 鄎 </h1>
> <div> 1 </div>
> <div>0x 910e </div>
> <div>0x NaN </div>
> <h1> ����� </h1>
> <div> 2 </div>
> <div>0x d864 </div>
> <div>0x dd0e </div>
>
> the silver lining here is that v8 inside nodejs does preserve the surrogate
> pair, even though it fails to output it correctly. the console.log() method,
> however, gets it completely wrong. may i add that the analog in python 3.1
> does work---since i use a 'narrow' python build, it also reports the string
> '𩄎' as being two characters long, yet it manages to print it out correctly,
> which tells me that my ubuntu gnome terminal knows how to handle surrogate
> pairs.
>
> i could perfectly well live with those surrogate pairs---they're a nuisance,
> but i know how to deal with them from years of experience with python. the
> really sad thing here is that nodejs's v8 seems to fall short on something
> that v8 can be demonstrated to do correctly when running inside chrome.
>
> that said, let me add that i sometimes worry about the unneeded complexity
> that goes into implementations. why can't people just use a 32-bit-wide
> character datatype? instead they make users jump through all kinds of
> gratuitous hoops.
>
> --
> v8-users mailing list
> [email protected]
> http://groups.google.com/group/v8-users
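The surrogate pair the examples above type out by hand (0xd864, 0xdd0e) can also be computed. A sketch of the split that String.fromCharCode() does not perform for code points above 0xFFFF (the helper name toSurrogatePair is my own; ES2015 later standardized this as String.fromCodePoint):

```javascript
// Split a supplementary code point into its high and low surrogates
// before handing them to String.fromCharCode(), which only deals in
// 16-bit code units.
function toSurrogatePair(cp) {
  if (cp <= 0xffff) return String.fromCharCode(cp); // BMP: one code unit
  cp -= 0x10000;                      // leaves 20 payload bits
  var hi = 0xd800 + (cp >> 10);       // top 10 bits
  var lo = 0xdc00 + (cp & 0x3ff);     // bottom 10 bits
  return String.fromCharCode(hi, lo);
}

console.log(toSurrogatePair(0x2910e) === '\ud864\udd0e'); // true
console.log(toSurrogatePair(0x2910e).length);             // 2
```

With this helper, f( toSurrogatePair( 0x2910e ) ) would behave like the working string-literal case above instead of being silently truncated.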
