so i went and put the same javascript into an HTML page to be displayed by
chrome and into a standalone js snippet to be run using nodejs:
var f = function( text ) {
  document.write( '<h1>', text, '</h1>' );
  document.write( '<div>', text.length, '</div>' );
  document.write( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
  document.write( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
  console.log( '<h1>', text, '</h1>' );
  console.log( '<div>', text.length, '</div>' );
  console.log( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
  console.log( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
};
f( '𩄎' );
f( String.fromCharCode( 0x2910e ) );
f( String.fromCharCode( 0xd864, 0xdd0e ) );
in function f(), those document.write() calls are only present in the HTML
document, not the standalone.
i want to show here that something more fundamental must be different
between javascript running inside google chrome and javascript running
inside nodejs. because, you see, the output i get inside chrome looks like
this:
𩄎
2
0xd864
0xdd0e
鄎
1
0x910e
0xNaN
𩄎
2
0xd864
0xdd0e
in the second call the code point is silently truncated (notice how the char
code is reported as 0x910e where it should be 0x2910e; String.fromCharCode()
keeps only the low 16 bits of each argument), which is sad. but both the
string literal and the numerical surrogate pair work, in the HTML page as
well as in chrome's console output! conversely, in nodejs, this is what i
get:
<h1> � </h1>
<div> 1 </div>
<div>0x fffd </div>
<div>0x NaN </div>
<h1> 鄎 </h1>
<div> 1 </div>
<div>0x 910e </div>
<div>0x NaN </div>
<h1> �����</h1>
<div> 2 </div>
<div>0x d864 </div>
<div>0x dd0e </div>
the silver lining here is that v8 inside nodejs does preserve the surrogate
pair when it is built from two char codes (third call), even though
console.log() fails to output it correctly. the string literal in the first
call, however, comes out completely wrong: a single 0xfffd. may i add that
the analogous test in python 3.1 does work: since i use a 'narrow' python
build, it also reports a string '𩄎' as being two characters long, and
manages to print it out correctly, which seems to tell me that my ubuntu
gnome terminal knows how to handle surrogate pairs.
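for reference, the relation between the code point 0x2910e and the pair
0xd864, 0xdd0e seen above can be sketched as follows. this is only a minimal
illustration; toSurrogatePair() is a made-up helper name, not something from
the test page or from any library. the last two lines show the 16-bit
truncation that String.fromCharCode() applies to each argument.

```javascript
// hypothetical helper (not part of the original test page):
// split a code point above 0xffff into its UTF-16 surrogate pair
function toSurrogatePair( codePoint ) {
  if ( codePoint < 0x10000 || codePoint > 0x10ffff ) {
    throw new RangeError( 'not a supplementary code point: 0x' + codePoint.toString( 16 ) );
  }
  var offset = codePoint - 0x10000;          // 20 bits remain
  var high = 0xd800 + ( offset >> 10 );      // upper 10 bits
  var low = 0xdc00 + ( offset & 0x3ff );     // lower 10 bits
  return [ high, low ];
}

var pair = toSurrogatePair( 0x2910e );
var text = String.fromCharCode( pair[ 0 ], pair[ 1 ] );
console.log( pair[ 0 ].toString( 16 ), pair[ 1 ].toString( 16 ) ); // d864 dd0e
console.log( text.length );                                        // 2

// String.fromCharCode() keeps only the low 16 bits of each argument
console.log( ( 0x2910e & 0xffff ).toString( 16 ) );                           // 910e
console.log( String.fromCharCode( 0x2910e ).charCodeAt( 0 ).toString( 16 ) ); // 910e
```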
i could perfectly live with those surrogate pairs---they're a nuisance but i
know how to deal with them from years of experience with python. the really
sad thing here is that nodejs's v8 seems to fall short on something that v8
can be demonstrated to do correctly when running inside chrome.
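as an illustration of what "dealing with" surrogate pairs looks like in
practice, here is a rough sketch that walks a string and joins each pair back
into one code point. codePointsOf() is a hypothetical name chosen for this
example, and the sketch assumes the engine has kept both halves of the pair
intact (as in the third call above).

```javascript
// hypothetical helper: list the code points of a string,
// joining each high/low surrogate pair into a single value
function codePointsOf( text ) {
  var result = [];
  for ( var i = 0; i < text.length; i++ ) {
    var unit = text.charCodeAt( i );
    if ( unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length ) {
      var next = text.charCodeAt( i + 1 );
      if ( next >= 0xdc00 && next <= 0xdfff ) {
        result.push( 0x10000 + ( ( unit - 0xd800 ) << 10 ) + ( next - 0xdc00 ) );
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    result.push( unit ); // BMP character or lone surrogate, kept as-is
  }
  return result;
}

var points = codePointsOf( String.fromCharCode( 0xd864, 0xdd0e ) );
console.log( points.length );                     // 1
console.log( '0x' + points[ 0 ].toString( 16 ) ); // 0x2910e
```

with a helper like this, the "length" of the test string comes out as 1 even
though text.length reports 2.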
that said, let me add that i sometimes worry about the unneeded complexity
that goes into implementations. why can't people just use a 32-bit wide
character datatype? instead they make users jump through all kinds of
gratuitous hoops.
--
v8-users mailing list
http://groups.google.com/group/v8-users