so i went and put the same javascript into an HTML page to be displayed by 
chrome, and into a standalone js snippet to be run with nodejs:

var f = function( text ) {
  document.write( '<h1>',  text,                                '</h1>'  );
  document.write( '<div>', text.length,                         '</div>' );
  document.write( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
  document.write( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
  console.log( '<h1>',  text,                                 '</h1>'  );
  console.log( '<div>', text.length,                          '</div>' );
  console.log( '<div>0x', text.charCodeAt(0).toString( 16 ),  '</div>' );
  console.log( '<div>0x', text.charCodeAt(1).toString( 16 ),  '</div>' );
};

f( '𩄎' );
f( String.fromCharCode( 0x2910e ) );
f( String.fromCharCode( 0xd864, 0xdd0e ) );
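
for reference, the numerical pair in the last call can be derived directly from 
the code point U+2910E; a minimal sketch (the helper name is mine):

```javascript
// sketch: split an astral code point into its UTF-16 surrogate pair
function toSurrogatePair( cp ) {
  var offset = cp - 0x10000;            // value to spread over the two halves
  var hi     = 0xd800 + ( offset >> 10 );   // high (lead) surrogate
  var lo     = 0xdc00 + ( offset & 0x3ff ); // low (trail) surrogate
  return [ hi, lo ];
}

var pair = toSurrogatePair( 0x2910e );
console.log( pair[ 0 ].toString( 16 ), pair[ 1 ].toString( 16 ) ); // d864 dd0e
```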

in function f(), the document.write() calls are only present in the HTML 
version, not in the standalone script. 

i want to show here that something more fundamental must differ between 
javascript running inside google chrome and javascript running inside nodejs. 
because, you see, the output i get inside chrome looks like this:

𩄎
2
0xd864
0xdd0e
鄎
1
0x910e
0xNaN
𩄎
2
0xd864
0xdd0e
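
as a quick sanity check (my own snippet, not part of the test above), the 
0x910e result is exactly what you'd expect from String.fromCharCode() masking 
each argument to 16 bits:

```javascript
// String.fromCharCode() applies ToUint16 to each argument, so any code
// point above 0xffff is silently masked to its low 16 bits
var masked = 0x2910e & 0xffff;
console.log( masked.toString( 16 ) ); // '910e'
console.log( String.fromCharCode( 0x2910e ) === String.fromCharCode( 0x910e ) ); // true
```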

the second character is silently truncated to its low 16 bits (notice how the 
character code is reported as 0x910e where it should be 0x2910e), which is sad, 
but both the string literal and the numerical surrogate pair work, in the HTML 
page as well as in chrome's console output! conversely, this is what i get in 
nodejs:

<h1> � </h1>
<div> 1 </div>
<div>0x fffd </div>
<div>0x NaN </div>
<h1> 鄎 </h1>
<div> 1 </div>
<div>0x 910e </div>
<div>0x NaN </div>
<h1> �����</h1>
<div> 2 </div>
<div>0x d864 </div>
<div>0x dd0e </div>

the silver lining here is that v8 inside nodejs does preserve the surrogate 
pair, even though it fails to output it correctly; the console.log() method 
gets it completely wrong. may i add that the analogous test in python 3.1 does 
work: since i use a 'narrow' python build, it also reports the string '𩄎' as 
being two characters long, yet manages to print it correctly, which tells me 
that my ubuntu gnome terminal knows how to handle surrogate pairs. 

i could live with those surrogate pairs perfectly well; they're a nuisance, but 
i know how to deal with them from years of experience with python. the really 
sad thing here is that nodejs's v8 seems to fall short on something that v8 
demonstrably does correctly when running inside chrome. 
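
for what it's worth, here is a minimal sketch of how i'd combine a pair back 
into its code point (the helper name is mine):

```javascript
// sketch: combine a UTF-16 surrogate pair back into a single code point;
// falls back to the lone code unit when there is no valid pair at index i
function codePointOf( text, i ) {
  var hi = text.charCodeAt( i );
  var lo = text.charCodeAt( i + 1 );
  if ( hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff ) {
    return ( hi - 0xd800 ) * 0x400 + ( lo - 0xdc00 ) + 0x10000;
  }
  return hi;
}

console.log( codePointOf( '𩄎', 0 ).toString( 16 ) ); // '2910e'
```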

that said, let me add that i sometimes worry about the unneeded complexity 
that goes into implementations. why can't people just use a 32-bit wide 
character datatype? instead, they make users jump through all kinds of 
gratuitous hoops.

-- 
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
