On Oct 26, 2008, at 12:03 PM, Erik Corry wrote:

> the default behavior will be to assume the encoding, UCS-2, which
> is guaranteed to be free of surrogate pair subtleties.
>
> I don't understand what this could mean in practice. If the input
> contains only basic plane (16 bit characters) then there is no
> difference between UCS-2 and UTF-16. So in this case the flag would
> make no difference. If the input contains characters from the 20
> bit space then UCS-2 can't represent them so what will you do with
> them if the user specifies UCS-2 but has such characters. I think
> throwing them away would be worse than just leaving them in there as
> surrogate pairs. I suppose you could throw an exception but that
> seems worse too.
I was planning to throw an exception.

Seems to me my choice here is between [1] doing nothing and allowing people to encounter subtle bugs in their own code and [2] being an annoying pedantic gatekeeper who forces people to explicitly request a potentially problematic situation. Neither option is perfect; the question is which is less bad.

The situation that concerns me most is that a team may write a lot of code which naively assumes JavaScript strings are UCS-2, because the team's native language fits into UCS-2, and maybe the language of their neighbors fits into UCS-2 as well, and by the time they realize their code has subtle problems processing UTF-16 text, their investment in their project is already too substantial to fix the problems, so they are forced, late in the development cycle, to abandon entire markets.

The exception would be a big unmistakable warning the very first time they attempt to use input text which doesn't fit into UCS-2 (perhaps without realizing it), before the problem has a chance to become tricky to diagnose. Yes, they can explicitly accept UTF-16 to inhibit the exception, but they had better know the rest of their code can actually process it, and they had better understand that they can't expect the built-in string and regexp facilities to help with that.

In short, my hope would be that the exception makes it easier to discover earlier that UTF-16 is a huge issue.

-- 
Pete Gontier <http://pete.gontier.org/>

v8-users mailing list
http://groups.google.com/group/v8-users
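
For illustration only, here is a rough sketch of the kind of check described above, written in TypeScript rather than against the real V8 API. The names `Encoding` and `checkEncoding` are made up for this example and do not exist in V8; the sketch also assumes that "doesn't fit into UCS-2" simply means "contains a code unit in the surrogate range U+D800 to U+DFFF".

```typescript
// Hypothetical names for illustration; none of this is actual V8 API.
type Encoding = "ucs2" | "utf16";

// True if the 16-bit code unit lies in the surrogate range U+D800..U+DFFF.
function isSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdfff;
}

// Returns the text unchanged if it fits the requested encoding.
// With the default ("ucs2"), any surrogate code unit raises an exception,
// so the caller finds out on the first non-BMP input.
function checkEncoding(text: string, encoding: Encoding = "ucs2"): string {
  if (encoding === "utf16") {
    // Caller has explicitly accepted surrogate pairs.
    return text;
  }
  for (let i = 0; i < text.length; i++) {
    if (isSurrogate(text.charCodeAt(i))) {
      throw new RangeError(
        "code unit at index " + i + " is a surrogate; input is not UCS-2"
      );
    }
  }
  return text;
}
```

With this sketch, `checkEncoding("hello")` returns its input, while `checkEncoding("\uD834\uDD1E")` (U+1D11E, a character outside the basic plane) throws unless `"utf16"` is passed explicitly. In the explicit case the string still reports a `length` of 2, which is exactly the kind of subtlety the rest of the caller's code would then have to handle.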
