Oh, for heaven's sake:
Code Point. (1) Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4, Characters and Encoding.) Not all code points are assigned to encoded characters. See code point type. (2) A value, or position, for a character, in any coded character set.

Unicode Scalar Value. Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive. (See definition D76 in Section 3.9, Unicode Encoding Forms.)

Source: http://www.unicode.org/glossary/

The only difference between a code point and a scalar value is that "scalar value" excludes the integer values that correspond to surrogates. That's it. And since it is very unlikely that Twitter and others are storing and interchanging loose surrogates, it is truly a distinction without a difference.

This has nothing to do with UTF-Anything or Normalization Form Anything.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
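
For what it's worth, the two glossary definitions quoted above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up here, and only the numeric ranges come from the glossary text.

```python
# Hypothetical helper names; the ranges are the ones given in the glossary.
MAX_CODE_POINT = 0x10FFFF   # top of the Unicode codespace
SURROGATE_FIRST = 0xD800    # first high-surrogate code point
SURROGATE_LAST = 0xDFFF     # last low-surrogate code point


def is_code_point(n: int) -> bool:
    """True for any integer in the codespace, 0 to 10FFFF (hex)."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """True for any code point that is not a surrogate code point."""
    return is_code_point(n) and not (SURROGATE_FIRST <= n <= SURROGATE_LAST)


# The integers on which the two definitions disagree: exactly the surrogates.
differing = [
    n for n in range(MAX_CODE_POINT + 1)
    if is_code_point(n) != is_scalar_value(n)
]
```

Under these assumptions the two predicates disagree only on D800₁₆ through DFFF₁₆, and no encoding form or normalization form enters into either check.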

