RFC 7158 section 7 [1] provides not only the \uXXXX notation for Unicode code 
points in the Basic Multilingual Plane, but also a 12-character sequence 
encoding the UTF-16 surrogate pair (i.e. \uYYYY\uZZZZ with 0xD800 ≤ YYYY < 
0xDC00 ≤ ZZZZ ≤ 0xDFFF) for supplementary Unicode code points. A tool checking 
for escape sequences that don’t correspond to any Unicode character must be 
aware of this, because neither \uYYYY nor \uZZZZ by itself would correspond to 
any Unicode character, but their combination may well do so.
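
A minimal sketch of such a surrogate-aware check in Python follows. It is an illustration only, not an existing tool: the function name find_suspect_escapes is hypothetical, and it assumes the input is otherwise well-formed JSON, so that backslashes occur only inside strings. "Does not correspond to any Unicode character" is approximated here as general category Cn (unassigned code points and noncharacters) in the Unicode database shipped with the Python interpreter, so results can vary with the Python version.

```python
import re
import unicodedata

# Matches any backslash escape. Group 1 is set only for \uXXXX escapes.
# Consuming "\\" as a single escape keeps the text \\u0041 (an escaped
# backslash followed by plain "u0041") from being misread as a \u escape.
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|.)')


def _is_unassigned(cp):
    # Assumption: category "Cn" (unassigned or noncharacter) stands in for
    # "does not correspond to any Unicode character".
    return unicodedata.category(chr(cp)) == "Cn"


def find_suspect_escapes(json_text):
    """Return a list of (offset, escape_text, reason) tuples."""
    problems = []
    high = None  # match of a high surrogate still waiting for its partner
    for m in _ESCAPE.finditer(json_text):
        if m.group(1) is None:
            continue  # some other escape, such as \n or \\
        unit = int(m.group(1), 16)
        if high is not None:
            if 0xDC00 <= unit <= 0xDFFF and m.start() == high.end():
                # \uYYYY immediately followed by \uZZZZ: combine the pair
                # into one supplementary code point and check that instead.
                hi_unit = int(high.group(1), 16)
                cp = 0x10000 + ((hi_unit - 0xD800) << 10) + (unit - 0xDC00)
                if _is_unassigned(cp):
                    problems.append((high.start(), high.group(0) + m.group(0),
                                     "no character assigned to U+%04X" % cp))
                high = None
                continue
            problems.append((high.start(), high.group(0), "lone high surrogate"))
            high = None
        if 0xD800 <= unit <= 0xDBFF:
            high = m  # decide once the next \u escape (or the end) is seen
        elif 0xDC00 <= unit <= 0xDFFF:
            problems.append((m.start(), m.group(0), "lone low surrogate"))
        elif _is_unassigned(unit):
            problems.append((m.start(), m.group(0),
                             "no character assigned to U+%04X" % unit))
    if high is not None:
        problems.append((high.start(), high.group(0), "lone high surrogate"))
    return problems
```

For example, find_suspect_escapes(r'"\uD83D\uDE00 \uDE00"') accepts the leading pair as U+1F600 and reports only the trailing \uDE00, at offset 14, as a lone low surrogate. Malformed escapes such as \u12 are outside the scope of this sketch and are silently skipped.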

Norbert

[1] https://tools.ietf.org/html/rfc7158#section-7


> On May 7, 2015, at 5:46, Costello, Roger L. <coste...@mitre.org> wrote:
> 
> Hi Folks,
> 
> The JSON specification says that a character may be escaped using this 
> notation: \uXXXX    (XXXX are four hex digits)
> 
> However, not every sequence of four hex digits corresponds to a Unicode character. 
> 
> Are there tools to scan a JSON document to detect the presence of \uXXXX, 
> where XXXX does not correspond to any Unicode character?
> 
> /Roger
> 

