RE: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

Shawn Steele via Unicode Mon, 10 Dec 2018 12:15:28 -0800

IMO, trying to do security checks on an encoded string that will be decoded 
later is pretty much guaranteed to miss cases, particularly with ISO-2022-JP, 
which has a plethora of variations in how different software, libraries, and 
OSes decode it and treat the invalid/edge cases.


I typically encourage security checks on encodings to be done after the 
translation to Unicode has been done, but that only works if the Unicode 
stream itself is what's being checked.  E.g., a firewall may not decode it the 
same way as the end recipient of the data.  Which I guess is the point of the 
encoding project, but... nobody can guarantee that an endpoint conforms to 
any "standard", so from a security perspective the recommended guidance is 
pretty much moot: secure applications have to consider non-conforming behavior 
of endpoints as well.
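
To make the mismatch concrete, here is a minimal Python sketch.  The function 
names and the "<script" marker are hypothetical, chosen only for illustration, 
and the behavior shown is that of CPython's iso2022_jp codec, not of any 
particular firewall or endpoint.  It compares a check on the raw bytes with a 
check on the decoded text:

```python
# Hypothetical byte-level filter: scans the raw, still-encoded bytes.
def naive_byte_check(raw: bytes) -> bool:
    return b"<script" in raw.lower()


# Check on the decoded text instead.  This only helps if the consumer
# decodes the bytes the same way this process does (an assumption).
def decoded_check(raw: bytes, encoding: str = "iso2022_jp") -> bool:
    text = raw.decode(encoding, errors="replace")
    return "<script" in text.lower()


# ESC ( B re-selects ASCII.  It is redundant here, and CPython's decoder
# consumes it without emitting any character.
payload = b"<scr\x1b(Bipt>"

print("byte-level check flags payload:", naive_byte_check(payload))
print("decoded check flags payload:   ", decoded_check(payload))
```

With CPython's codec the byte-level check misses the payload and the decoded 
check catches it.  A decoder that handles the redundant escape differently 
could see yet another string, which is exactly the firewall vs. endpoint 
problem.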

Providing a "best practice" or suggestions in a standard is nice, but in 
practice systems are going to have differing interpretations and behaviors. 
Applications can't "depend" on any consistency.  Even if all the standards 
documents agreed, there'd still be legacy implementations that people didn't 
update for whatever reason, and other implementations would miss some of the 
subtleties (or less subtle differences) of the standards.

IMO, all of the "state shifting" encodings should be treated with care by 
software.  There are a lot of ways to encode the same or similar strings, 
and you never know what kind of validation happened "on the other end".  It's 
pretty much a given that ISO-2022-JP input, particularly its edge cases, is 
going to be interpreted differently by different applications.
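
As a rough illustration of "many ways to encode the same string", the Python 
sketch below feeds four different byte sequences to CPython's iso2022_jp 
codec.  The variant labels and the decode_all helper are made up for this 
example, and other decoders are not guaranteed to agree with CPython on any of 
these inputs:

```python
ESC = b"\x1b"

# Four distinct byte sequences; the labels are only for display.
variants = {
    "plain ASCII": b"abc",
    "redundant ESC ( B": ESC + b"(B" + b"abc",
    "empty JIS X 0208 segment": ESC + b"$B" + ESC + b"(B" + b"abc",
    "JIS X 0201 Roman": ESC + b"(J" + b"abc" + ESC + b"(B",
}


def decode_all(encoding: str = "iso2022_jp") -> dict:
    # errors="replace" yields U+FFFD instead of raising on malformed input.
    return {
        name: raw.decode(encoding, errors="replace")
        for name, raw in variants.items()
    }


decoded = decode_all()
for name, text in decoded.items():
    print(f"{name:26} {variants[name]!r:26} -> {text!r}")
```

Under CPython all four decode to the same text, so comparing or filtering the 
raw bytes says little about what the recipient ends up with.  A decoder that 
emits U+FFFD for the empty segment in the third variant (the behavior this 
thread is about) would produce a different string for that input.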

-Shawn
