'According to ISO 10646-1:2000, sections D.7 and 2.3c, a device receiving UTF-8 shall interpret a "malformed sequence in the same way that it interprets a character that is outside the adopted subset"'
That behaviour is clearly out of date. The Unicode Standard has since tightened its UTF-8 conformance requirements for security reasons (for example, non-shortest-form encodings are no longer allowed). Malformed text should either be rejected outright, OR the malformed UTF-8 should be repaired on loading so that it becomes conforming UTF-8, basically by stripping out the bad bytes or replacing them.
As long as we never pass invalid UTF-8 to client apps/code, and never process invalid UTF-8 ourselves, we are fine. So modifying the bytes of the UTF-8 text before doing anything else with it can work in some circumstances.
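
To make the three options concrete, here is a minimal sketch in Python. The function name load_utf8 and its policy parameter are made up for illustration and are not from any existing library; the only real calls are the standard bytes.decode() / str.encode() with their "strict", "replace" and "ignore" error handlers. It assumes the whole text is available as one byte string at load time.

```python
def load_utf8(raw: bytes, policy: str = "reject") -> bytes:
    """Return conforming UTF-8 bytes (hypothetical helper, for illustration).

    policy:
      "reject"  - raise UnicodeDecodeError if raw is not valid UTF-8
      "replace" - substitute U+FFFD for each malformed sequence
      "strip"   - drop the malformed bytes
    """
    if policy == "reject":
        # Strict decoding raises on any malformed sequence,
        # including overlong (non-shortest) forms.
        raw.decode("utf-8", errors="strict")
        return raw
    if policy == "replace":
        # Round trip through str so the result is guaranteed well-formed.
        return raw.decode("utf-8", errors="replace").encode("utf-8")
    if policy == "strip":
        return raw.decode("utf-8", errors="ignore").encode("utf-8")
    raise ValueError("unknown policy: " + policy)


bad = b"ab\xffcd"                      # 0xFF can never appear in UTF-8
cleaned = load_utf8(bad, "replace")    # safe to hand to client code
stripped = load_utf8(bad, "strip")
```

Whichever policy is used, everything downstream only ever sees the return value, never the raw input. Note that stripping joins the text on either side of the bad bytes, which is one reason it only works "in some circumstances"; replacing with U+FFFD keeps the boundary visible.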
-- Theodore H. Smith - Software Developer. http://www.elfdata.com