On Thu, 20 May 2021 19:39:06 +0100
RW wrote:

> /\xF0\x9F(?:\x98[\x80-\xBF]|\x99[\x80-\x8F])|\xF0\x9F(?:[\xA4-\xA6][\x80-\xBF]|\xA7[\x80-\xBF])|\xE2\x98[\xB9-\xBB]/


This version includes the block mentioned by Bill Cole and is
simplified a bit:


/\xF0\x9F[\x98-\x99\xA4-\xA7\x8C-\x97][\x80-\xBF]|\xE2\x98[\xB9-\xBB]/
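
If it helps to see where byte ranges like these come from, here is a
rough Python sketch (not SpamAssassin code; the helper name utf8_escapes
is made up for illustration) that prints the UTF-8 bytes of the first and
last code point of the Unicode blocks involved. The block boundaries are
from the Unicode charts; which blocks you want to cover is your call.

```python
# Hypothetical helper: render the UTF-8 encoding of one code point
# as \xNN escapes, the same notation used in the rule.
def utf8_escapes(code_point: int) -> str:
    return "".join(f"\\x{b:02X}" for b in chr(code_point).encode("utf-8"))

# First and last code point of each range of interest.
blocks = {
    "Miscellaneous Symbols and Pictographs": (0x1F300, 0x1F5FF),
    "Emoticons": (0x1F600, 0x1F64F),
    "Supplemental Symbols and Pictographs": (0x1F900, 0x1F9FF),
    "U+2639..U+263B": (0x2639, 0x263B),
}

for name, (first, last) in blocks.items():
    print(f"{name}: {utf8_escapes(first)} .. {utf8_escapes(last)}")
```

The second and third bytes of the endpoints give the ranges for the
character class, and the last byte of a full block runs over the whole
continuation range \x80-\xBF.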


However, if you don't expect to get any legitimate mail with Asian
languages in the subject, you can probably get away with matching every
4-byte UTF-8 sequence, i.e. every code point above U+FFFF. Those code
points are dominated by CJK ideograph extensions, symbols, emoji and
dead languages.


/[\xF0-\xF7][\x80-\xBF]{3}|\xE2\x98[\xB9-\xBB]/
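
Before deploying a catch-all like that, it may be worth trying it on a
few sample subjects. A minimal Python sketch, assuming the rule sees the
subject as raw UTF-8 bytes (the function name hits and the sample
strings are made up for illustration):

```python
import re

# The catch-all pattern, written as a Python bytes regex.
BROAD = re.compile(rb"[\xF0-\xF7][\x80-\xBF]{3}|\xE2\x98[\xB9-\xBB]")

def hits(subject: str) -> bool:
    """Return True if the UTF-8 encoded subject matches the pattern."""
    return BROAD.search(subject.encode("utf-8")) is not None

samples = {
    "plain ASCII": "Meeting at 10",
    "Latin-1 range accent (2 bytes)": "caf\u00e9 menu",
    "common CJK in the BMP (3 bytes each)": "\u65e5\u672c\u8a9e",
    "U+2639 frowning face (3 bytes)": "sad \u2639",
    "U+1F600 emoji (4 bytes)": "Hello \U0001F600",
    "U+20000 CJK Extension B (4 bytes)": "\U00020000",
}

for label, subject in samples.items():
    print(f"{label}: {hits(subject)}")
```

Only the last three samples match. Everyday CJK text in the Basic
Multilingual Plane encodes to 3 bytes and is not caught by the 4-byte
branch.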
