On Thu, 20 May 2021 19:39:06 +0100
RW wrote:

> > /\xF0\x9F(?:\x98[\x80-\xBF]|\x99[\x80-\x8F])|\xF0\x9F(?:[\xA4-\xA6][\x80-\xBF]|\xA7[\x80-\xBF])|\xE2\x98[\xB9-\xBB]/

This includes the block mentioned by Bill Cole and is simplified a bit:

/\xF0\x9F[\x98-\x99\xA4-\xA7\x8C-\x97][\x80-\xBF]|\xE2\x98[\xB9-\xBB]/

However, if you don't expect to get any legitimate mail with Asian languages in the subject, you can probably get away with including all 4-byte UTF-8. Those code points are dominated by CJK, symbols, emojis and dead languages.

/[\xF0-\xF7][\x80-\xBF]{3}|\xE2\x98[\xB9-\xBB]/
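
If you want to see what each pattern catches before putting it in a rule, here is a rough sketch in Python. It is not SpamAssassin; it only assumes the subject has already been decoded to raw UTF-8 bytes. The names NARROW, BROAD and hits are made up for the example, and the narrower pattern is taken with a final continuation byte of [\x80-\xBF] so that whole blocks are covered.

```python
import re

# Byte-level patterns, applied to raw UTF-8 rather than decoded characters.
# NARROW: selected emoji/symbol blocks under the F0 9F lead bytes, plus U+2639..U+263B.
NARROW = re.compile(
    rb"\xF0\x9F[\x98-\x99\xA4-\xA7\x8C-\x97][\x80-\xBF]|\xE2\x98[\xB9-\xBB]"
)
# BROAD: any 4-byte UTF-8 sequence, plus U+2639..U+263B.
BROAD = re.compile(rb"[\xF0-\xF7][\x80-\xBF]{3}|\xE2\x98[\xB9-\xBB]")


def hits(pattern, subject):
    """Return True if the UTF-8 encoding of subject matches pattern."""
    return pattern.search(subject.encode("utf-8")) is not None


if __name__ == "__main__":
    samples = {
        "grinning face U+1F600": "Sale \U0001F600",
        "thinking face U+1F914": "Hmm \U0001F914",
        "rocket U+1F680": "Launch \U0001F680",
        "frowning face U+2639": "Oops \u2639",
        "CJK Extension B U+20000": "\U00020000",
        "common CJK (3-byte)": "\u65E5\u672C\u8A9E",
        "plain ASCII": "Meeting at 10",
    }
    for label, subject in samples.items():
        print(f"{label:26} narrow={hits(NARROW, subject)} broad={hits(BROAD, subject)}")
```

Two things show up in the output. The rocket (F0 9F 9A 80) is outside the narrower ranges and is only caught by the all-4-byte pattern. Everyday CJK text encodes as 3 bytes per character, so neither pattern touches it; only the rarer 4-byte CJK extension characters are hit by the broad one.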