On 2021-05-20 at 13:44:43 UTC-0400 (Thu, 20 May 2021 18:44:43 +0100)
RW <rwmailli...@googlemail.com>
is rumored to have said:

On Thu, 20 May 2021 18:30:03 +0100
RW wrote:


Try this:


header  EMOTICON_IN_SUBJECT  Subject =~ /\xF0\x9F(?:\x98[\x80-\xFF]|\x99[\x00-\x8F])/


Actually that's only the original block, but it probably works most of
the time

Not so sure about that...

I regularly get mail from Patreon with emoji in the encoded header which don't match that pattern:


# grep '^Subject: ' /tmp/ham |cut -d? -f4 |decode-base64 |hexdump -C
00000000  f0 9f 8e 89 20 50 61 74  72 69 63 6b 20 57 61 72  |.... Patrick War|
00000010  64 6c 65 20 6a 75 73 74  20 73 68 61 72 65 64 20  |dle just shared |
00000020  22 f0 9f 93 9d 20 4e                              |".... N|
00000027
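
For anyone who wants to check this without a SpamAssassin install, here is a minimal Python sketch that runs the suggested pattern against the bytes from the hexdump. It assumes the rule sees the decoded Subject as raw UTF-8 bytes and that the `x8F` in the posted rule was meant to be `\x8F`; the names `EMOTICON_BLOCK` and `has_emoticon_block` are made up for illustration.

```python
import re

# The suggested pattern as a byte-level regex.
# Assumption: "x8F" in the posted rule was meant to be "\x8F".
EMOTICON_BLOCK = re.compile(rb"\xF0\x9F(?:\x98[\x80-\xFF]|\x99[\x00-\x8F])")

# Decoded Subject bytes, copied from the hexdump (truncated as in the dump).
subject = bytes.fromhex(
    "f09f8e89205061747269636b20576172"
    "646c65206a7573742073686172656420"
    "22f09f939d204e"
)


def has_emoticon_block(data: bytes) -> bool:
    """Return True if data contains a UTF-8 sequence the pattern accepts."""
    return EMOTICON_BLOCK.search(data) is not None


# The Patreon subject: both emoji start with f0 9f but continue with
# 8e and 93, not 98 or 99, so the pattern does not match.
print(has_emoticon_block(subject))

# U+1F600 (f0 9f 98 80) is inside the Emoticons block and does match.
print(has_emoticon_block("\U0001F600".encode("utf-8")))

# The code points actually present in the subject: U+1F389 and U+1F4DD.
print([f"U+{ord(c):04X}" for c in subject.decode("utf-8") if ord(c) > 0xFFFF])
```

Both code points sit in the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF), below the Emoticons block (U+1F600 to U+1F64F) that the pattern covers.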

People send wanted mail with all sorts of weirdness.

Looking at the full set (https://www.unicode.org/emoji/charts/full-emoji-list.html), I can understand why \p{Emoticons} would be so much better than trying to enumerate them all in a regex of UTF-8 hex bytes.
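
To illustrate why matching on characters is easier than matching on bytes, here is a rough Python sketch that decodes the RFC 2047 header first and then tests code points. Python's standard `re` module has no `\p{...}` support, so explicit code point ranges stand in for a Unicode property here. The two ranges are only the Miscellaneous Symbols and Pictographs and Emoticons blocks, nowhere near the full emoji set, and the names `PICTOGRAPH`, `subject_has_pictograph` and the sample header are hypothetical.

```python
import re
from email.header import decode_header, make_header

# Assumption: only two Unicode blocks are covered, as a stand-in for a
# property class such as \p{Emoticons}. This is NOT the full emoji set.
#   U+1F300..U+1F5FF  Miscellaneous Symbols and Pictographs
#   U+1F600..U+1F64F  Emoticons
PICTOGRAPH = re.compile("[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F]")


def subject_has_pictograph(raw_subject: str) -> bool:
    """Decode an RFC 2047 encoded Subject value, then test its characters."""
    text = str(make_header(decode_header(raw_subject)))
    return PICTOGRAPH.search(text) is not None


# Hypothetical header value: base64 of U+1F389 followed by " Patrick".
example = "=?UTF-8?B?8J+OiSBQYXRyaWNr?="

print(subject_has_pictograph(example))
print(subject_has_pictograph("Plain ASCII subject"))
```

Working on decoded text means one character class per block instead of a hand-built alternation of lead and continuation bytes.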

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
