On Thu, 7 May 2020, Brent Clark wrote:

Good day, guys

Our good friends are at it again.

https://pastebin.com/raw/vjFcPzLE

I haven't written anything yet.
Thought I would share in the meantime.

100% 4-byte UTF-8? That should be trivially easy to detect.

Comments solicited.

  body       __4BYTE_UTF8_WORD     /(?:\xf0\x9d[\x9a-\x9f][\x80-\xff]){3,10}/
  tflags     __4BYTE_UTF8_WORD     multiple, maxhits=10
  meta       SUSP_UTF8_WORD_MANY   __4BYTE_UTF8_WORD > 9
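For readers who want to poke at the pattern outside SpamAssassin, here is a rough Python approximation of the three rule lines. It assumes the body regex is matched against UTF-8 octets, as its \x escapes imply, and it scans the whole text at once, whereas SpamAssassin applies body rules to chunks of the rendered body, so real hit counts can differ. `count_hits`, `susp_utf8_word_many` and `to_monospace` are hypothetical helper names used only for this sketch.

```python
import re

# Byte-level copy of the __4BYTE_UTF8_WORD pattern: 3 to 10 consecutive
# 4-byte UTF-8 sequences F0 9D [9A-9F] xx, i.e. code points U+1D680..U+1D7FF
# (the upper part of the Mathematical Alphanumeric Symbols block).
FOUR_BYTE_WORD = re.compile(rb'(?:\xf0\x9d[\x9a-\x9f][\x80-\xff]){3,10}')

MAXHITS = 10    # mirrors "maxhits=10" in the tflags line
THRESHOLD = 9   # mirrors the meta test "__4BYTE_UTF8_WORD > 9"


def count_hits(body: str) -> int:
    """Count non-overlapping matches in the UTF-8 encoded body, capped at MAXHITS.

    Simplification: scans the whole string in one pass rather than
    chunk by chunk the way SpamAssassin does.
    """
    hits = sum(1 for _ in FOUR_BYTE_WORD.finditer(body.encode("utf-8")))
    return min(hits, MAXHITS)


def susp_utf8_word_many(body: str) -> bool:
    """Approximate the SUSP_UTF8_WORD_MANY meta rule."""
    return count_hits(body) > THRESHOLD


def to_monospace(text: str) -> str:
    """Hypothetical test helper: map ASCII a-z to MATHEMATICAL MONOSPACE
    SMALL A-Z (U+1D68A..U+1D6A3); all other characters pass through."""
    return "".join(
        chr(0x1D68A + ord(c) - ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


if __name__ == "__main__":
    lure = to_monospace("you have won the prize click here now and claim your cash reward")
    print(count_hits(lure), susp_utf8_word_many(lure))   # 13 words, capped: 10 True
    print(count_hits(to_monospace("hello world")))       # 2
```

By this reading of the byte ranges, the third byte [\x9a-\x9f] starts the match at U+1D680, so the monospace capitals A to P (U+1D670..U+1D67F) and the bold, italic, script and sans-serif Latin letters below them are not matched at all, while math Greek letters and digits are.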

This rule is a potential FP source for some languages because it's rather broad; it might be possible to narrow it to just the 4-byte math glyphs that render readable English text.
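One possible way to explore that narrowing, sketched under the assumption that "readable English" means glyphs whose Unicode NFKC compatibility form is a single ASCII letter or digit: walk the Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF), keep the code points that fold to A-Z, a-z or 0-9, and collapse them into ranges with their UTF-8 byte escapes. The function names are hypothetical, and turning the printed ranges into a tighter SpamAssassin regex is left as a manual step.

```python
import unicodedata
from typing import List, Tuple

BLOCK_START, BLOCK_END = 0x1D400, 0x1D7FF  # Mathematical Alphanumeric Symbols


def is_latin_lookalike(cp: int) -> bool:
    """True if cp is in the block and NFKC-folds to one ASCII letter or digit.

    Greek glyphs, dotless i/j and unassigned holes (e.g. U+1D455) fail this test.
    """
    if not BLOCK_START <= cp <= BLOCK_END:
        return False
    folded = unicodedata.normalize("NFKC", chr(cp))
    return len(folded) == 1 and folded.isascii() and folded.isalnum()


def latin_lookalike_ranges() -> List[Tuple[int, int]]:
    """Collapse the matching code points into inclusive (first, last) ranges."""
    ranges: List[List[int]] = []
    for cp in range(BLOCK_START, BLOCK_END + 1):
        if not is_latin_lookalike(cp):
            continue
        if ranges and ranges[-1][1] == cp - 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    return [(first, last) for first, last in ranges]


def utf8_hex(cp: int) -> str:
    """Render a code point as the \\xNN escapes a byte-level regex would use."""
    return "".join(f"\\x{b:02x}" for b in chr(cp).encode("utf-8"))


if __name__ == "__main__":
    for first, last in latin_lookalike_ranges():
        print(f"U+{first:04X}..U+{last:04X}  {utf8_hex(first)} .. {utf8_hex(last)}")
```

The same NFKC folding also recovers the hidden text: monospace "free" normalizes back to plain "free", which could be handy when reviewing hits by hand.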

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  If you ask amateurs to act as front-line security personnel,
  you shouldn't be surprised when you get amateur security.
                                                    -- Bruce Schneier
-----------------------------------------------------------------------
 Tomorrow: the 75th anniversary of VE Day
