On Thu, 1 Mar 2018, Jim Idle wrote:
Malicious RTF files take advantage of the fact that Microsoft do not
follow their own RTF spec. Specifically, Word et al only looks for the
opening sequence:
{\rt
Though the spec says it should be:
{\rtf1
I don't think that Tika can assume that all RTF users are as broken as
Word is!
I'd be tempted to define a new mimetype of application/x-broken-rtf or
similar, and give it a lower priority magic for {\rt, with a suitable
comment/explanation. That way, we won't tell people something is an RTF
when it isn't, but we can still help them spot these problematic files.
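
To make the priority idea concrete, here's a rough sketch in Python (not Tika code, and not how Tika's magic matching is actually implemented). The function name classify_rtf_header is made up for illustration, and application/x-broken-rtf is only the type name proposed above, not something that exists today. The strict magic is tried first, and the loose one is only used as a fallback:

```python
from typing import Optional

# Opening sequence the RTF spec requires.
STRICT_RTF_MAGIC = b"{\\rtf1"
# Shorter prefix that Word et al are reported to accept.
LOOSE_RTF_MAGIC = b"{\\rt"


def classify_rtf_header(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a mimetype guess based on the first bytes.

    The strict check runs first (higher priority), so a spec-compliant
    file is never reported as broken.
    """
    if data.startswith(STRICT_RTF_MAGIC):
        return "application/rtf"
    if data.startswith(LOOSE_RTF_MAGIC):
        # Proposed type name from this thread, assumed for illustration.
        return "application/x-broken-rtf"
    return None
```

So a file starting with {\rtf1 stays a normal RTF, something starting with just {\rt gets flagged as the broken variant, and anything else isn't claimed as RTF at all.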
If you could create a small, broken but non-malicious RTF file, then raise
an enhancement JIRA and attach it, that'd be great!
Nick