On 05/28/2014 11:16 AM, Alex wrote (syntax highlighting added):
> I'm trying to write a body rule that will catch an email exactly
> containing any number of characters up to 15, followed by a URI,
> followed by any number of characters, up to 15. My attempt has failed
> miserably, and hoped someone could help.
> body LOC_SHORT_BODY_URI m{^.{0,15}(https?://.{1,50}).{0,15}$}
>
> This catches pretty much everything and I can't figure out why.
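This should catch pretty much any mail with a web link in it. Body
rules don't reliably match the start- and end-of-line markers (^ and $),
so you can't depend on them. You also have no delimiter between the URL
and the following text: .{1,50} matches spaces too, so the "URL" part
can absorb up to 50 characters of whatever comes after the link. Try
something like this instead: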
body LOC_SHORT_BODY_URI m{\A.{0,15}(?:https?://\S{1,50})(?!\S).{0,15}\Z}ms
This also improves efficiency by using a non-capturing group and
(far more importantly) by removing the ambiguity between your two
adjacent ranges: \S{1,50} has to stop at the first whitespace, so there's
no need to try every conceivable split between the URL and the trailing
text. I used a negative look-ahead, (?!\S), rather than \s so that the
rule still matches when nothing at all follows the URL. I also used \A
and \Z with /ms in order to better describe a short email, but again
*this may not work reliably due to how the body is parsed*. This would
work slightly better with rawbody, but it still won't be perfect.
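If you want to see the difference between the two patterns outside of
SpamAssassin, here is a rough sketch using Python's re module as a
stand-in for Perl. It is only an approximation: re.M | re.S is assumed
to play the role of /ms, Python's \Z matches only at the very end of the
string (Perl's \Z also matches before a final newline), and none of
SpamAssassin's body rendering is modeled. The sample strings and the
hits() helper are made up for illustration.

```python
import re

# The original pattern as posted, with no modifiers.
ORIGINAL = re.compile(r"^.{0,15}(https?://.{1,50}).{0,15}$")

# The suggested replacement; re.M | re.S approximates Perl's /ms.
REVISED = re.compile(
    r"\A.{0,15}(?:https?://\S{1,50})(?!\S).{0,15}\Z",
    re.M | re.S,
)


def hits(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return pattern.search(text) is not None


# Hypothetical sample bodies; example.com is a placeholder domain.
short_mail = "see http://example.com/a now"
long_tail = "http://example.com/ " + "x" * 40  # 41 characters follow the URL
multi_line = "hi\nhttp://example.com/\nbye"    # short, but spans three lines
url_only = "http://example.com/"               # nothing after the URL

print("short_mail original:", hits(ORIGINAL, short_mail))
print("short_mail revised: ", hits(REVISED, short_mail))
print("long_tail  original:", hits(ORIGINAL, long_tail))
print("long_tail  revised: ", hits(REVISED, long_tail))
print("multi_line revised: ", hits(REVISED, multi_line))
print("url_only   revised: ", hits(REVISED, url_only))
```

The interesting case is long_tail: the original accepts it because
.{1,50} swallows the text after the link, while the revised pattern
rejects it because \S{1,50} must stop at the first space and only 15
more characters are allowed after that. The url_only case shows why
(?!\S) is used instead of \s: the look-ahead is satisfied at the end of
the text, where \s would have nothing to match.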
