"alan r" <[EMAIL PROTECTED]> writes:

> I am trying to create a filter which matches From: headers that dont
> have a proper email address in them

This is difficult, perhaps even impossible.  Jeffrey Friedl, in
/Mastering Regular Expressions/, gives a regular expression over 5000
characters long to match an email address and it still does not match
all valid addresses.

> (or at least, are missing an '@'.

This is much easier.  <wink>

> So far, I have this:
> 
> headers "^From\:( |[^@\s])*$" [EMAIL PROTECTED]

What you really want is for the RE match to fail if you see an '@'
after the 'From:'.  You might find this RE does the same thing and is
a little simpler:

    "\AFrom:[^@]*\Z"

First, anything that's not an '@' includes spaces, so you don't really
need to deal with them separately.  Second, you should be aware that
TMDA uses the re.MULTILINE flag when doing RE searches.  This means
that '^' and '$' also match at the start and end of every line (just
after and just before each newline), not only at the beginning and end
of the whole string.  A From field that looked like this:

From:\n [EMAIL PROTECTED]

would be (incorrectly) matched by your RE, since the search stops at
the end of the first line ('From:') because of the '$' in your RE.
This field, albeit weird, is still a valid From field.  In the actual
message it would look like this:

From:
 [EMAIL PROTECTED]

The '\A' and '\Z' escape codes I suggest above match only at the
beginning and end of the whole string, even in multiline mode.  Thus
the '@' will be found, the match will fail, and the mail will not be
dropped.
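
If you want to check this yourself, here is a rough sketch in plain
Python.  It assumes the RE is searched against a string holding only
the From field, with re.MULTILINE set; I have not reproduced TMDA's
own matching code here.  The address joe@example.com and the helper
name matches() are made up for the illustration.

```python
import re

# The original rule and the suggested replacement, as Python patterns.
ORIGINAL = r"^From\:( |[^@\s])*$"
SUGGESTED = r"\AFrom:[^@]*\Z"

# Hypothetical field values; joe@example.com is a made-up address.
folded = "From:\n joe@example.com"   # valid folded field, contains '@'
bare = "From: joe"                   # no '@' anywhere


def matches(pattern, field):
    """Return True if pattern is found in field with re.MULTILINE set.

    Assumption: this only imitates a multiline search; it is not
    TMDA's actual code.
    """
    return re.search(pattern, field, re.MULTILINE) is not None


print(matches(ORIGINAL, folded))   # True: '$' matches at the end of line one
print(matches(SUGGESTED, folded))  # False: '\Z' is not reached before the '@'
print(matches(ORIGINAL, bare))     # True
print(matches(SUGGESTED, bare))    # True
```

Both patterns catch the field with no '@', but only the '\A...\Z'
version leaves the folded field alone.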

> This would be simpler if one could exclude end-of-line inside
> brackets which I dont think is possible. Instead this expression
> just excludes all whitespace characters except for space.

I'm not sure why you want to (not) match a newline -- it seems to
make the RE more complex without any real gain, but you *can* match a
newline inside brackets by specifying '\n'.
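
For what it's worth, a small sketch of a bracket expression that
excludes both '@' and newline, using Python's re module directly.  The
sample string and the name NO_AT_OR_NEWLINE are invented for the
example.

```python
import re

# '\n' inside a bracket expression is an ordinary member of the class,
# so this class matches anything except '@' and newline.
NO_AT_OR_NEWLINE = re.compile(r"From:[^@\n]*")

# Hypothetical input: the match stops at the first newline.
m = NO_AT_OR_NEWLINE.match("From: joe\n more text")
print(repr(m.group(0)))   # 'From: joe'
```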

> When this works I plan to change the the action to drop. I havent
> had any hits on this rule yet so I'm wondering if it has a bug in
> it.

One reason you might not see many matches (other than '<>') is that
some MTAs assume that an address without a domain is local and they
will append the local domain onto unqualified addresses.  That is, if
they see a From field with just 'joe' in it, they'll actually rewrite
the From field as '[EMAIL PROTECTED]' before delivering it.

qmail doesn't do this and I believe the behavior is configurable with
Postfix (although I could be mis-remembering).  I don't know about
Sendmail or Exim.


Tim
_____________________________________________
tmda-users mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-users
