Re: filtering invalid from: in header

Tim Legant Wed, 09 Oct 2002 14:53:04 -0700

alan r <[EMAIL PROTECTED]> writes:

> On 06 Oct 2002 22:59:08 -0500, Tim Legant
> <[EMAIL PROTECTED]> wrote:
> 
> > What you really want is for the RE match to fail if you see an '@'
> > after the 'From:'.  You might find this RE does the same thing and
> > is a little simpler:
> >
> >     "\AFrom:[^@]*\Z"


[...]

> My whole approach was founded on the assumption that the RE is applied to the
> entire header as one big string as opposed to trying to match each line of the
> header individually. (eg, my mail client (Agent) seems to treat header filters
> that way). Is this an incorrect assumption here?

You are absolutely correct.  I double-checked this (and the MULTILINE
flag) before beginning my previous response to you.  Then I got to
thinking about the problems with individual header fields spread over
multiple lines and completely forgot about how the entire header is
structured.  My apologies for wandering down the wrong road.

> Another assumption I am making is that the RE will match anywhere inside the
> string (and that it is a prefix match). eg,
> 
> the RE "from" would match "xxxxxfromyyyyy" 

Yes.

> Thirdly, that the RE [^@] will match a newline.

Also true.
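
A minimal sketch of those two points, assuming Python's re module and
assuming the pattern is applied with re.search to the whole header as
one string.  The sample header is made up for illustration.

```python
import re

# Hypothetical header, treated as one string with embedded newlines.
header = "Received: from mail.example.org\nFrom: someone\nTo: list@example.org\n"

# re.search scans the whole string, so an unanchored pattern can match
# in the middle of it.
unanchored = re.search("from", "xxxxxfromyyyyy")

# A negated class such as [^@] excludes only '@', so it also consumes
# newlines.  Starting at "From:", the run continues across the line
# break and stops just before the first '@'.
run = re.search(r"From:[^@]*", header)
crosses_newline = "\n" in run.group(0)
```

Here the unanchored pattern matches at offset 5, and the [^@]* run
spills from the From: line into the To: line.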

> Based on these assumptions, I was anchoring my RE to the beginning of a line
> to avoid matching "From:" in the middle of another header such as a
> received-by (leaving aside that those froms are probably not capitalized), and
> anchoring to the end of a line to avoid a runaway search.

Using the standard line anchors ('^' and '$') and not the string
anchors ('\A' and '\Z') is definitely the way to go.
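
The difference between the two kinds of anchors can be sketched as
follows, again assuming Python's re module with the MULTILINE flag and
a made-up header in which From: is not the first field.

```python
import re

# Hypothetical header block as a single string.
header = "Received: from relay.example.net\nFrom: nobody\nSubject: hello\n"

# \A and \Z anchor to the start and end of the whole string, even with
# MULTILINE, so this can only match when the header is a single From:
# field and nothing else.
string_anchored = re.search(r"\AFrom:[^@]*\Z", header, re.MULTILINE)

# With MULTILINE, ^ and $ also match at the start and end of each line,
# so the From: field is found wherever it sits in the header.
line_anchored = re.search(r"^From:[^@\n]*$", header, re.MULTILINE)
```

The string-anchored search finds nothing in this header, while the
line-anchored one picks out the From: line.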

> IOW my RE was designed to match "From: " at the beginning of a line, followed
> by other stuff not including a newline, followed by a newline. 

More specifically, it was designed to match "From:" at the beginning
of a line, followed by either a space or anything that isn't an '@' or
a space, tab, newline, carriage return, formfeed or vertical tab,
followed by a newline.  The all-inclusiveness of the \s escape is why
you had to explicitly list the space.
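
The reach of \s inside a negated class can be checked directly.  A
small sketch, assuming Python's re module is the engine in use:

```python
import re

# The six whitespace characters listed above.
whitespace = [" ", "\t", "\n", "\r", "\f", "\v"]

# [^@\s] rejects '@' and every whitespace character, the plain space
# included, which is why the space needed its own alternative.
rejected = [c for c in whitespace + ["@"]
            if re.fullmatch(r"[^@\s]", c) is None]

# An ordinary character such as a letter is still accepted.
letter_ok = re.fullmatch(r"[^@\s]", "a") is not None
```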

Which still leaves the problem of multi-line From: fields.  I can't
say that I've seen one, but they are legitimate.  I've thought about
this a bit and I've come up with an RE that solves the problem.
Unfortunately, it has another problem.  It assumes that there will be
whitespace (space, tab or newline) after the ':'.  I've never seen a
From: field where that isn't true, but I do believe that it is legal
to leave it out.

So, if you want to only match single-line From: fields, this variation
on your original should work:

    '^From:[^@\n]*$'

It's explicit about the newline and therefore you don't have to make
space (or tab) a special case.
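
A quick way to try the single-line pattern, assuming Python's re module
with the MULTILINE flag.  The name NO_AT_FROM and the sample headers
are hypothetical.

```python
import re

# The single-line pattern, compiled with MULTILINE so that ^ and $
# anchor to individual header lines.
NO_AT_FROM = re.compile(r"^From:[^@\n]*$", re.MULTILINE)

# Hypothetical sample headers.
bad = "Received: from relay.example.net\nFrom: Some Spammer\nSubject: hi\n"
good = ("Received: from relay.example.net\n"
        "From: Alan <alan@example.com>\nSubject: hi\n")
no_space = "From:nobody\nSubject: hi\n"

bad_match = NO_AT_FROM.search(bad)            # From: line without '@'
good_match = NO_AT_FROM.search(good)          # From: line with '@'
no_space_match = NO_AT_FROM.search(no_space)  # nothing after the colon
```

Only the headers whose From: line lacks an '@' produce a match, and a
missing space after the colon makes no difference.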

If you want to match multiple-line From: fields with the restriction
that all From: fields must have whitespace (space, tab or newline)
after the colon, you can use this:

    '^From:[ \t]*(\n[ \t]+[^@\n]+|[ \t]+[^@\n]+)+$'
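
The multi-line pattern can be exercised the same way.  This is only a
sketch, assuming Python's re module with MULTILINE; the sample headers
and the names FOLDED_FROM and STRICT_FROM are hypothetical.  One thing
it shows: because $ can match at the end of any line, a folded field
whose address sits on a continuation line still matches on its first
line alone.  STRICT_FROM is an untested-against-TMDA variant that adds
a negative lookahead so the match cannot stop just before a
continuation line.

```python
import re

# The multi-line pattern as given above.
FOLDED_FROM = re.compile(
    r"^From:[ \t]*(\n[ \t]+[^@\n]+|[ \t]+[^@\n]+)+$", re.MULTILINE
)

# Hypothetical variant: refuse to end the match right before a
# continuation line (a newline followed by a space or tab).
STRICT_FROM = re.compile(
    r"^From:[ \t]*(\n[ \t]+[^@\n]+|[ \t]+[^@\n]+)+$(?!\n[ \t])",
    re.MULTILINE,
)

# Hypothetical headers.  Continuation lines start with a space or tab.
folded_no_at = "From:\n  Some\n  Spammer\nSubject: hi\n"
one_line_at = "From: Alan <alan@example.com>\nSubject: hi\n"
no_whitespace = "From:nobody\nSubject: hi\n"
folded_at = "From: Alan\n <alan@example.com>\nSubject: hi\n"

m_folded = FOLDED_FROM.search(folded_no_at)     # whole folded field
m_one_line = FOLDED_FROM.search(one_line_at)    # '@' present: no match
m_no_ws = FOLDED_FROM.search(no_whitespace)     # the stated restriction
m_folded_at = FOLDED_FROM.search(folded_at)     # matches first line only

s_folded = STRICT_FROM.search(folded_no_at)
s_folded_at = STRICT_FROM.search(folded_at)     # lookahead blocks it
```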

Sorry again about the misleading previous response.  I have no idea
what I was thinking!


Tim
_____________________________________________
tmda-users mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-users
