Performance of these filters is indeed something to pay attention to.  I
have added what I call "inline content filtering" to my own version of
xmail, and it is very successful as long as it runs quickly.

When loading the filter expressions at runtime, I separate them into those
that contain embedded wildcards and those that are simple string searches
(e.g., *enlarge your* is simple, *v[1i]gra* is not).  xmail's wildcard
matching routine is particularly inefficient when there are one or more *'s
inside the expression (e.g., abc*def doesn't degrade into
beginning-of-string and end-of-string tests).  Making this distinction (and
trying to use expressions of the "simple" variety) improves speed greatly.
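
To illustrate the split, here is a minimal sketch in Python (not xmail's
actual code, which is C++).  It assumes the pattern syntax behaves like shell
globs, which may not match xmail's matcher exactly; split_patterns() and
matches() are names made up for this example.

```python
import fnmatch

WILDCARD_CHARS = set("*?[]")

def split_patterns(patterns):
    """Split patterns into (plain substrings, real wildcard patterns)."""
    simple, wild = [], []
    for pat in patterns:
        inner = pat.strip("*")
        # "*text*" with no wildcard inside is just a substring search.
        if (pat.startswith("*") and pat.endswith("*")
                and not (set(inner) & WILDCARD_CHARS)):
            simple.append(inner)
        else:
            wild.append(pat)
    return simple, wild

def matches(text, simple, wild):
    """Try the cheap substring tests first, then the wildcard patterns."""
    if any(s in text for s in simple):
        return True
    # Assumption: fnmatch stands in for the slower wildcard routine.
    return any(fnmatch.fnmatchcase(text, p) for p in wild)

simple, wild = split_patterns(["*enlarge your*", "*v[1i]gra*", "abc*def"])
print(simple)  # plain substrings
print(wild)    # patterns that still need the wildcard matcher
```

The split is done once, when the expression list is loaded, so the per-line
cost of a "simple" expression is a single substring search.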

Things to note:
  - The filter must decode base64.  Happily, this does not have to be done
precisely: it can be attempted for all lines after 'base64' is seen in a
header line, for example.  My code sometimes attempts base64 decoding when
it is not really necessary, but that is cheaper than analyzing the message
structure in detail.
  - Many undesirable messages require comparison of multiple data lines due
to line breaks.  This means that as each line is presented to the "inline
filter", it goes through some gyrations:
    -- whitespace is standardized, and leading and trailing whitespace is
removed
    -- the new line is appended to a "sliding-window"-type buffer.  If the
data was text, a space is inserted before the new line.  If it was
base64-decoded data, the new data is appended without the space.  Once the
buffer contains a configurable number of lines, excess lines are removed
from the front of the buffer.  This means you need to keep track of the
lines in a way that permits retrieval of a pointer to the entire multiline
buffer AND keeps track of where each line ends so that data can drop off the
front of the buffer.  Actually, I keep two of these buffers going: one
contains the message as received, and the other contains the message lines
after base64-decoding, if any data have been successfully decoded.
  - For maximum benefit, comparisons must be performed against the multiline
data multiple times:
    -- downshifted (lowercased) with whitespace standardized, then
    -- with HTML comments removed (leaving HTML tags, so that hrefs etc. can
be caught), then
    -- with all HTML tags removed
  - If you debug to a file, keep it open until all data have been presented;
using xmail's built-in logging routines, which open logfiles as needed, is a
big performance hit.
  - Extremely large messages can take an inordinate amount of time to filter
this way.  I have a configurable upper bound, and once that many lines of
data have been seen, filter comparisons are no longer performed.  This is a
hack, but so far, if a message hasn't violated anything in the first, say,
10K lines, it probably isn't going to.
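
The buffer handling and the three comparison passes above can be sketched
roughly as follows.  This is a simplified Python illustration under my own
assumptions, not the code in my xmail build: try_base64(), SlidingWindow and
views() are hypothetical names, and the regular expressions are a crude
stand-in for real HTML handling.

```python
import base64
import binascii
import re
from collections import deque

def try_base64(line):
    """Opportunistic decode: return text if the line is valid base64, else None."""
    if not line:
        return None
    try:
        raw = base64.b64decode(line, validate=True)
    except (binascii.Error, ValueError):
        return None
    return raw.decode("latin-1")

class SlidingWindow:
    """Keeps the last max_lines pieces and can return them as one string."""
    def __init__(self, max_lines):
        self.pieces = deque()   # one entry per line, so the front can drop off
        self.max_lines = max_lines

    def add(self, data, decoded=False):
        if not decoded:
            data = " ".join(data.split())   # standardize and trim whitespace
        # Text lines are separated by a space; decoded base64 is contiguous.
        sep = "" if decoded or not self.pieces else " "
        self.pieces.append(sep + data)
        while len(self.pieces) > self.max_lines:
            self.pieces.popleft()

    def text(self):
        return "".join(self.pieces).lstrip()

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")

def _squeeze(s):
    return " ".join(s.split())

def views(text):
    """Return the three forms of the window that get compared, in order."""
    lowered = _squeeze(text.lower())
    no_comments = _squeeze(COMMENT_RE.sub("", lowered))
    no_tags = _squeeze(TAG_RE.sub("", no_comments))
    return [lowered, no_comments, no_tags]

window = SlidingWindow(max_lines=2)
window.add("  Buy V<!-- x -->IAGRA ")
window.add('<a href="http://example.invalid/">here</a>')
for view in views(window.text()):
    print(view)
```

In this sketch a second SlidingWindow, fed with decoded=True, would hold
whatever try_base64() manages to decode, mirroring the two-buffer arrangement
described above.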

The "hook" for all of this is in SMTPHandleCmd_DATA(), which:
  - creates the content filter object, passing it a cached copy of the
filter expression list.  I also pass in the IP of the sender so that local
senders can be exempted
  - performs its normal data input loop, passing each line to the content
filter object as it is received.  If the filter object indicates that the
message is going to be rejected, incoming data are no longer written to the
message file, but the loop continues until the end of data is received.
(This can mitigate attempts to fill up server disk space.)
  - retrieves the final result from the object (permitted/denied) after all
data are received
  - destroys the content filter object
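
In outline, the flow looks something like the Python sketch below.  Only
SMTPHandleCmd_DATA() is a real xmail function name; ContentFilter,
receive_data(), the list standing in for the message file, and the plain
substring test are all simplifications invented for this example.

```python
class ContentFilter:
    """Hypothetical filter object fed one line at a time."""
    def __init__(self, patterns, sender_ip, local_ips=(), max_lines=10000):
        self.patterns = patterns              # cached, lowercase substrings
        self.exempt = sender_ip in local_ips  # local senders are not filtered
        self.max_lines = max_lines            # stop comparing after this many
        self.seen = 0
        self.rejected = False

    def feed(self, line):
        """Check one line; return True while the message is still acceptable."""
        self.seen += 1
        if self.exempt or self.rejected or self.seen > self.max_lines:
            return not self.rejected
        if any(p in line.lower() for p in self.patterns):
            self.rejected = True
        return not self.rejected

    def permitted(self):
        return not self.rejected

def receive_data(lines, sink, flt):
    """Read until the lone '.', writing lines only while not rejected."""
    for line in lines:
        if line == ".":
            break
        if flt.feed(line):
            sink.append(line)   # stands in for writing to the message file
    return flt.permitted()

message_file = []
flt = ContentFilter(["enlarge your"], "203.0.113.5", local_ips={"127.0.0.1"})
ok = receive_data(["hello", "Enlarge your inbox", "more", "."], message_file, flt)
print(ok, message_file)
```

Once the filter has decided to reject, the loop keeps draining input but
nothing more reaches the sink, which is the disk-space point made above.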

This was all a fair amount of work, but I use this instead of the built-in
filter capability because I don't like handling potentially large messages
over and over again, and I don't like launching an external program to do
the filtering.

---
[EMAIL PROTECTED]


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Harald Schneider
Sent: Wednesday 27 August 2003 8:38 AM
To: [EMAIL PROTECTED]
Subject: [xmail] AW: Re: How to reject message in SMTP transaction



I would also appreciate a scripting hook at the SMTP level. This would open
up a lot of exciting possibilities, e.g. limiting the number of connections
by sending 'Too busy, try later', or possibly message size checking, etc.

--Harald


> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michal Altair Valasek
> Sent: Wednesday, 27 August 2003 15:10
> To: [EMAIL PROTECTED]
> Subject: [xmail] Re: How to reject message in SMTP transaction
>
> | Yes would be great function, if you think of Sobig and that
> | 'virus detected' messages.
>
> Yes, I am. Being spammed by them, I am writing a new version of my FLAVS
> filter. I cannot silently discard messages (because I am not sure
> if it's a virus or not, I am checking only content-type and file names),
> but I want to cause as few problems as possible.
>
> | And i think the SMTP protocol allows it that the
> | server says 'no'
> | to the mail after the data command and the '.' - am i right?
>
> Yes, you are - this is the place where the SMTP server should handle
> things like "message too big" or "mailbox full". The main problem is
> that you must keep the TCP connection open while the filter(s) are
> executing, which may - for long-running filters and systems under heavy
> load - cause problems. But I think that it should be an option.
>
> Now we can see that antivirus and such systems can cause the same,
> maybe even bigger, problems as the virus itself: it's easy to handle
> viruses, not the thousands of technically legitimate NDRs they cause. So
> I am trying to write and configure everything in my jurisdiction to
> behave well and not cause unnecessary problems and load.
>
> -
> To unsubscribe from this list: send the line "unsubscribe xmail" in
> the body of a message to [EMAIL PROTECTED]
> For general help: send the line "help" in the body of a message to
> [EMAIL PROTECTED]
>
