Pete,

Not that I necessarily expect this to happen; it's more something to consider as things progress...

With the cross-checking of SURBL data, one needs to be careful not to double-score tests that target the same piece of data. An example would be scoring each DUL hit separately instead of scoring DUL hits, one or many, as a single scoreable hit. Unfortunately, we don't know exactly what triggers a hit in Sniffer when it happens. In reality, the very high correlation between SURBL hits and Sniffer hits can be attributed both to the 95%+ spam hit rate in Sniffer and to the cross-referencing. If I could tell that Sniffer hit on some other form of content and SURBL hit on the URL, the combination of hits would be stronger as a group. Not knowing that, and understanding that a fair number of hits will be for the same piece of data, the combination should be treated as weaker as a group than the sum of the scores.
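To make the double-scoring concern concrete, here is a rough sketch in Python. The test names, weights, and group labels are all made up for illustration and do not come from Sniffer, SURBL, or Declude. It also assumes, conservatively, that a Sniffer hit targets the same URL as the SURBL hit, since we can't tell otherwise. Each group of tests that targets the same data contributes only its strongest hit, and the groups are then summed.

```python
from collections import defaultdict

# Hypothetical tests: name -> (group of data targeted, weight).
# Names, groups, and weights are illustrative assumptions only.
TESTS = {
    "DUL-A": ("dul", 4),
    "DUL-B": ("dul", 5),
    "SURBL": ("url", 8),
    "SNIFFER": ("url", 10),  # assumed to target the same URL as SURBL
}

def naive_score(hits):
    """Sum every hit, which double-scores tests aimed at the same data."""
    return sum(TESTS[name][1] for name in hits)

def grouped_score(hits):
    """Count each group once, using its strongest hit, then sum the groups."""
    best = defaultdict(int)
    for name in hits:
        group, weight = TESTS[name]
        best[group] = max(best[group], weight)
    return sum(best.values())

hits = ["DUL-A", "DUL-B", "SURBL", "SNIFFER"]
print(naive_score(hits), grouped_score(hits))
```

With these made-up weights the grouped total comes out lower than the plain sum, which is the "weaker as a group than the sum of the scores" treatment described above.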

I guess the only way to adapt Sniffer to such needs might be to reclassify rules based on the type of data that the rule targets instead of the type of content the rule was generated from. As you are aware, Subject rules are weaker than body domain name hits, and body domain name hits are weaker than full URL hits. Some current result codes are highly suggestive of the types of rules they contain, such as obfuscation rules, but porn rules are rather wide open. I guess as an administrator, I would prefer to know the classification based on the reliability of the data used as opposed to the genre of spam from which the rule was created (and this is not perfectly consistent in subsequent hits).
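As a rough illustration of what weighting by data type could look like, here is a small Python sketch. Only the ordering (Subject weaker than body domain name, body domain name weaker than full URL) comes from the paragraph above. The type labels, multipliers, and base weight are hypothetical, and nothing here reflects actual Sniffer result codes.

```python
# Reliability ranking by the type of data a rule targets.
# The ordering follows the text; the numbers are illustrative assumptions.
RELIABILITY = {
    "subject": 1,      # weakest
    "body_domain": 2,
    "full_url": 3,     # strongest
}

def weight_for(data_type, base=5):
    """Scale a hypothetical base weight by the reliability of the data type."""
    return base * RELIABILITY[data_type]

for data_type in ("subject", "body_domain", "full_url"):
    print(data_type, weight_for(data_type))
```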

I fear that this would mostly benefit power users who construct combination filters from this data and could make use of such a classification, though some benefit could also come from simply weighting Sniffer on its own, apart from other tests, wherever there are notable differences in the reliability of the data.

This might also be somewhat impractical, and certainly not expected outside of a large change in the way that the app behaves. So again, just something to chew on, and I'm sure it has crossed your mind before.

Thanks,

Matt



Pete McNeil wrote:

On Monday, January 10, 2005, 7:17:29 PM, Andrew wrote:

CA> Pete, I thought that you had said at one point that SortMonster fetches
CA> one or more SURBL zones and incorporates those as spam data for Message
CA> Sniffer?

CA> It seems like a great idea to me.  But then, from my distance, a lot of
CA> things look like a good idea for someone else to implement!

That's not exactly how it works -

What we do is that our robots will look at some of the messages that
hit our spamtraps and if they find a URI that looks like a good choice
they will cross check it with SURBL.

More often than not we've already got the URI coded from our manual
work, but this robotic mechanism allows the rulebase to keep up minute
by minute - and since the email triggering this work has come in
through one of our spamtraps, it acts like an extra check - so those
listings that we do have tend to be very solid.

At some point we may bolt on some additional real-time lookups like
SURBL etc... but we don't have plans for that just yet, and most
installations already have these tools employed in other mechanisms
they are running, so it would be redundant for us to add it - at least
at this point.

Hope this helps,
_M




This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html





-- 
=====================================================
MailPure custom filters for Declude JunkMail Pro.
http://www.mailpure.com/software/
=====================================================

