[sniffer] Re: GBUdb question

Pete McNeil Tue, 22 Jan 2008 09:33:18 -0800

Hello Rob,

Tuesday, January 22, 2008, 11:09:10 AM, you wrote:


> Pete,

> I have some questions about GBUdb

This may help:

http://kb.armresearch.com/index.php?title=Message_Sniffer.TechnicalDetails.GBUdb

> FIRST QUESTION:

> I have several clients who forward e-mail from ISP accounts. I
> have a system that picks out the original sending server IP. (The
> method varies by ISP and situation, but I've programmed my system to
> determine which IP is the original sending server IP.) Next, I add a
> special custom header which points out that IP.

We are developing an auto-drill-down feature for GBUdb to assist in
automatically training GBUdb in this way. The auto drill feature will
add IPs of intermediate systems to the local ignore list based on
header directives. The theory is that GBUdb will be able to
automatically learn to ignore the intermediate nodes of mixed-source
ISPs in order to identify the original source of the message.

There is still some development work to do on this experimental
feature but we hope to include it in the upcoming release. Any
insights you can provide on reliably identifying these intermediate
servers would be very useful.

The current plan is to look for a specific "tell tale" string in the
Received header that, based on current knowledge, is likely to be the
source. If the string is found then that header is disqualified
(and its IP added to the ignore list) so that the next header becomes
the source candidate.

The "tell tale" string is presumed to be the domain portion (or
similar fragment) of the reverse DNS data in the Received header. So,
for example, if the top Received header contains ".troublesome.isp.com
[" then that header would be disqualified as the source of the message
(for GBUdb purposes), its IP would be added to the ignore
(infrastructure) list, and the next Received header would be
considered. Once all of the ".troublesome.isp.com [" or similar
headers are exhausted then the next header is likely to be the actual
source (so the theory goes).
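
The drill-down described above could be sketched roughly as follows. This
is only an illustration of the idea, not the GBUdb implementation: the
function name find_source, the TELL_TALES list, and the bracketed-IP
pattern are all assumptions made for the example.

```python
import re

# Hypothetical "tell tale" fragments. The trailing " [" ties the match to
# the reverse DNS name that precedes the bracketed IP in the header.
TELL_TALES = [".troublesome.isp.com ["]

# Assumed format: the connecting IP appears as [a.b.c.d] in the header.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def find_source(received_headers, tell_tales=TELL_TALES):
    """Walk Received headers top-down; return (source_ip, ignored_ips)."""
    ignored = []
    for header in received_headers:
        match = IP_IN_BRACKETS.search(header)
        if match is None:
            continue  # no usable IP here; consider the next header
        ip = match.group(1)
        if any(fragment in header for fragment in tell_tales):
            ignored.append(ip)  # intermediate node: goes on the ignore list
            continue
        return ip, ignored  # first header without a tell tale
    return None, ignored  # every header was disqualified


headers = [
    "from mx1.troublesome.isp.com [10.0.0.1] by mail.example.net",
    "from mx2.troublesome.isp.com [10.0.0.2] by mx1.troublesome.isp.com",
    "from dsl-host.example.org [192.0.2.55] by mx2.troublesome.isp.com",
]
source_ip, ignored_ips = find_source(headers)
```

Note that the third header mentions the ISP only in its "by" clause, with
no " [" after the name, so it is not disqualified.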

> Would it be possible for MessageSniffer to grab the IP from a particular
> header (perhaps this header could be added as a node in the XML config
> file?). That way,  if/when that header is available in the message, 
> Sniffer would then treat *that* IP as the sender's IP?

I will consider adding this to the feature request list. It probably
won't be added to the first version though -- we have a request freeze
in effect to ensure we get the production version out in Q1.

This is also a highly specialized request -- there aren't a lot of
systems out there that can accurately drill through delivery chains to
identify the original source of a message -- so the number of folks
who could use this feature would be pretty small (possibly just one).
Your use of the command line utility (described below) seems more
appropriate since in effect you want to eliminate GBUdb's source
detection features.

That said, I am eager to support your work.

Please share an example of the header you would inject.

If it is possible to implement the feature quickly and reliably then I
will see what I can do to add it to the header directives engine.

> SECOND QUESTION:

> Is it possible to tell Sniffer to NOT allow the possibility of 
> "truncating" on a message-by-message basis, where this would be 
> determined if a special command line switch were present. In fact, can
> Sniffer be further instructed to ONLY run "pattern matching" scanning 
> and ignore the GBUdb for that particular message?

It is not possible to turn off truncate on a message by message basis.
It is only possible to turn off truncate for all messages.

You can also create a header directive to cause GBUdb training to
ignore a message with a specific header (more precisely, a message
where it finds a specific string in a specific header).

> THIRD QUESTION:

> Much of the spam I block doesn't run through Sniffer. Additionally, many
> of the messages that Sniffer blocks are spams sent via established ISPs
> whereas I already have those IPs in an extensive whitelist that I've 
> built up over the years.

> A 4% sampling of this whitelist can be found here:
> http://invaluement.com/fourpercentofwhitelist.txt (multiply the size
> of that by 25 to get an idea of the massive size of my IP whitelist)

> Here is what I'd like to do which I believe would make my contribution
> to sniffer most effective:

> (A) Have sniffer NOT automatically input data into GBUdb with each 
> sniffer scan. (Is that possible?)

You could create "header directives" to selectively disable GBUdb
training.

You can also disable GBUdb training for all messages.

<training on-off='off'>

> (B) Alternatively, whenever my spam filter marks a message as "spam", it
> will issue the following command (but ONLY if that IP is NOT on my IP 
> whitelist, and regardless of whether or not the message was run through
> sniffer):

> SNFClient.exe -bad <IP4Address>

> (If on my IP whitelist, it just won't do anything here.)

Ok.

> (C) If my spam filter marks a message as "ham", then it will issue the
> following command (again, regardless of whether or not the message was
> run through sniffer)

> SNFClient.exe -good <IP4Address>

That sounds very much like what these tools were designed for. However,
the effect may not be what you intend.

If the IPs you track are not detected as the source IP by GBUdb then
it is likely to ignore the data during its scans. It will evaluate
the statistics of the IP it believes to be the source. When it gets
that right it will find your data. When it gets that wrong it will
find no data (most likely) so GBUdb will be effectively inert in those
cases.

If your intent is simply to input this data into the GBUdb system so
that it is available as a resource then that will work - somewhat.
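
The workflow in (B) and (C) could be wrapped along these lines. This is a
minimal sketch, assuming SNFClient.exe is on the path and that the filter
hands over a verdict string and an IP; the names build_command and report
are made up for the example, and only the -bad and -good switches come
from the message above.

```python
import subprocess

SNF_CLIENT = "SNFClient.exe"  # assumed to be on the PATH; adjust as needed


def build_command(ip, verdict, whitelist):
    """Return the SNFClient argument list for a verdict, or None to skip."""
    if verdict == "spam":
        if ip in whitelist:
            return None  # (B): whitelisted IPs are never reported as bad
        return [SNF_CLIENT, "-bad", ip]
    if verdict == "ham":
        return [SNF_CLIENT, "-good", ip]  # (C): ham is always reported
    raise ValueError(f"unknown verdict: {verdict!r}")


def report(ip, verdict, whitelist):
    """Run SNFClient for this verdict unless the rules say to skip it."""
    command = build_command(ip, verdict, whitelist)
    if command is not None:
        subprocess.run(command, check=True)
```

Keeping the decision in build_command and the process launch in report
makes the whitelist rule easy to check without invoking the client.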

One other thought that I have is that you could use the command line
(or the ignore list) to mark the IPs on your internal white-list as
Infrastructure (ignore flag). This might effectively train GBUdb to
skip those IPs when finding the source of the message - and in any
case would render GBUdb inert for those IPs.


> **********************************
> **********************************
> I know that this puts more trust on me and my system, but I also
> know that the quality of stats you'd receive from my system would be
> vastly improved due to my abilities in this area and this would be a huge
> contribution to other Sniffer users over the norm. (I run one of the 
> best RBLs and URI blacklists in the world... I know what I'm doing here!)

Each SNF node contributes to the overall consensus of the "cloud" and
can learn from that consensus. The mathematics that govern these
interactions ensure that malicious or erroneous inputs remain isolated
from the other SNF/GBUdb nodes. And, of course, if we discover any
serious problems we can always turn off any malicious users.

The influence that any single node can exert on GBUdb is limited by a
mathematical curve that separates each node from the cloud. This
ensures that each node retains its own perspective and that the
consensus is representative of information that GBUdb nodes
consistently agree is correct.

Erroneous or malicious signals have a tendency to be washed out of the
system quickly. Specifically, the strength of any specific input as
seen by any other specific node is reduced by log2(log2(input)). Any
input sufficiently large to cause a disruption would need to be
astronomically large and so it would be detected as highly unusual and
would immediately attract our attention.
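
To get a feel for how slowly log2(log2(input)) grows, the expression can
be evaluated directly. The sketch below only computes the expression; how
GBUdb applies it internally is not described here, and the function name
damped is an assumption for the example.

```python
import math


def damped(value):
    """Evaluate log2(log2(value)); only defined for value > 1."""
    if value <= 1:
        raise ValueError("log2(log2(x)) needs x > 1")
    return math.log2(math.log2(value))


# Inputs spanning many orders of magnitude map to a handful of units.
samples = {n: damped(n) for n in (4, 256, 65536, 2**32, 2**64)}
```

Going from 65536 to 2**64 raises the result only from about 4 to about 6,
which is why a disruptive input would have to be astronomically large.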

Conversely, small inputs made by large numbers of nodes consistently
over time tend to produce large signals that are useful to individual
nodes - especially when a new IP source is detected. The first time a
node sees a new IP it is more likely to be influenced by the opinion
of other nodes that have already seen the IP. Once a node has a
sufficient number of its own experiences it tends to trust its
instincts.

> Can these things be done?

Some can be done now. Some may be done in the future.

I look forward to seeing an example of your header.

Hope this helps,

Thanks,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.

