Hello Peer-to-Peer,

Thursday, July 31, 2008, 10:05:15 PM, you wrote:

> Would it be correct to say the higher we can increase the size-trigger
> 'megabytes' value, the better filtering results (accuracy) we will achieve?
> In other words, would it be beneficial for us to purchase more memory on our
> server (say an additional 2GB), then increase the 'megabytes' value to 400
> or 800?

> Several of our servers are hitting the upper limit (159,383,552) 150 MB

I don't think so. A quick look at your telemetry indicates that your
systems are typically rebooted once per day. This is actually
preempting your daily condensation.

One result of this is that many of your GBUdb nodes only condense when
they reach their size limit. From what I can see, when this happens a
significant portion of your GBUdb data is dropped. For example,
several of the systems I looked at have not condensed in months. Here
is some data from one of them:


<timers>
<run started="20080801081753" elapsed="19637"/>
<sync latest="20080801134415" elapsed="55"/>
<save latest="20080801131823" elapsed="1607"/>
<condense latest="20080406160144" elapsed="10100606"/>
</timers>

<gbudb>
<size bytes="50331648"/>
<records count="214313"/>
<utilization percent="91.1357"/>
</gbudb>

This one has not condensed since April 2008 (20080406), most likely
due to restarts that prevented the daily condensation timer from
expiring.
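You can read this straight off the telemetry. Here is a quick sketch that parses the timers block above and converts the condense "elapsed" value to days; I am assuming elapsed is in seconds, which is consistent with the run timer (started 08:17:53 plus 19,637 seconds lands right at the 13:4x save/sync timestamps):

```python
import xml.etree.ElementTree as ET

# Telemetry snippet quoted above, attribute names verbatim.
telemetry = """<timers>
<run started="20080801081753" elapsed="19637"/>
<sync latest="20080801134415" elapsed="55"/>
<save latest="20080801131823" elapsed="1607"/>
<condense latest="20080406160144" elapsed="10100606"/>
</timers>"""

timers = ET.fromstring(telemetry)
condense_elapsed = int(timers.find("condense").get("elapsed"))  # seconds

# 86,400 seconds per day -- how long since the last condensation?
days_since_condense = condense_elapsed / 86400.0
print(round(days_since_condense, 1))  # 116.9 -- nearly four months
```

That matches the "latest" stamps as well: 20080406 to 20080801 is about 117 days.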

If this is the case with your other systems as well, it is likely that
they are occasionally condensing when they reach their size threshold,
but if they were allowed to condense daily they would never reach that
limit.

In that case, adding additional memory for GBUdb would probably not
improve performance significantly.

The default settings are conservative even for very large message
loads. For example, our spamtrap processing systems handle 3000-4000
msg/minute continuously and typically have timer and GBUdb telemetry
like this:

<timers>
<run started="20080717205939" elapsed="1270156"/>
<sync latest="20080801134844" elapsed="11"/>
<save latest="20080801134721" elapsed="94"/>
<condense latest="20080801132958" elapsed="1137"/>
</timers>

<gbudb>
<size bytes="117440512"/>
<records count="568867"/>
<utilization percent="99.6626"/>
</gbudb>

Note that this SNF node has not been restarted since 20080717 and that
its last condensation was in the early hours today-- most likely due
to its daily timer.

Note also that its GBUdb size is only 117 MBytes. It is unlikely that
this system will reach 150 MBytes before the day is finished.

Since most systems we see are handling traffic rates significantly
smaller than 4.75M/day it is safe to assume that most systems would
also be unlikely to reach their default GBUdb size limit during any
single day... So, the default of 150 MBytes is likely more than
sufficient for most production systems.
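To make the arithmetic behind those figures explicit (a quick sketch; the only assumptions are the 1440-minutes-per-day conversion and reading the size in MiB):

```python
# Rate quoted above: 3000-4000 msg/minute, continuously.
msgs_per_day_low = 3000 * 1440    # 1440 minutes in a day
msgs_per_day_high = 4000 * 1440
print(msgs_per_day_low, msgs_per_day_high)  # 4320000 5760000
# The mid-range of that band is where the ~4.75M/day figure comes from.

# The busy spamtrap node's GBUdb size, from the telemetry above:
size_mb = 117440512 / (1024 * 1024)
print(size_mb)  # 112.0 -- well under the 150 MByte default trigger
```

So even a node pushing several million messages per day sits comfortably inside the default size limit between daily condensations.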

---

All that said, if you want to intentionally run larger GBUdb data sets
on your systems there is no harm in that. Your system will be more
aware of habitual bot IPs, etc., at the expense of memory. Since all
GBUdb nodes receive reflections on IP encounters within one minute, it
is likely that the benefit would be the ability to reject the first
message from a bad IP more frequently... Subsequent messages from bad
IPs would likely be rejected by all GBUdb nodes based on reflected
data.

It is likely that increasing the amount of RAM you assign to your
GBUdb nodes will have diminishing returns past the defaults currently
set... but it might be fun to try it and see :-)

---

If you are looking for better capture rates you may be able to achieve
those more readily by adjusting your GBUdb envelopes. The default
envelopes are set to avoid false positives on large filtering systems
with a diverse client base.

It is likely that more restricted systems could afford to use more
aggressive envelopes without creating false positives, because their
traffic profile is narrower and more specific.

In a hypothetical case: If your system generally never receives
legitimate messages from Russian or Chinese ISPs, then it is likely
that your system would begin to learn very negative statistics for IPs
belonging to those ISPs. A slight adjustment to your black-range GBUdb
envelope might be just enough to capture those IPs without creating
false positives for other ISPs where you do receive legitimate
messages.

In any case, since the default ranges are extremely conservative and
tuned for large scale filtering systems it is worth experimenting with
them to boost your capture rates on nodes that have a more restricted
client base.
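The envelope idea can be sketched abstractly. GBUdb tracks a probability and a confidence figure for each IP, and an envelope is a region in that probability/confidence space; the threshold values below are invented for illustration and are NOT SNF's actual defaults -- your real envelopes live in your SNF configuration:

```python
# Illustrative sketch of a GBUdb-style black envelope test.
# Threshold values here are hypothetical, not SNF defaults.

def in_black_envelope(probability, confidence,
                      min_probability=0.6, min_confidence=0.3):
    """An IP whose bad-message probability and statistical confidence
    both exceed the envelope thresholds is treated as black."""
    return probability >= min_probability and confidence >= min_confidence

# A habitual bot IP with strongly negative statistics is captured:
print(in_black_envelope(0.95, 0.80))  # True

# A borderline IP slips past the conservative envelope...
print(in_black_envelope(0.55, 0.25))  # False

# ...but a slightly more aggressive envelope captures it:
print(in_black_envelope(0.55, 0.25,
                        min_probability=0.5, min_confidence=0.2))  # True
```

This is the trade being described: moving the black-range boundary captures more of the IPs your node has learned to distrust, at the risk of false positives if your traffic is less homogeneous than you think.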

If you have a larger system and you use a clustering deployment
methodology then you might still take advantage of these statistics by
grouping similar clients on the same node(s) based on where they get
their messages. Even if you don't adjust your envelopes this
clustering will have the effect of "increasing the signal to noise
ratio" for GBUdb as it learns which IPs to trust and which ones to
suspect.

Hope this helps,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


#############################################################
This message is sent to you because you are subscribed to
  the mailing list <sniffer@sortmonster.com>.
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to  <[EMAIL PROTECTED]>
