Hello Peer-to-Peer,

Thursday, July 31, 2008, 10:05:15 PM, you wrote:
> Would it be correct to say the higher we can increase the size-trigger
> 'megabytes' value, the better filtering results (accuracy) we will achieve?
> In other words, would it be beneficial for us to purchase more memory on our
> server (say an additional 2GB), then increase the 'megabytes' value to 400
> or 800? Several of our servers are hitting the upper limit (159,383,552)
> 150 MB

I don't think so. A quick look at your telemetry indicates that your systems are typically rebooted once per day. Those reboots are preempting your daily condensation. As a result, many of your GBUdb nodes only condense when they reach their size limit, and from what I can see, when that happens a significant portion of your GBUdb data is dropped. For example, several of the systems I looked at have not condensed in months. Here is some data from one of them:

<timers>
  <run started="20080801081753" elapsed="19637"/>
  <sync latest="20080801134415" elapsed="55"/>
  <save latest="20080801131823" elapsed="1607"/>
  <condense latest="20080406160144" elapsed="10100606"/>
</timers>

<gbudb>
  <size bytes="50331648"/>
  <records count="214313"/>
  <utilization percent="91.1357"/>
</gbudb>

This node has not condensed since 200804, most likely because restarts prevented the daily condensation timer from expiring. If this is the case with your other systems as well, then they are occasionally condensing when they reach their size threshold, but if they were allowed to condense daily they would never reach that limit at all. In that case, adding additional memory for GBUdb would probably not improve performance significantly.

The default settings are conservative even for very large message loads. For example, our spamtrap processing systems typically handle 3000-4000 msg/minute continuously and typically have timer & GBUdb telemetry like this:

<timers>
  <run started="20080717205939" elapsed="1270156"/>
  <sync latest="20080801134844" elapsed="11"/>
  <save latest="20080801134721" elapsed="94"/>
  <condense latest="20080801132958" elapsed="1137"/>
</timers>

<gbudb>
  <size bytes="117440512"/>
  <records count="568867"/>
  <utilization percent="99.6626"/>
</gbudb>

Note that this SNF node has not been restarted since 20080717 and that its last condensation was in the early hours today -- most likely triggered by its daily timer. Note also that its GBUdb size is only 117 MBytes. It is unlikely that this system will reach 150 MBytes before the day is finished. Since most systems we see handle traffic rates significantly lower than 4.75M msg/day, it is safe to assume that most systems would also be unlikely to reach their default GBUdb size limit during any single day... So, the default of 150 MBytes is likely more than sufficient for most production systems.

---

All that said, if you want to intentionally run larger GBUdb data sets on your systems there is no harm in that. Your system will be more aware of habitual bot IPs etc. at the expense of memory. Since all GBUdb nodes receive reflections on IP encounters within one minute, the benefit would most likely be the ability to reject the first message from a bad IP more frequently... Subsequent messages from bad IPs would likely be rejected by all GBUdb nodes based on reflected data. Increasing the amount of RAM you assign to your GBUdb nodes will likely show diminishing returns past the defaults currently set... but it might be fun to try it and see :-)
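If you want a quick way to spot the preempted-condensation condition across your servers, here is a minimal Python sketch that reads telemetry like the snippets above. It assumes the <timers> and <gbudb> blocks have been saved to a local file under a single wrapper element -- the file name and that wrapper are my own hypothetical, not something SNF hands you in that form.

# Minimal sketch: flag an SNF node whose daily GBUdb condensation is
# being preempted (assumes the <timers> and <gbudb> telemetry snippets
# are saved together under one root element in a local XML file).
import xml.etree.ElementTree as ET

DAY_SECONDS = 86400  # the daily condensation interval discussed above

def check_condensation(telemetry_path):
    root = ET.parse(telemetry_path).getroot()
    condense = root.find(".//condense")   # <condense latest=... elapsed=...>
    size = root.find(".//gbudb/size")     # <size bytes=...>

    elapsed = int(condense.get("elapsed"))           # seconds since last condense
    megabytes = int(size.get("bytes")) / (1024 * 1024)

    if elapsed > DAY_SECONDS:
        print("last condense was %.1f days ago -- the daily condensation"
              " is probably being preempted by restarts" % (elapsed / DAY_SECONDS))
    else:
        print("condensation appears to be running on its daily timer")
    print("GBUdb size: %.1f MBytes" % megabytes)

check_condensation("snf_telemetry.xml")  # hypothetical file name

Run against the first telemetry block above, this would report roughly 117 days since the last condense -- exactly the symptom I described.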
---

If you are looking for better capture rates, you may be able to achieve those more readily by adjusting your GBUdb envelopes. The default envelopes are set to avoid false positives on large filtering systems with a diverse client base. More restricted systems could likely afford more aggressive envelopes without creating false positives, because their traffic is more specific to their systems.

A hypothetical case: if your system generally never receives legitimate messages from Russian or Chinese ISPs, then it is likely to learn very negative statistics for IPs belonging to those ISPs. A slight adjustment to your black-range GBUdb envelope might be just enough to capture those IPs without creating false positives for other ISPs from which you do receive legitimate messages. In any case, since the default ranges are extremely conservative and tuned for large-scale filtering systems, it is worth experimenting with them to boost your capture rates on nodes that have a more restricted client base.

If you have a larger system and you use a clustering deployment methodology, then you might still take advantage of these statistics by grouping similar clients on the same node(s) based on where they get their messages. Even if you don't adjust your envelopes, this clustering will have the effect of "increasing the signal to noise ratio" for GBUdb as it learns which IPs to trust and which ones to suspect.

Hope this helps,

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.
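PS - If you want to get a feel for how envelope adjustments change what gets captured before touching a production config, here is a rough sketch of the idea in Python. This is NOT SNF's code: the probability and confidence formulas and all of the threshold numbers below are illustrative stand-ins for the real envelope parameters.

# Rough sketch of the envelope idea (not SNF's actual code): GBUdb keeps
# good/bad message counts per IP and derives probability and confidence
# figures; an envelope is a threshold region in that plane. The formulas
# and numbers here are illustrative stand-ins, not SNF's defaults.

def gbudb_stats(good, bad):
    """Derive (probability, confidence) from per-IP message counts."""
    total = good + bad
    probability = (bad - good) / total   # -1.0 = always good .. +1.0 = always bad
    confidence = total / (total + 10.0)  # grows with evidence; 10 is illustrative
    return probability, confidence

def in_black_range(probability, confidence, p_min=0.95, c_min=0.90):
    """An IP is captured when both figures clear the envelope thresholds."""
    return probability >= p_min and confidence >= c_min

# An IP seen 40 times, 39 of them bad: probability 0.95, confidence 0.80
p, c = gbudb_stats(good=1, bad=39)
print(in_black_range(p, c))                           # False -- conservative envelope
print(in_black_range(p, c, p_min=0.90, c_min=0.75))   # True -- more aggressive envelope

The thing to notice is that loosening either threshold trades evidence for earlier capture -- which is safe only when, as in the hypothetical above, your legitimate traffic never looks like the IPs you are targeting.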