On 2012-11-27 17:12, Darin Cox wrote:
> Hi Pete,
> Would you mind sharing your calculations of confidence and probability?
Here is a page on the math:
http://www.armresearch.com/support/articles/technology/GBUdb/learns.jsp
> I'm looking at the stats for p=1.0 and am curious about the low
> confidence values. I would have expected high confidence where there
> were no good samples and a lot of bad... or do I have something
> backwards?
Confidence is a measure of the number of samples seen. So, if you have
only one sample, and it was a bad message, then you have a 100%
probability that you will get a bad scan (spam) as far as you know...
BUT since you only have one sample, you don't know very much so your
confidence is low. If, on the other hand, you have seen a few dozen
messages from an IP and all of them were bad then you would have much
more confidence in your probability figure.
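
To make that distinction concrete, here is a rough sketch in Python. Please note that these are NOT the GBUdb formulas (those are on the page linked above); the function names, the bad/total ratio, and the half_point parameter are all stand-ins I am assuming purely for illustration. The only point is that probability depends on the mix of good and bad samples, while confidence depends on how many samples there are.

```python
def probability(good: int, bad: int) -> float:
    """Fraction of observed messages that were bad (illustrative only)."""
    total = good + bad
    if total == 0:
        raise ValueError("no samples seen yet")
    return bad / total


def confidence(good: int, bad: int, half_point: int = 10) -> float:
    """Grows toward 1.0 as the sample count grows (illustrative only).

    half_point is a hypothetical tuning value: the sample count at
    which this stand-in confidence reaches 0.5. It is not a GBUdb setting.
    """
    total = good + bad
    return total / (total + half_point)


# One bad message versus three dozen bad messages from the same IP.
for good, bad in [(0, 1), (0, 36)]:
    p = probability(good, bad)
    c = confidence(good, bad)
    print(f"good={good} bad={bad} p={p:.2f} c={c:.2f}")
```

Both cases report p=1.00, but the single-sample case carries much lower confidence than the 36-sample case, which is the pattern described above.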
> Also, while it's easy to parse, it might be nice if the output had one
> delimiter between fields instead of being both tab and comma
> delimited. Makes importing into a database for analysis much easier.
I will look into making a different output mode that's easier to parse.
The existing one is supposed to be human friendly.
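
In the meantime, a small filter can flatten the mixed delimiters before import. This is only a sketch: the sample line below is made up to show the shape of the problem and is not actual output from the tool, and the names split_fields and to_csv are hypothetical. It assumes that no field contains a literal tab or comma of its own.

```python
import csv
import io
import re

# Treat a tab or a comma (plus any whitespace after it) as one separator.
FIELD_SEPARATORS = re.compile(r"[\t,]\s*")


def split_fields(line: str) -> list[str]:
    """Split one line on tabs and commas, trimming each field."""
    return [field.strip() for field in FIELD_SEPARATORS.split(line.strip())]


def to_csv(lines) -> str:
    """Re-emit the lines with a single comma delimiter, skipping blank lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        if not line.strip():
            continue
        writer.writerow(split_fields(line))
    return buffer.getvalue()


# Made-up example line: a tab after the first field, a comma after the second.
sample = ["192.0.2.1\t12, 0\n"]
print(to_csv(sample), end="")
```

The result can be loaded with any CSV importer. If the real output has fields that can contain commas, the separator pattern would need to be tightened accordingly.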
Thanks!
_M
--
Pete McNeil
Chief Scientist
ARM Research Labs, LLC
www.armresearch.com
866-770-1044 x7010
twitter/codedweller