I was wrong - misled by my own failure to run a controlled experiment. These filters run so smoothly that one tends to forget about them; they merge with the background. But I run SpamBayes in combination with the Mail.app spam filter under OS X, so as a control I turned the Mail.app filter off for a while. The result: SB catches some spam, Mail.app catches other spam, and with both running I see very little of it. Except for those trapped by the sort of header clues shown in the SB Evidence header I posted, the mainly-image spams are being caught by Mail.app, not by SpamBayes. I won't presume to comment on whether anything in Mail.app could inform how SB deals with images.
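As a side note for anyone reading the quoted clues listing further down this thread: the per-token spamprob values there look consistent with a Robinson-style smoothed estimate, and the combined score with (S - H + 1) / 2. Here is a minimal Python sketch under that assumption. The function names are mine, and the constants s = 0.45 and x = 0.5 are what I believe the SpamBayes defaults for unknown-word strength and probability to be; treat them as assumptions, not as a quote from the SpamBayes source.

```python
def token_spamprob(ham_count, spam_count, n_ham, n_spam, s=0.45, x=0.5):
    """Smoothed spam probability for one token (hypothetical helper).

    ham_count / spam_count: trained messages of each kind containing the token.
    n_ham / n_spam: total trained ham and spam messages.
    s, x: assumed smoothing strength and prior for unseen tokens.
    """
    n = ham_count + spam_count
    if n == 0:
        # Never-seen token: fall back to the prior.
        return x
    # Use per-class ratios so unequal training set sizes do not skew the estimate.
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    prob = spam_ratio / (ham_ratio + spam_ratio)
    # Pull the raw estimate toward the prior x; the pull weakens as n grows.
    return (s * x + n * prob) / (s + n)


def combined_score(h, s):
    """Fold the internal ham score *H* and spam score *S* into one number in [0, 1]."""
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    # Training counts and rows taken from the quoted clues listing.
    n_ham, n_spam = 14237, 20138
    print(token_spamprob(0, 1, n_ham, n_spam))      # compare with the 0.844828 rows
    print(token_spamprob(6498, 668, n_ham, n_spam)) # compare with 'said'
    print(combined_score(0.999976, 0.0660102))      # compare with the 3% combined score
```

If this reading is right, it also shows why the single-occurrence junk tokens (0 ham, 1 spam) all share one value: with so little evidence, the prior dominates and they all land at the same spot.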
Now I will return to my blissful spam-free existence by turning the Mail.app filter on again. Ahhhhh ... that's better.

---
Ken Gordon

On 2005 Oct 26, at 21:55, <[EMAIL PROTECTED]> wrote:

> Ken,
> Please post the entire Spambayes Clues listing, so I can see what
> Spambayes is doing with all the erroneous ham text that's included at
> the bottom of your e-mail message example.
>
> Thanks,
> FMJ
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ken Gordon
> Sent: Wednesday, October 26, 2005 8:26 AM
> To: [email protected]; <[EMAIL PROTECTED]>
> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
> HandleEmbeddedImages)
>
> There's a lot more to spambayes than just evaluating content. Here's
> the SB Evidence header from a recent spam. But for 'charset', very
> little of this has to do with the content, yet it was correctly
> classified as spam.
>
>> X-Spambayes-Evidence: '*H*': 0.00; '*S*': 1.00;
>> 'received:192.168.1': 0.10; 'subject:skip:B 10': 0.16;
>> 'received:192.168': 0.20; 'received:192': 0.21; 'url:www': 0.23;
>> 'content-type:image/jpeg': 0.34; 'to:addr:none': 0.38;
>> 'header:Return-Path:1': 0.38; 'header:MIME-Version:1': 0.61;
>> 'url:': 0.64; 'x-mailer:none': 0.71; 'to:no real name:2**0': 0.72;
>> 'from:name:[EMAIL PROTECTED](b': 0.84;
>> 'message-id:@imx100522.ath.cx': 0.84;
>> 'received:imx100522.ath.cx': 0.84; 'url:fetish': 0.84;
>> 'received:192.168.1.11': 0.91; 'received:kick': 0.91;
>> 'content-type:multipart/related': 0.92; 'received:210.153': 0.93;
>> 'received:ath.cx': 0.93; 'url:cc': 0.93; 'virus:src="cid:': 0.95;
>> 'content-type/type:multipart/alternative': 0.96; 'received:cx': 0.97;
>> 'email addr:yahoo.co.jp': 0.99; 'skip:\x1b 80': 0.99;
>> 'from:addr:yahoo.co.jp': 1.00; 'from:charset:iso-2022-jp': 1.00;
>> 'skip:\x1b 60': 1.00; 'skip:\x1b 30': 1.00; 'skip:\x1b 20': 1.00;
>> 'skip:\x1b 50': 1.00; 'subject:$': 1.00; 'received:210': 1.00;
>> 'charset:iso-2022-jp': 1.00; 'subject:\x1b$': 1.00;
>> 'subjectcharset:iso-2022-jp': 1.00
>
> On 2005 Oct 25, at 8:37, <[EMAIL PROTECTED]> wrote:
>
>> How? Technically speaking, what could your SpamBayes installation be
>> doing differently? These are ALL ham words, so how is it that your
>> e-mail could be classifying all of this as Spam? If it is, I suspect
>> you're losing a lot of legitimate e-mail with it.
>>
>> FMJ
>>
>> -----Original Message-----
>> From: Ken Gordon [mailto:[EMAIL PROTECTED]
>> Sent: Monday, October 24, 2005 8:58 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
>> HandleEmbeddedImages)
>>
>> My installation of SpamBayes catches nearly all of these. I don't see
>> one a month outside of the Spam folder.
>>
>> ---
>> Ken Gordon
>> (780) 628-2758
>> http://www.wolfe-gordon.ca
>> On 2005 Oct 24, at 20:18, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Tony,
>>> The problem is, they keep changing the meaningless text at the bottom
>>> of the e-mail all the time, to confuse the Spam filter. They're
>>> picking Hammy words. And, as you can see, it's a highly effective
>>> technique. In other words, NONE of the "Tokens" should actually be
>>> "Significant", it's the image that needs to be scored in this case.
>>> Here's the spambayes clues for one of the e-mails:
>>>
>>> Combined Score: 3% (0.0330173)
>>> Internal ham score (*H*): 0.999976
>>> Internal spam score (*S*): 0.0660102
>>>
>>> # ham trained on: 14237
>>> # spam trained on: 20138
>>>
>>> 150 Significant Tokens
>>> token  spamprob  #ham  #spam
>>> 'sender:no real name:2**0'  0.0277535  2187  88
>>> 'dismissed'  0.0374933  314  17
>>> 'raising'  0.0417704  313  19
>>> 'lives'  0.0580962  1012  88
>>> 'ill'  0.0613924  1084  100
>>> 'said'  0.0677803  6498  668
>>> 'two'  0.08226  5200  659
>>> 'put'  0.0828439  2632  336
>>> 'were'  0.0845653  6094  796
>>> 'recalled'  0.0862187  92  12
>>> 'town'  0.0883783  600  82
>>> 'being'  0.0894639  4312  599
>>> 'letter'  0.093344  1595  232
>>> 'unless'  0.0960663  687  103
>>> 'stephan'  0.0968154  15  2
>>> 'face'  0.0986506  1397  216
>>> 'who'  0.0991493  8031  1250
>>> 'knows'  0.102049  574  92
>>> 'anyone'  0.104976  1828  303
>>> 'them'  0.106325  4690  789
>>> 'think'  0.107446  3584  610
>>> 'keep'  0.109385  2517  437
>>> 'him'  0.111552  2631  467
>>> 'suspicions'  0.113796  40  7
>>> 'went'  0.11401  1331  242
>>> 'sound'  0.116592  596  111
>>> 'care'  0.117491  1244  234
>>> 'going'  0.119623  3503  673
>>> 'sort'  0.119677  511  98
>>> 'his'  0.119861  5717  1101
>>> 'remained'  0.11998  271  52
>>> 'heavily'  0.123551  232  46
>>> 'last'  0.126157  5241  1070
>>> 'subject:: '  0.134951  9110  2010
>>> 'voice'  0.135891  644  143
>>> 'walk'  0.140296  339  78
>>> 'everyone'  0.140502  1225  283
>>> 'whatever'  0.141645  618  144
>>> 'overdosed'  0.142155  48  11
>>> 'mother'  0.144908  510  122
>>> 'way'  0.146154  3458  837
>>> 'was'  0.146612  8939  2172
>>> 'would'  0.146893  7679  1870
>>> 'but'  0.14865  8435  2083
>>> 'past'  0.155513  1932  503
>>> 'duty'  0.15756  326  86
>>> 'been'  0.158577  6937  1849
>>> 'away'  0.159247  1632  437
>>> 'soon'  0.16154  1021  278
>>> 'header:In-Reply-To:1'  0.162139  1791  490
>>> 'made'  0.163602  3467  959
>>> 'true'  0.164161  566  157
>>> 'too'  0.164462  2199  612
>>> 'then'  0.167186  3519  999
>>> 'road'  0.169212  459  132
>>> 'covington'  0.170591  18  5
>>> 'firmly'  0.171729  69  20
>>> 'received'  0.172468  1646  485
>>> 'yes'  0.17276  275  81
>>> 'other'  0.174723  6686  2002
>>> 'offered'  0.177462  702  214
>>> 'saw'  0.178119  738  226
>>> 'might'  0.184601  2399  768
>>> 'hotel'  0.185114  203  65
>>> 'thought'  0.186457  1287  417
>>> 'her'  0.187192  2831  922
>>> 'indeed'  0.18721  191  62
>>> 'lie'  0.188538  165  54
>>> 'filled'  0.188682  329  108
>>> 'assorted'  0.198662  32  11
>>> 'intent'  0.199592  596  210
>>> 'manner'  0.200765  192  68
>>> 'second'  0.203991  1311  475
>>> 'let'  0.207891  1835  681
>>> 'much'  0.210328  3345  1260
>>> 'back'  0.211425  3207  1216
>>> 'place'  0.214507  1704  658
>>> 'out'  0.216398  6503  2540
>>> 'little'  0.218176  2273  897
>>> 'within'  0.218497  1940  767
>>> 'occupied'  0.218989  56  22
>>> 'never'  0.222876  2224  902
>>> 'take'  0.223351  4101  1668
>>> 'subject:-'  0.223886  2564  1046
>>> 'find'  0.224822  2482  1018
>>> 'play'  0.230279  518  219
>>> 'skip:n 10'  0.233772  2561  1105
>>> 'eyes'  0.234231  294  127
>>> 'that'  0.245614  11155  5137
>>> 'thoughts'  0.250399  193  91
>>> 'observed'  0.252899  109  52
>>> 'not'  0.253605  9451  4542
>>> 'have'  0.260054  10350  5145
>>> 'myself'  0.268888  281  146
>>> 'with'  0.272839  10712  5685
>>> 'skip:r 10'  0.274264  4752  2540
>>> 'look'  0.276317  1963  1060
>>> 'can'  0.286752  7254  4125
>>> 'guided'  0.29442  24  14
>>> 'all'  0.300499  8283  5033
>>> 'resign'  0.304561  39  24
>>> 'contracts'  0.313223  163  105
>>> 'subject:Alert'  0.322897  61  41
>>> 'upon'  0.326586  853  585
>>> 'skip:i 10'  0.332672  4717  3326
>>> 'for'  0.339583  12494  9087
>>> 'topics'  0.371008  114  95
>>> 'the'  0.371613  13338  11157
>>> 'above'  0.380529  678  589
>>> 'header:Return-Path:1'  0.635635  6219  15346
>>> 'consults'  0.695316  3  10
>>> 'comparative'  0.728703  17  65
>>> 'earnest'  0.747547  24  101
>>> 'friendship'  0.796906  6  34
>>> 'blush'  0.797234  13  73
>>> 'skip:7 70'  0.805302  5  30
>>> 'expedition'  0.825248  9  61
>>> 'from:addr:g.wcvbss'  0.844828  0  1
>>> 'from:addr:netnitco.net'  0.844828  0  1
>>> 'from:name:raymond goins'  0.844828  0  1
>>> 'lensalizarin'  0.844828  0  1
>>> "m'scorset"  0.844828  0  1
>>> 'message-id:@icsp.net'  0.844828  0  1
>>> 'ownthat'  0.844828  0  1
>>> 'prominents'  0.844828  0  1
>>> 'roadsthat'  0.844828  0  1
>>> 'sender:addr:athenet.net'  0.844828  0  1
>>> 'sender:addr:h.nnq'  0.844828  0  1
>>> 'subject:< '  0.844828  0  1
>>> 'subject:Stiles'  0.844828  0  1
>>> 'totrue'  0.844828  0  1
>>> 'virus:src="cid:'  0.888282  111  1250
>>> 'congenial'  0.905802  5  70
>>> 'taters'  0.907976  1  16
>>> 'skip:7 90'  0.908163  0  2
>>> 'header:Received:2'  0.914966  886  13487
>>> 'diem'  0.92631  3  56
>>> 'subject:CBXC'  0.949438  0  4
>>> 'rotund'  0.952904  1  33
>>> 'blushingly'  0.958716  0  5
>>> 'refolding'  0.969799  0  7
>>> 'egress'  0.970088  1  53
>>> 'to:name:freemj'  0.988432  0  19
>>> 'septennial'  0.990405  0  23
>>> 'veal'  0.993066  0  32
>>> 'youll'  0.993469  0  34
>>> 'subject:Stock'  0.99571  0  52
>>> 'casteth'  0.995868  0  54
>>> 'cutlet'  0.996894  0  72
>>> 'to:addr:hotpop.com'  0.997792  23  14803
>>>
>>> Message Stream
>>> Return-Path: <[EMAIL PROTECTED]>
>>> Received: from 38.113.3.52 (unknown [200.107.173.172])
>>>     by mx1.hotpop.com (Postfix) with SMTP
>>>     id 5B8A0E8304; Sun, 23 Oct 2005 23:49:29 +0000 (UTC)
>>> Received: from spellbound.gape.jeffersonian.gauguin.es
>>>     ([200.107.173.172]
>>>     helo=scatterbrain.mail.elknet.net) by smtp9.bt.com with esmtp
>>>     id 0X162p-8865LL-80; Mon, 24 Oct 2005 01:48:41 +0100
>>> Message-Id: <[EMAIL PROTECTED]>
>>> Sender: [EMAIL PROTECTED]
>>> Date: Sun, 23 Oct 2005 20:42:41 -0400
>>> In-Reply-To: Your message of "Sun, 23 Oct 2005 20:46:41 -0400."
>>>     <[EMAIL PROTECTED]>
>>> From: "Raymond Goins" <[EMAIL PROTECTED]>
>>> To: "Freemj" <[EMAIL PROTECTED]>
>>> Subject: Fwd: Stock - Alert-CBXC< Neil Stiles
>>> MIME-Version: 1.0
>>> Content-Type: multipart/related;
>>>     boundary="--ZZR8PVzcRDTpf2Pu68MQiz"
>>> X-HotPOP-Delivered-To: [EMAIL PROTECTED]
>>>
>>>
>>> negligiblestymie breakwatergrist m'scorset
>>>
>>>
>>>
>>> We went to the triumph comparative at egress diem then a mouldy sort
>>> of establishment have my place so I blushingly offered to resign it
>>> The septennial who made as much of my going away as if I were going
>>> to China received me as an was dismissed and other topics occupied us
>>> he remained so seldom raising
>>> his eyes unless to
>>> true Rosanne was suspicions arose within me that it was an ill
>>> assorted friendship that he never thought of being observed by anyone
>>> but was so intent upon her and upon his ownthat I received soon
>>> recalled me to myself and put me in the road back to the hotel I was
>>> so filled with the play and with the past for it was in a manner
>>> Everyone who knows you consults with you and is guided by you Stephan
>>> but on second thoughts I shall keep him to take care of me
>>> and refolding the letter it would be insupportable to me to think of
>>> I am in earnest at last so youll soon have to arrange our contracts
>>> and to bind us firmly to them been overdosed with taters I commanded
>>> him in my deepest voice to order a veal cutlet and potatoes Yes I am
>>> on an expedition of duty My mother lives a little way out of town and
>>> the roadsthat I received soon recalled me to myself and put me in the
>>> road back to the hotel for I saw a faint blush in her face you would
>>> have let me find it out for myself that would not lie too heavily
>>> upon her purse and to do my duty in it whatever it might be and the
>>> prominents walk and the congenial sound of the rotund casteth
>>> hovering above them all
>>> 7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg7Mik
>>> U
>>> R
>>> q
>>> reWfg7M6dwtJ4t1Fxn
>>> as he can look at me out of his two eyes Is he indeed said Mr
>>> Covington
>>>
>>> <HTML><HEAD>
>>> <META http-equiv=Content-Type content="text/html;
>>> charset=windows-1252"> <TITLE>lensalizarin impregnatecost</TITLE>
>>> </HEAD> <BODY> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0">
>>> <TR><TD><font></font><font></font>
>>> <BR><STRONG></STRONG><IMG
>>> SRC="cid:[email protected]" border="0"
>>> ALT="negligiblestymie breakwatergrist m'scorset">
>>> <BR><STRONG></STRONG><font></font><FONT face="Verdana"
>>> size=1><FONT></FONT></font></TD></TR><TR><TD><FONT
>>> size=1><BR><BR><font></font><STRONG></STRONG>We went to the triumph
>>> comparative at egress diem then a mouldy sort of
>>> establishment<BR>have my place so I blushingly offered to resign it
>>> <STRONG></STRONG><STRONG></STRONG>The septennial who made as much of
>>> my going away as if I were going to China received me as an<BR>was
>>> dismissed and other topics occupied us he remained so seldom raising
>>> his eyes unless to</FONT></TD></TR><TR><TD><FONT size=1>true Rosanne
>>> was suspicions arose within me that it was an ill assorted
>>> friendship <BR>that he never thought of being observed by anyone but
>>> was so intent upon her and upon his own<FONT
>>> SIZE=2></FONT><font></font>that I received soon recalled me to
>>> myself and put me in the road back to the hotel<BR>I was so filled
>>> with the
>>> play and with the past for it was in a manner<BR>Everyone who
>>> knows you
>>> consults with you and is guided by you Stephan <BR>but on second
>>> thoughts I shall keep him to take care of me
>>> </FONT></TD></TR><TR><TD><FONT size=1>and refolding the letter it
>>> would be
>>> insupportable to me to think of <BR>I am in earnest at last so
>>> youll soon
>>> have to arrange our contracts and to bind us firmly to
>>> them<font></font><BR>been overdosed with taters I commanded him in
>>> my deepest voice to order a veal cutlet and potatoes<BR>Yes I am on
>>> an expedition of duty My mother lives a little way out of town and
>>> the roads<font></font><FONT SIZE=2></FONT>that I received soon
>>> recalled me to myself and put me in the road back to the
>>> hotel<BR>for I saw a faint blush in her face you would have let me
>>> find it out for myself <font></font>that would not lie too heavily
>>> upon her purse and to do my duty in it whatever it might be <BR>and
>>> the prominents walk and the congenial sound of the rotund casteth
>>> hovering above them all
>>> <BR>7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg
>>> 7
>>> M
>>> ikURq
>>> reWfg7M6dwtJ4t1Fxn<BR>as he can look at me out of his two eyes Is he
>>> indeed said Mr Covington </FONT></TD></TR></TABLE> </BODY> </HTML>
>>>
>>> All Message Tokens
>>> 187 unique tokens
>>>
>>> 'above'
>>> 'all'
>>> 'and'
>>> 'anyone'
>>> 'arose'
>>> 'arrange'
>>> 'assorted'
>>> 'away'
>>> 'back'
>>> 'been'
>>> 'being'
>>> 'bind'
>>> 'blush'
>>> 'blushingly'
>>> 'but'
>>> 'can'
>>> 'care'
>>> 'casteth'
>>> 'cc:none'
>>> 'china'
>>> 'commanded'
>>> 'comparative'
>>> 'congenial'
>>> 'consults'
>>> 'content-type:text/plain'
>>> 'contracts'
>>> 'covington'
>>> 'cutlet'
>>> 'deepest'
>>> 'diem'
>>> 'dismissed'
>>> 'duty'
>>> 'earnest'
>>> 'egress'
>>> 'everyone'
>>> 'expedition'
>>> 'eyes'
>>> 'face'
>>> 'faint'
>>> 'filled'
>>> 'find'
>>> 'firmly'
>>> 'for'
>>> 'friendship'
>>> 'from:addr:g.wcvbss'
>>> 'from:addr:netnitco.net'
>>> 'from:name:raymond goins'
>>> 'going'
>>> 'guided'
>>> 'have'
>>> 'header:Date:1'
>>> 'header:From:1'
>>> 'header:In-Reply-To:1'
>>> 'header:MIME-Version:1'
>>> 'header:Message-Id:1'
>>> 'header:Received:2'
>>> 'header:Return-Path:1'
>>> 'header:Subject:1'
>>> 'header:To:1'
>>> 'heavily'
>>> 'her'
>>> 'him'
>>> 'his'
>>> 'hotel'
>>> 'hovering'
>>> 'ill'
>>> 'indeed'
>>> 'intent'
>>> 'keep'
>>> 'knows'
>>> 'last'
>>> 'lensalizarin'
>>> 'let'
>>> 'letter'
>>> 'lie'
>>> 'little'
>>> 'lives'
>>> 'look'
>>> "m'scorset"
>>> 'made'
>>> 'manner'
>>> 'message-id:@icsp.net'
>>> 'might'
>>> 'mother'
>>> 'mouldy'
>>> 'much'
>>> 'myself'
>>> 'never'
>>> 'not'
>>> 'observed'
>>> 'occupied'
>>> 'offered'
>>> 'order'
>>> 'other'
>>> 'our'
>>> 'out'
>>> 'overdosed'
>>> 'ownthat'
>>> 'past'
>>> 'place'
>>> 'play'
>>> 'potatoes'
>>> 'prominents'
>>> 'purse'
>>> 'put'
>>> 'raising'
>>> 'recalled'
>>> 'received'
>>> 'refolding'
>>> 'remained'
>>> 'reply-to:none'
>>> 'resign'
>>> 'road'
>>> 'roadsthat'
>>> 'rosanne'
>>> 'rotund'
>>> 'said'
>>> 'saw'
>>> 'second'
>>> 'seldom'
>>> 'sender:addr:athenet.net'
>>> 'sender:addr:h.nnq'
>>> 'sender:no real name:2**0'
>>> 'septennial'
>>> 'shall'
>>> 'skip:7 70'
>>> 'skip:7 90'
>>> 'skip:b 10'
>>> 'skip:e 10'
>>> 'skip:i 10'
>>> 'skip:n 10'
>>> 'skip:r 10'
>>> 'soon'
>>> 'sort'
>>> 'sound'
>>> 'stephan'
>>> 'subject: '
>>> 'subject: - '
>>> 'subject:-'
>>> 'subject:: '
>>> 'subject:< '
>>> 'subject:Alert'
>>> 'subject:CBXC'
>>> 'subject:Fwd'
>>> 'subject:Neil'
>>> 'subject:Stiles'
>>> 'subject:Stock'
>>> 'suspicions'
>>> 'take'
>>> 'taters'
>>> 'that'
>>> 'the'
>>> 'them'
>>> 'then'
>>> 'think'
>>> 'thought'
>>> 'thoughts'
>>> 'to:2**0'
>>> 'to:addr:freemj'
>>> 'to:addr:hotpop.com'
>>> 'to:name:freemj'
>>> 'too'
>>> 'topics'
>>> 'totrue'
>>> 'town'
>>> 'triumph'
>>> 'true'
>>> 'two'
>>> 'unless'
>>> 'upon'
>>> 'veal'
>>> 'virus:src="cid:'
>>> 'voice'
>>> 'walk'
>>> 'was'
>>> 'way'
>>> 'went'
>>> 'were'
>>> 'whatever'
>>> 'who'
>>> 'with'
>>> 'within'
>>> 'would'
>>> 'x-mailer:none'
>>> 'yes'
>>> 'you'
>>> 'youll'
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of Tony Meyer
>>> Sent: Sunday, October 23, 2005 9:43 PM
>>> To: <[EMAIL PROTECTED]>
>>> Cc: [email protected]
>>> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
>>> HandleEmbeddedImages)
>>>
>>>> Something really needs to be done about this embedded image Spam.
>>>> Honestly,
>>>> SpamBayes appears to be ineffective against all these images,
>>>
>>> Can you post an example of a message that is incorrectly classified,
>>> *with the spambayes clues* for the message? The Outlook plug-in
>>> provides this via the "Show Clues for this Message" item in the
>>> SpamBayes menu.
>>>
>>> [...]
>>>> I'm sure OCR isn't the only way, but the words are there in plain
>>>> view. It seems like the obvious way to resolve this.
>>>
>>> Obvious isn't always best. One of the tenets here is "stupid beats
>>> smart" - I think doing some sort of OCR on images would fall into the
>>> "smart" category, and generating simple tokens from the images would
>>> fall into the "stupid" category and be more successful. Just my
>>> opinion, of course, but that's what I'd test if I had time (perhaps
>>> over the (southern hemisphere) summer...or maybe I can convince one
>>> of my employers that this would be worth doing in paid time).
>>>
>>>> SpamBayes has been such a great program for me and my colleges,
>>>> family and friends. I can only hope that the project sees fit to
>>>> resolve this soon.
>>>
>>> It's not really a case of "seeing fit" - the issue is that the
>>> developers are very short on time at the moment (contributions have
>>> always been, and always will be, welcome) and, in addition, this is a
>>> complex problem.
>>>
>>> =Tony.Meyer
>>>
>>> --
>>> Please always include the list (spambayes at python.org) in your
>>> replies (reply-all), and please don't send me personal mail about
>>> SpamBayes.
>>> http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains
>>> this.
>>>
>>>
>>> _______________________________________________
>>> [email protected]
>>> http://mail.python.org/mailman/listinfo/spambayes
>>> Check the FAQ before asking: http://spambayes.sf.net/faq.html
