Ken,

Please post the entire Spambayes Clues listing, so I can see what Spambayes is doing with all the erroneous ham text that's included at the bottom of your e-mail message example.

Thanks,
FMJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ken Gordon
Sent: Wednesday, October 26, 2005 8:26 AM
To: [email protected]; <[EMAIL PROTECTED]>
Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to HandleEmbeddedImages)

There's a lot more to spambayes than just evaluating content. Here's the SB Evidence header from a recent spam. But for 'charset', very little of this has to do with the content, yet it was correctly classified as spam.

> X-Spambayes-Evidence: '*H*': 0.00; '*S*': 1.00; 'received:192.168.1':
> 0.10; 'subject:skip:B 10': 0.16; 'received:192.168': 0.20;
> 'received:192': 0.21; 'url:www': 0.23; 'content-type:image/jpeg':
> 0.34; 'to:addr:none': 0.38; 'header:Return-Path:1': 0.38;
> 'header:MIME-Version:1': 0.61; 'url:': 0.64; 'x-mailer:none': 0.71;
> 'to:no real name:2**0': 0.72; 'from:name:[EMAIL PROTECTED](b': 0.84;
> 'message-id:@imx100522.ath.cx': 0.84; 'received:imx100522.ath.cx':
> 0.84; 'url:fetish': 0.84; 'received:192.168.1.11': 0.91;
> 'received:kick': 0.91; 'content-type:multipart/related': 0.92;
> 'received:210.153': 0.93; 'received:ath.cx': 0.93; 'url:cc': 0.93;
> 'virus:src="cid:': 0.95; 'content-type/type:multipart/alternative':
> 0.96; 'received:cx': 0.97; 'email addr:yahoo.co.jp': 0.99; 'skip:\x1b
> 80': 0.99; 'from:addr:yahoo.co.jp': 1.00; 'from:charset:iso-2022-jp':
> 1.00; 'skip:\x1b 60': 1.00; 'skip:\x1b 30': 1.00; 'skip:\x1b 20':
> 1.00; 'skip:\x1b 50': 1.00; 'subject:$': 1.00; 'received:210': 1.00;
> 'charset:iso-2022-jp': 1.00; 'subject:\x1b$': 1.00;
> 'subjectcharset:iso-2022-jp': 1.00

On 2005 Oct 25, at 8:37, <[EMAIL PROTECTED]> wrote:

> How? Technically speaking, what could your SpamBayes installation be
> doing differently? These are ALL ham words, so how is it that your
> e-mail could be classifying all of this as Spam? If it is, I suspect
> you're losing a lot of legitimate e-mail with it.
>
> FMJ
>
> -----Original Message-----
> From: Ken Gordon [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 24, 2005 8:58 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
> HandleEmbeddedImages)
>
> My installation of SpamBayes catches nearly all of these. I don't see
> one a month outside of the Spam folder.
>
> ---
> Ken Gordon
> (780) 628-2758
> http://www.wolfe-gordon.ca
> On 2005 Oct 24, at 20:18, <[EMAIL PROTECTED]> wrote:
>
>> Hi Tony,
>> The problem is, they keep changing the meaningless text at the bottom
>> of the e-mail all the time, to confuse the Spam filter. They're
>> picking Hammy words. And, as you can see, it's a highly effective
>> technique. In other words, NONE of the "Tokens" should actually be
>> "Significant", it's the image that needs to be scored in this case.
>> Here's the spambayes clues for one of the e-mails:
>>
>> Combined Score: 3% (0.0330173)
>> Internal ham score (*H*): 0.999976
>> Internal spam score (*S*): 0.0660102
>>
>> # ham trained on: 14237
>> # spam trained on: 20138
>>
>> 150 Significant Tokens
>> token  spamprob  #ham  #spam
>> 'sender:no real name:2**0'  0.0277535  2187  88
>> 'dismissed'  0.0374933  314  17
>> 'raising'  0.0417704  313  19
>> 'lives'  0.0580962  1012  88
>> 'ill'  0.0613924  1084  100
>> 'said'  0.0677803  6498  668
>> 'two'  0.08226  5200  659
>> 'put'  0.0828439  2632  336
>> 'were'  0.0845653  6094  796
>> 'recalled'  0.0862187  92  12
>> 'town'  0.0883783  600  82
>> 'being'  0.0894639  4312  599
>> 'letter'  0.093344  1595  232
>> 'unless'  0.0960663  687  103
>> 'stephan'  0.0968154  15  2
>> 'face'  0.0986506  1397  216
>> 'who'  0.0991493  8031  1250
>> 'knows'  0.102049  574  92
>> 'anyone'  0.104976  1828  303
>> 'them'  0.106325  4690  789
>> 'think'  0.107446  3584  610
>> 'keep'  0.109385  2517  437
>> 'him'  0.111552  2631  467
>> 'suspicions'  0.113796  40  7
>> 'went'  0.11401  1331  242
>> 'sound'  0.116592  596  111
>> 'care'  0.117491  1244  234
>> 'going'  0.119623  3503  673
>> 'sort'  0.119677  511  98
>> 'his'  0.119861  5717  1101
>> 'remained'  0.11998  271  52
>> 'heavily'  0.123551  232  46
>> 'last'  0.126157  5241  1070
>> 'subject:: '  0.134951  9110  2010
>> 'voice'  0.135891  644  143
>> 'walk'  0.140296  339  78
>> 'everyone'  0.140502  1225  283
>> 'whatever'  0.141645  618  144
>> 'overdosed'  0.142155  48  11
>> 'mother'  0.144908  510  122
>> 'way'  0.146154  3458  837
>> 'was'  0.146612  8939  2172
>> 'would'  0.146893  7679  1870
>> 'but'  0.14865  8435  2083
>> 'past'  0.155513  1932  503
>> 'duty'  0.15756  326  86
>> 'been'  0.158577  6937  1849
>> 'away'  0.159247  1632  437
>> 'soon'  0.16154  1021  278
>> 'header:In-Reply-To:1'  0.162139  1791  490
>> 'made'  0.163602  3467  959
>> 'true'  0.164161  566  157
>> 'too'  0.164462  2199  612
>> 'then'  0.167186  3519  999
>> 'road'  0.169212  459  132
>> 'covington'  0.170591  18  5
>> 'firmly'  0.171729  69  20
>> 'received'  0.172468  1646  485
>> 'yes'  0.17276  275  81
>> 'other'  0.174723  6686  2002
>> 'offered'  0.177462  702  214
>> 'saw'  0.178119  738  226
>> 'might'  0.184601  2399  768
>> 'hotel'  0.185114  203  65
>> 'thought'  0.186457  1287  417
>> 'her'  0.187192  2831  922
>> 'indeed'  0.18721  191  62
>> 'lie'  0.188538  165  54
>> 'filled'  0.188682  329  108
>> 'assorted'  0.198662  32  11
>> 'intent'  0.199592  596  210
>> 'manner'  0.200765  192  68
>> 'second'  0.203991  1311  475
>> 'let'  0.207891  1835  681
>> 'much'  0.210328  3345  1260
>> 'back'  0.211425  3207  1216
>> 'place'  0.214507  1704  658
>> 'out'  0.216398  6503  2540
>> 'little'  0.218176  2273  897
>> 'within'  0.218497  1940  767
>> 'occupied'  0.218989  56  22
>> 'never'  0.222876  2224  902
>> 'take'  0.223351  4101  1668
>> 'subject:-'  0.223886  2564  1046
>> 'find'  0.224822  2482  1018
>> 'play'  0.230279  518  219
>> 'skip:n 10'  0.233772  2561  1105
>> 'eyes'  0.234231  294  127
>> 'that'  0.245614  11155  5137
>> 'thoughts'  0.250399  193  91
>> 'observed'  0.252899  109  52
>> 'not'  0.253605  9451  4542
>> 'have'  0.260054  10350  5145
>> 'myself'  0.268888  281  146
>> 'with'  0.272839  10712  5685
>> 'skip:r 10'  0.274264  4752  2540
>> 'look'  0.276317  1963  1060
>> 'can'  0.286752  7254  4125
>> 'guided'  0.29442  24  14
>> 'all'  0.300499  8283  5033
>> 'resign'  0.304561  39  24
>> 'contracts'  0.313223  163  105
>> 'subject:Alert'  0.322897  61  41
>> 'upon'  0.326586  853  585
>> 'skip:i 10'  0.332672  4717  3326
>> 'for'  0.339583  12494  9087
>> 'topics'  0.371008  114  95
>> 'the'  0.371613  13338  11157
>> 'above'  0.380529  678  589
>> 'header:Return-Path:1'  0.635635  6219  15346
>> 'consults'  0.695316  3  10
>> 'comparative'  0.728703  17  65
>> 'earnest'  0.747547  24  101
>> 'friendship'  0.796906  6  34
>> 'blush'  0.797234  13  73
>> 'skip:7 70'  0.805302  5  30
>> 'expedition'  0.825248  9  61
>> 'from:addr:g.wcvbss'  0.844828  0  1
>> 'from:addr:netnitco.net'  0.844828  0  1
>> 'from:name:raymond goins'  0.844828  0  1
>> 'lensalizarin'  0.844828  0  1
>> "m'scorset"  0.844828  0  1
>> 'message-id:@icsp.net'  0.844828  0  1
>> 'ownthat'  0.844828  0  1
>> 'prominents'  0.844828  0  1
>> 'roadsthat'  0.844828  0  1
>> 'sender:addr:athenet.net'  0.844828  0  1
>> 'sender:addr:h.nnq'  0.844828  0  1
>> 'subject:< '  0.844828  0  1
>> 'subject:Stiles'  0.844828  0  1
>> 'totrue'  0.844828  0  1
>> 'virus:src="cid:'  0.888282  111  1250
>> 'congenial'  0.905802  5  70
>> 'taters'  0.907976  1  16
>> 'skip:7 90'  0.908163  0  2
>> 'header:Received:2'  0.914966  886  13487
>> 'diem'  0.92631  3  56
>> 'subject:CBXC'  0.949438  0  4
>> 'rotund'  0.952904  1  33
>> 'blushingly'  0.958716  0  5
>> 'refolding'  0.969799  0  7
>> 'egress'  0.970088  1  53
>> 'to:name:freemj'  0.988432  0  19
>> 'septennial'  0.990405  0  23
>> 'veal'  0.993066  0  32
>> 'youll'  0.993469  0  34
>> 'subject:Stock'  0.99571  0  52
>> 'casteth'  0.995868  0  54
>> 'cutlet'  0.996894  0  72
>> 'to:addr:hotpop.com'  0.997792  23  14803
>>
>> Message Stream
>> Return-Path: <[EMAIL PROTECTED]>
>> Received: from 38.113.3.52 (unknown [200.107.173.172])
>>     by mx1.hotpop.com (Postfix) with SMTP
>>     id 5B8A0E8304; Sun, 23 Oct 2005 23:49:29 +0000 (UTC)
>> Received: from spellbound.gape.jeffersonian.gauguin.es
>>     ([200.107.173.172]
>>     helo=scatterbrain.mail.elknet.net) by smtp9.bt.com with esmtp
>>     id 0X162p-8865LL-80; Mon, 24 Oct 2005 01:48:41 +0100
>> Message-Id: <[EMAIL PROTECTED]>
>> Sender: [EMAIL PROTECTED]
>> Date: Sun, 23 Oct 2005 20:42:41 -0400
>> In-Reply-To: Your message of "Sun, 23 Oct 2005 20:46:41 -0400."
>>     <[EMAIL PROTECTED]>
>> From: "Raymond Goins" <[EMAIL PROTECTED]>
>> To: "Freemj" <[EMAIL PROTECTED]>
>> Subject: Fwd: Stock - Alert-CBXC< Neil Stiles
>> MIME-Version: 1.0
>> Content-Type: multipart/related;
>>     boundary="--ZZR8PVzcRDTpf2Pu68MQiz"
>> X-HotPOP-Delivered-To: [EMAIL PROTECTED]
>>
>>
>> negligiblestymie breakwatergrist m'scorset
>>
>>
>>
>> We went to the triumph comparative at egress diem then a mouldy sort
>> of establishment have my place so I blushingly offered to resign it
>> The septennial who made as much of my going away as if I were going
>> to China received me as an was dismissed and other topics occupied us
>> he remained so seldom raising
>> his eyes unless to
>> true Rosanne was suspicions arose within me that it was an ill
>> assorted friendship that he never thought of being observed by anyone
>> but was so intent upon her and upon his ownthat I received soon
>> recalled me to myself and put me in the road back to the hotel I was
>> so filled with the play and with the past for it was in a manner
>> Everyone who knows you consults with you and is guided by you Stephan
>> but on second thoughts I shall keep him to take care of me
>> and refolding the letter it would be insupportable to me to think of
>> I am in earnest at last so youll soon have to arrange our contracts
>> and to bind us firmly to them been overdosed with taters I commanded
>> him in my deepest voice to order a veal cutlet and potatoes Yes I am
>> on an expedition of duty My mother lives a little way out of town and
>> the roadsthat I received soon recalled me to myself and put me in the
>> road back to the hotel for I saw a faint blush in her face you would
>> have let me find it out for myself that would not lie too heavily
>> upon her purse and to do my duty in it whatever it might be and the
>> prominents walk and the congenial sound of the rotund casteth
>> hovering above them all
>> 7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg7Mik
>> U
>> R
>> q
>> reWfg7M6dwtJ4t1Fxn
>> as he can look at me out of his two eyes Is he indeed said Mr
>> Covington
>>
>> <HTML><HEAD>
>> <META http-equiv=Content-Type content="text/html;
>> charset=windows-1252"> <TITLE>lensalizarin impregnatecost</TITLE>
>> </HEAD> <BODY> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0">
>> <TR><TD><font></font><font></font>
>> <BR><STRONG></STRONG><IMG
>> SRC="cid:[email protected]" border="0"
>> ALT="negligiblestymie breakwatergrist m'scorset">
>> <BR><STRONG></STRONG><font></font><FONT face="Verdana"
>> size=1><FONT></FONT></font></TD></TR><TR><TD><FONT
>> size=1><BR><BR><font></font><STRONG></STRONG>We went to the triumph
>> comparative at egress diem then a mouldy sort of
>> establishment<BR>have my place so I blushingly offered to resign it
>> <STRONG></STRONG><STRONG></STRONG>The septennial who made as much of
>> my going away as if I were going to China received me as an<BR>was
>> dismissed and other topics occupied us he remained so seldom raising
>> his eyes unless to</FONT></TD></TR><TR><TD><FONT size=1>true Rosanne
>> was suspicions arose within me that it was an ill assorted
>> friendship <BR>that he never thought of being observed by anyone but
>> was so intent upon her and upon his own<FONT
>> SIZE=2></FONT><font></font>that I received soon recalled me to
>> myself and put me in the road back to the hotel<BR>I was so filled
>> with the
>> play and with the past for it was in a manner<BR>Everyone who
>> knows you
>> consults with you and is guided by you Stephan <BR>but on second
>> thoughts I shall keep him to take care of me
>> </FONT></TD></TR><TR><TD><FONT size=1>and refolding the letter it
>> would be
>> insupportable to me to think of <BR>I am in earnest at last so
>> youll soon
>> have to arrange our contracts and to bind us firmly to
>> them<font></font><BR>been overdosed with taters I commanded him in
>> my deepest voice to order a veal cutlet and potatoes<BR>Yes I am on
>> an expedition of duty My mother lives a little way out of town and
>> the roads<font></font><FONT SIZE=2></FONT>that I received soon
>> recalled me to myself and put me in the road back to the
>> hotel<BR>for I saw a faint blush in her face you would have let me
>> find it out for myself <font></font>that would not lie too heavily
>> upon her purse and to do my duty in it whatever it might be <BR>and
>> the prominents walk and the congenial sound of the rotund casteth
>> hovering above them all
>> <BR>7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg
>> 7
>> M
>> ikURq
>> reWfg7M6dwtJ4t1Fxn<BR>as he can look at me out of his two eyes Is he
>> indeed said Mr Covington </FONT></TD></TR></TABLE> </BODY> </HTML>
>>
>> All Message Tokens
>> 187 unique tokens
>>
>> 'above'
>> 'all'
>> 'and'
>> 'anyone'
>> 'arose'
>> 'arrange'
>> 'assorted'
>> 'away'
>> 'back'
>> 'been'
>> 'being'
>> 'bind'
>> 'blush'
>> 'blushingly'
>> 'but'
>> 'can'
>> 'care'
>> 'casteth'
>> 'cc:none'
>> 'china'
>> 'commanded'
>> 'comparative'
>> 'congenial'
>> 'consults'
>> 'content-type:text/plain'
>> 'contracts'
>> 'covington'
>> 'cutlet'
>> 'deepest'
>> 'diem'
>> 'dismissed'
>> 'duty'
>> 'earnest'
>> 'egress'
>> 'everyone'
>> 'expedition'
>> 'eyes'
>> 'face'
>> 'faint'
>> 'filled'
>> 'find'
>> 'firmly'
>> 'for'
>> 'friendship'
>> 'from:addr:g.wcvbss'
>> 'from:addr:netnitco.net'
>> 'from:name:raymond goins'
>> 'going'
>> 'guided'
>> 'have'
>> 'header:Date:1'
>> 'header:From:1'
>> 'header:In-Reply-To:1'
>> 'header:MIME-Version:1'
>> 'header:Message-Id:1'
>> 'header:Received:2'
>> 'header:Return-Path:1'
>> 'header:Subject:1'
>> 'header:To:1'
>> 'heavily'
>> 'her'
>> 'him'
>> 'his'
>> 'hotel'
>> 'hovering'
>> 'ill'
>> 'indeed'
>> 'intent'
>> 'keep'
>> 'knows'
>> 'last'
>> 'lensalizarin'
>> 'let'
>> 'letter'
>> 'lie'
>> 'little'
>> 'lives'
>> 'look'
>> "m'scorset"
>> 'made'
>> 'manner'
>> 'message-id:@icsp.net'
>> 'might'
>> 'mother'
>> 'mouldy'
>> 'much'
>> 'myself'
>> 'never'
>> 'not'
>> 'observed'
>> 'occupied'
>> 'offered'
>> 'order'
>> 'other'
>> 'our'
>> 'out'
>> 'overdosed'
>> 'ownthat'
>> 'past'
>> 'place'
>> 'play'
>> 'potatoes'
>> 'prominents'
>> 'purse'
>> 'put'
>> 'raising'
>> 'recalled'
>> 'received'
>> 'refolding'
>> 'remained'
>> 'reply-to:none'
>> 'resign'
>> 'road'
>> 'roadsthat'
>> 'rosanne'
>> 'rotund'
>> 'said'
>> 'saw'
>> 'second'
>> 'seldom'
>> 'sender:addr:athenet.net'
>> 'sender:addr:h.nnq'
>> 'sender:no real name:2**0'
>> 'septennial'
>> 'shall'
>> 'skip:7 70'
>> 'skip:7 90'
>> 'skip:b 10'
>> 'skip:e 10'
>> 'skip:i 10'
>> 'skip:n 10'
>> 'skip:r 10'
>> 'soon'
>> 'sort'
>> 'sound'
>> 'stephan'
>> 'subject: '
>> 'subject: - '
>> 'subject:-'
>> 'subject:: '
>> 'subject:< '
>> 'subject:Alert'
>> 'subject:CBXC'
>> 'subject:Fwd'
>> 'subject:Neil'
>> 'subject:Stiles'
>> 'subject:Stock'
>> 'suspicions'
>> 'take'
>> 'taters'
>> 'that'
>> 'the'
>> 'them'
>> 'then'
>> 'think'
>> 'thought'
>> 'thoughts'
>> 'to:2**0'
>> 'to:addr:freemj'
>> 'to:addr:hotpop.com'
>> 'to:name:freemj'
>> 'too'
>> 'topics'
>> 'totrue'
>> 'town'
>> 'triumph'
>> 'true'
>> 'two'
>> 'unless'
>> 'upon'
>> 'veal'
>> 'virus:src="cid:'
>> 'voice'
>> 'walk'
>> 'was'
>> 'way'
>> 'went'
>> 'were'
>> 'whatever'
>> 'who'
>> 'with'
>> 'within'
>> 'would'
>> 'x-mailer:none'
>> 'yes'
>> 'you'
>> 'youll'
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Tony Meyer
>> Sent: Sunday, October 23, 2005 9:43 PM
>> To: <[EMAIL PROTECTED]>
>> Cc: [email protected]
>> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
>> HandleEmbeddedImages)
>>
>>> Something really needs to be done about this embedded image Spam.
>>> Honestly,
>>> SpamBayes appears to be ineffective against all these images,
>>
>> Can you post an example of a message that is incorrectly classified,
>> *with the spambayes clues* for the message? The Outlook plug-in
>> provides this via the "Show Clues for this Message" item in the
>> SpamBayes menu.
>>
>> [...]
>>> I'm sure OCR isn't the only way, but the words are there in plain
>>> view. It seems like the obvious way to resolve this.
>>
>> Obvious isn't always best. One of the tenets here is "stupid beats
>> smart" - I think doing some sort of OCR on images would fall into the
>> "smart" category, and generating simple tokens from the images would
>> fall into the "stupid" category and be more successful. Just my
>> opinion, of course, but that's what I'd test if I had time (perhaps
>> over the (southern hemisphere) summer...or maybe I can convince one
>> of my employers that this would be worth doing in paid time).
>>
>>> SpamBayes has been such a great program for me and my colleagues,
>>> family and friends. I can only hope that the project sees fit to
>>> resolve this soon.
>>
>> It's not really a case of "seeing fit" - the issue is that the
>> developers are very short on time at the moment (contributions have
>> always been, and always will be, welcome) and, in addition, this is a
>> complex problem.
>>
>> =Tony.Meyer
>>
>> --
>> Please always include the list (spambayes at python.org) in your
>> replies (reply-all), and please don't send me personal mail about
>> SpamBayes.
>> http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
>>
>>
>> _______________________________________________
>> [email protected]
>> http://mail.python.org/mailman/listinfo/spambayes
>> Check the FAQ before asking: http://spambayes.sf.net/faq.html
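
For anyone reading the clues listing quoted in this thread, the numbers in it can be cross-checked by hand. The sketch below is not the SpamBayes source, and the function names are made up for illustration. It assumes the stock settings of 0.5 for the unknown-word probability and 0.45 for the unknown-word strength, and it assumes the combined score is formed from the internal values as (S - H + 1) / 2. Both assumptions appear to be consistent with the figures in the listing.

```python
# Hypothetical helper names; a sketch for checking the listing, not SpamBayes code.

# Assumed defaults: prior for an unseen token and the weight given to that prior.
UNKNOWN_WORD_PROB = 0.5
UNKNOWN_WORD_STRENGTH = 0.45


def token_spamprob(ham_count, spam_count, n_ham, n_spam):
    """Estimate a token's spam probability from its training counts."""
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    total = ham_ratio + spam_ratio
    # A token never seen in training carries no evidence either way.
    raw = spam_ratio / total if total else UNKNOWN_WORD_PROB
    n = ham_count + spam_count
    # Pull the raw ratio toward the prior; the pull fades as n grows.
    return (UNKNOWN_WORD_STRENGTH * UNKNOWN_WORD_PROB + n * raw) / (
        UNKNOWN_WORD_STRENGTH + n
    )


def combined_score(ham_score, spam_score):
    """Fold the internal *H* and *S* values into one score between 0 and 1."""
    return (spam_score - ham_score + 1.0) / 2.0


if __name__ == "__main__":
    # "# ham trained on" and "# spam trained on" from the listing.
    n_ham, n_spam = 14237, 20138
    print("dismissed:", token_spamprob(314, 17, n_ham, n_spam))
    print("from:addr:g.wcvbss:", token_spamprob(0, 1, n_ham, n_spam))
    print("combined:", combined_score(0.999976, 0.0660102))
```

Two things stand out when the listing is read this way. A token seen once in spam and never in ham can only reach about 0.84, which is why the one-off sender and subject tokens all share the same value. And with *H* near 1 and *S* near 0.07, the common words in the filler text are enough to hold the combined score near 3% despite the handful of strongly spammy tokens.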
