On Mon, May 03, 2004 at 01:01:08AM -0500, Michael Parker wrote:
> On Sun, May 02, 2004 at 09:33:48PM -0700, Daniel Quinlan wrote:
> > 
> > 1. We should probably not truncate tokens (at least not so much) since
> >    we're hashing now.  Some amount of truncation may still be helpful,
> >    though, so a 10fcv would be a good idea.
> > 
> >    Um, I don't recall anyone posting a 10fcv for the hashing.  Someone
> >    did do that, right?
> > 
> 
> Yeah, here ya go:
> 
> Pre Hashing
> 
> SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
> 0.000 (96.272%) ..........|.......................................................
> 0.000 ( 0.250%) ###       |
> 0.040 ( 0.194%) ..        |
> 0.040 ( 0.033%)           |
> 0.080 ( 0.122%) .         |
> 0.080 ( 0.039%)           |
> 0.120 ( 0.139%) ..        |
> 0.120 ( 0.039%)           |
> 0.160 ( 0.128%) .         |
> 0.160 ( 0.022%)           |
> 0.200 ( 0.122%) .         |
> 0.200 ( 0.006%)           |
> 0.240 ( 0.078%) .         |
> 0.240 ( 0.017%)           |
> 0.280 ( 0.067%) .         |
> 0.280 ( 0.050%) #         |
> 0.320 ( 0.072%) .         |
> 0.320 ( 0.011%)           |
> 0.360 ( 0.083%) .         |
> 0.360 ( 0.017%)           |
> 0.400 ( 0.106%) .         |
> 0.400 ( 0.033%)           |
> 0.440 ( 0.278%) ...       |
> 0.440 ( 0.117%) #         |
> 0.480 ( 2.222%) ..........|.
> 0.480 ( 5.506%) ##########|####
> 0.520 ( 0.039%)           |
> 0.520 ( 0.956%) ##########|#
> 0.560 ( 0.022%)           |
> 0.560 ( 0.611%) ########  |
> 0.600 ( 0.006%)           |
> 0.600 ( 0.544%) #######   |
> 0.640 ( 0.017%)           |
> 0.640 ( 0.517%) #######   |
> 0.680 ( 0.006%)           |
> 0.680 ( 0.433%) ######    |
> 0.720 ( 0.578%) #######   |
> 0.760 ( 0.006%)           |
> 0.760 ( 0.533%) #######   |
> 0.800 ( 0.006%)           |
> 0.800 ( 0.594%) ########  |
> 0.840 ( 0.011%)           |
> 0.840 ( 0.594%) ########  |
> 0.880 ( 0.006%)           |
> 0.880 ( 0.922%) ##########|#
> 0.920 ( 1.467%) ##########|#
> 0.960 (86.111%) ##########|#######################################################
> 
> 
> Post Hashing
> 
> SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
> 0.000 (96.200%) ..........|.......................................................
> 0.000 ( 0.211%) ###       |
> 0.040 ( 0.122%) .         |
> 0.040 ( 0.028%)           |
> 0.080 ( 0.094%) .         |
> 0.080 ( 0.017%)           |
> 0.120 ( 0.094%) .         |
> 0.120 ( 0.044%) #         |
> 0.160 ( 0.139%) ..        |
> 0.160 ( 0.028%)           |
> 0.200 ( 0.117%) .         |
> 0.200 ( 0.011%)           |
> 0.240 ( 0.111%) .         |
> 0.240 ( 0.028%)           |
> 0.280 ( 0.078%) .         |
> 0.280 ( 0.011%)           |
> 0.320 ( 0.139%) ..        |
> 0.320 ( 0.022%)           |
> 0.360 ( 0.100%) .         |
> 0.360 ( 0.022%)           |
> 0.400 ( 0.122%) .         |
> 0.400 ( 0.022%)           |
> 0.440 ( 0.183%) ..        |
> 0.440 ( 0.094%) #         |
> 0.480 ( 2.317%) ..........|.
> 0.480 ( 4.778%) ##########|###
> 0.520 ( 0.072%) .         |
> 0.520 ( 0.756%) ######### |
> 0.560 ( 0.033%)           |
> 0.560 ( 0.444%) #####     |
> 0.600 ( 0.406%) #####     |
> 0.640 ( 0.017%)           |
> 0.640 ( 0.333%) ####      |
> 0.680 ( 0.022%)           |
> 0.680 ( 0.389%) #####     |
> 0.720 ( 0.006%)           |
> 0.720 ( 0.317%) ####      |
> 0.760 ( 0.472%) ######    |
> 0.800 ( 0.472%) ######    |
> 0.840 ( 0.011%)           |
> 0.840 ( 0.444%) #####     |
> 0.880 ( 0.006%)           |
> 0.880 ( 0.711%) ######### |
> 0.920 ( 0.011%)           |
> 0.920 ( 1.044%) ##########|#
> 0.960 ( 0.006%)           |
> 0.960 (88.894%) ##########|#######################################################
> 
> 

Here is the 10fcv with tokens limited to 128 chars:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (96.200%) ..........|.......................................................
0.000 ( 0.211%) ###       |
0.040 ( 0.122%) .         |
0.040 ( 0.028%)           |
0.080 ( 0.094%) .         |
0.080 ( 0.017%)           |
0.120 ( 0.094%) .         |
0.120 ( 0.044%) #         |
0.160 ( 0.139%) ..        |
0.160 ( 0.028%)           |
0.200 ( 0.117%) .         |
0.200 ( 0.011%)           |
0.240 ( 0.111%) .         |
0.240 ( 0.028%)           |
0.280 ( 0.078%) .         |
0.280 ( 0.011%)           |
0.320 ( 0.139%) ..        |
0.320 ( 0.022%)           |
0.360 ( 0.100%) .         |
0.360 ( 0.022%)           |
0.400 ( 0.122%) .         |
0.400 ( 0.022%)           |
0.440 ( 0.183%) ..        |
0.440 ( 0.094%) #         |
0.480 ( 2.317%) ..........|.
0.480 ( 4.778%) ##########|###
0.520 ( 0.072%) .         |
0.520 ( 0.756%) ######### |
0.560 ( 0.033%)           |
0.560 ( 0.444%) #####     |
0.600 ( 0.406%) #####     |
0.640 ( 0.017%)           |
0.640 ( 0.333%) ####      |
0.680 ( 0.022%)           |
0.680 ( 0.389%) #####     |
0.720 ( 0.006%)           |
0.720 ( 0.317%) ####      |
0.760 ( 0.472%) ######    |
0.800 ( 0.472%) ######    |
0.840 ( 0.011%)           |
0.840 ( 0.444%) #####     |
0.880 ( 0.006%)           |
0.880 ( 0.711%) ######### |
0.920 ( 0.011%)           |
0.920 ( 1.044%) ##########|#
0.960 ( 0.006%)           |
0.960 (88.894%) ##########|#######################################################
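
For anyone following along, the truncate-then-hash step being tested can be sketched roughly like this. It is only an illustration in Python, not the actual SpamAssassin code: the function name hash_token, the choice of SHA-1, and keeping the last 5 bytes of the digest are all assumptions made for the example. The only number taken from this thread is the 128-char limit.

```python
import hashlib

MAX_TOKEN_LEN = 128  # the limit used in the run above


def hash_token(token: str, max_len: int = MAX_TOKEN_LEN, digest_bytes: int = 5) -> bytes:
    """Truncate a token to max_len characters, then hash it to a fixed-size key.

    SHA-1 and the 5-byte slice are illustrative assumptions, not a statement
    of what the real Bayes code does.
    """
    truncated = token[:max_len]
    digest = hashlib.sha1(truncated.encode("utf-8")).digest()
    # Keep only the trailing bytes so every key has the same small size.
    return digest[-digest_bytes:]


if __name__ == "__main__":
    # Two long tokens that differ only after the first 128 chars
    # end up with the same key once truncation is applied.
    long_a = hash_token("x" * 200)
    long_b = hash_token("x" * 128 + "different tail")
    print(len(long_a), long_a == long_b)
```

The trade-off is that any tokens sharing their first 128 chars map to the same key, while shorter tokens are unaffected because the hash sees them whole.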


Michael
