On Sun, May 02, 2004 at 09:09:27PM -0400, Theo Van Dinter wrote:
> On Sun, May 02, 2004 at 05:39:14PM -0500, Michael Parker wrote:
> > I'm contemplating limiting bayes tokens to 128 chars, in the tokenize
> > method. Anyone see a problem with that?
>
> Am I missing something?
>
>   use constant MAX_TOKEN_LENGTH => 15;
>
> ... although, I don't see a substr() that actually limits it ... :(
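(For the archives, the clamp Theo is looking for would be something like
this -- a rough sketch with a made-up helper name, not the actual
tokenize() code:)

  use constant MAX_TOKEN_LENGTH => 15;

  # Hypothetical clamp: apply to each token before it gets hashed
  # or stored.  Sketch only.
  sub limit_token {
    my ($token) = @_;
    $token = substr($token, 0, MAX_TOKEN_LENGTH)
      if length($token) > MAX_TOKEN_LENGTH;
    return $token;
  }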
Weird, I never noticed that MAX_TOKEN_LENGTH constant before. I wonder
why it isn't getting tripped.

> > Maybe 128 is too large in a theoretical worst-case attack (of someone
> > turning on storage of original tokens). 32 or 64 might be better.
>
> What's the issue exactly? If we're hashing down to 5 bytes anyway,
> who cares what size the input is? The large length tokens aren't a
> big deal unless huge mails start going around (who cares if we have a
> handful of large tokens?)

Well, this turns out to have been an issue with the SQL code, because
the table was limited to 200 chars. It rarely ever hit that mark, and
every time it did it was garbage data, so it was never that big of a
deal. Of course, that went away with the hashing. Now that I'm putting
code in to optionally save the original token value, I'd like to limit
the token size.
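Roughly what I have in mind, as a sketch (SHA-1 standing in for
whatever we hash with, truncated to the 5 bytes mentioned above; the
helper name and the 64-char cap are placeholders):

  use Digest::SHA1 qw(sha1);

  use constant MAX_ORIG_TOKEN_LENGTH => 64;   # placeholder cap

  # Sketch only: 5-byte hashed key for the DB, plus a clamped copy of
  # the original text for the optional save.
  sub prep_token {
    my ($token) = @_;
    my $key  = substr(sha1($token), -5);   # last 5 bytes of the digest
    my $orig = substr($token, 0, MAX_ORIG_TOKEN_LENGTH);
    return ($key, $orig);
  }

Michael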
