On Sun, May 02, 2004 at 09:09:27PM -0400, Theo Van Dinter wrote:
> On Sun, May 02, 2004 at 05:39:14PM -0500, Michael Parker wrote:
> > I'm contemplating limiting bayes tokens to 128 chars, in the tokenize
> > method.  Anyone see a problem with that?
> 
> Am I missing something?
> 
> use constant MAX_TOKEN_LENGTH => 15;
> 
> ... although, I don't see a substr() that actually limits it ...  :(
> 

Weird, I never noticed that section of code before.  I wonder why it
isn't getting tripped.
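
For the record, closing that hole would presumably just be a substr()
guard in tokenize(), something like this (a rough sketch only -- the
helper name is mine, not what's actually in the tree):

    use constant MAX_TOKEN_LENGTH => 15;

    # Clamp an overlong token down to the cap before it goes any
    # further (hashing, storage, etc.).
    sub limit_token_length {
      my ($token) = @_;
      return length($token) > MAX_TOKEN_LENGTH
        ? substr($token, 0, MAX_TOKEN_LENGTH)
        : $token;
    }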

> > Maybe 128 is too large in a theoretical worst-case attack (of someone
> > turning on storage of original tokens).  32 or 64 might be better.
> 
> What's the issue exactly?  If we're hashing down to 5 bytes anyway,
> who cares what size the input is?  Long tokens aren't a big deal
> unless huge mails start going around (and who cares if we have a
> handful of them?)

Well, this turns out to have been an issue with the SQL code because
the table was limited to 200 chars.  It rarely ever hit that mark, and
every time it did it was garbage data, so it was never that big of a
deal.  Of course, that went away with the hashing.  Now that I'm
putting code in to optionally save the original token value, I'd like
to limit the token size.
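
Roughly, I'm picturing something like this where the raw token gets
saved (just a sketch -- the cap value and the table/column names are
placeholders, not the real schema):

    my $MAX_RAW_TOKEN_LENGTH = 128;    # or 32/64, per the discussion

    # Truncate the original token so nobody can bloat the optional
    # raw-token column with arbitrarily long garbage.
    my $raw_token = substr($token, 0, $MAX_RAW_TOKEN_LENGTH);
    $dbh->do("UPDATE bayes_token SET token_raw = ? WHERE id = ?",
             undef, $raw_token, $token_id);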

Michael
