It seems that in September, once your crunch is over and Vincenzo is back from vacation, the two of you should merge your changes (yours sound parameterizable) and maybe get the result into CVS.

If you want to send me a JAR and instructions, I've already got reposting from mbox working. I am finding some real-world issues, though: roughly half of the messages in the target set have To: [EMAIL PROTECTED] instead of To: <[EMAIL PROTECTED]>, and the bare form is rejected by the MailAddress class. None of them appear to be user messages; all of them seem to be bounce notices from places like CompuServe and apps like CC Mail Server. I'm curious to know from John Webb or Steven Short whether they are seeing similar problems with their mailing list managers.

It is possible that the To: field was broken but the RCPT TO was properly formatted. RFC 822 had some bad examples, although RFC 821 was clear from the start, and RFC 2822 is clear. We might want to account for this in our Fetch services.

	--- Noel

-----Original Message-----
From: Danny Angus [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 26, 2003 11:23
To: Noel J. Bergman

> I can build the current James v2 SAR. Are you using anything other than
> what Vincenzo has on his site, or should I just use the download from his
> site?

I've got a different take on Chris Means' submission. I've not tried Vincenzo's, but in theory it should be much the same. Mine is optimised to keep the corpus size low by ignoring tokens < 4 chars and > 15 and ignoring tokens with probabilities in the range 4-6 (neutral).

d.
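
For illustration only, here is a minimal sketch of how a fetch step might cope with the bare To: form described above, by rewriting it into the angle-bracket form before any stricter parsing. The class name AddressNormalizer, its normalize method, and the deliberately loose patterns are all hypothetical; nothing here is taken from James or its MailAddress class, and it is not a full RFC 2822 address parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of James: coerces a To: value into "<local@domain>" form. */
final class AddressNormalizer {
    // Loose check for a bare addr-spec: one '@', no whitespace or angle brackets.
    private static final Pattern BARE = Pattern.compile("^[^\\s<>@]+@[^\\s<>@]+$");
    // First angle-bracketed section, optionally preceded by a display name.
    private static final Pattern ANGLE = Pattern.compile("<([^<>]*)>");

    private AddressNormalizer() {}

    /** Returns the address wrapped in angle brackets, or null if none is recognisable. */
    static String normalize(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        String value = headerValue.trim();
        Matcher m = ANGLE.matcher(value);
        if (m.find()) {
            // Already has angle brackets: keep only the address inside them.
            String inner = m.group(1).trim();
            return BARE.matcher(inner).matches() ? "<" + inner + ">" : null;
        }
        // Bare form, as seen in the bounce notices: wrap it.
        return BARE.matcher(value).matches() ? "<" + value + ">" : null;
    }
}
```

A caller would pass the raw To: value through AddressNormalizer.normalize(...) first and treat a null result as "fall back to the envelope recipient", on the assumption that the RCPT TO was well formed even when the header was not.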