I have one mail folder that I put false negatives in (i.e., spam that ends up in my inbox) and another for false positives (ham that ends up in my spam folder). Each night I run sa-learn on each folder (sa-learn will munch on entire Maildirs) and also feed each message to spamassassin -r to report it. So using zcat or gunzip -c will work for spamassassin -r, but not for sa-learn.
Unless sa-learn can munch on stdin as well as files....

-CJ

On Fri, May 21, 2021 at 3:28 PM Lucas Rolff <lu...@lucasrolff.com> wrote:

> You can do `zcat -f` or `gunzip -c -f` and avoid having to have .gz
> extension, that way you can skip the rename step
>
> Best Regards,
>
> Lucas Rolff
>
> *From:* Clive Jacques <westriverp...@gmail.com>
> *Date:* Friday, 21 May 2021 at 21.04
> *To:* "users@spamassassin.apache.org" <users@spamassassin.apache.org>
> *Subject:* Re: spamassassin and *compressed* Maildir
>
> That's confirmed. sa-learn doesn't like compressed files. I don't know
> if it will dine on compressed files with the correct extension (i.e.,
> .gz). Unfortunately, when using compression with Maildir format, Dovecot
> doesn't seem to like to use extensions. So, I copied the directory to a
> temporary location, decompressed the files and then set sa-learn on them.
> Even getting gunzip to operate on the files was a pain because it only
> wants files with the .gz extension (so I had to rename all 6,000 of them
> first - using a utility like 'rename'). I then did the same thing with
> about 9,000 hams.
>
> There was much good news. Learning proceeded about the same pace, but
> syncing the journal to the database was *much* faster. Maybe the tokens
> were smaller? I verified that it seemed to work with --dump magic.
>
> Then, all by itself, Spamassassin's bayes filtering was instantly much
> better. Stuff that was tripping BAYES_00 was suddenly popping BAYES_99.
>
> Now, I just need to update my nightly learning/reporting script.
>
> Still, a very nice result.
>
> On Fri, May 21, 2021 at 11:30 AM Henrik K <h...@hege.li> wrote:
>
> On Fri, May 21, 2021 at 10:54:54AM -0400, Clive Jacques wrote:
>
> Do spamassassin or sa-learn understand compressed files or compressed
> Maildir?
>
> I believe sa-learn will automatically decompress if the files have .gz or
> .bz2 extension, but yes Maildir files without extension will not work.
>
> Should be easy to detect compressed Maildir files, perhaps file enhancement
> request in bugzilla.
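
For anyone updating a similar nightly script, the temporary-copy workaround described in the thread could be sketched roughly as follows. This is only an illustration, not anything SpamAssassin ships: the function names (`read_message`, `unpack_maildir`, `learn`, `report`) are made up for the example, it assumes the Maildir files are gzip- or bzip2-compressed without extensions, and it assumes `sa-learn` and `spamassassin` are on the PATH. Compression is detected from the leading magic bytes rather than the file name, so no rename step is needed.

```python
import bz2
import gzip
import subprocess
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
BZIP2_MAGIC = b"BZh"       # first three bytes of any bzip2 stream


def read_message(path):
    """Return the raw message bytes, decompressing based on magic bytes."""
    data = Path(path).read_bytes()
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(BZIP2_MAGIC):
        return bz2.decompress(data)
    return data  # assumed to be an ordinary uncompressed message


def unpack_maildir(src, dst):
    """Write plain copies of every message in src/cur and src/new into dst.

    Returns the number of messages written.
    """
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for sub in ("cur", "new"):
        folder = Path(src) / sub
        if not folder.is_dir():
            continue
        for msg in sorted(folder.iterdir()):
            if msg.is_file():
                (dst / msg.name).write_bytes(read_message(msg))
                count += 1
    return count


def learn(maildir, kind):
    """Run sa-learn on a decompressed temporary copy; kind is 'spam' or 'ham'."""
    with tempfile.TemporaryDirectory() as tmp:
        unpack_maildir(maildir, tmp)
        subprocess.run(["sa-learn", f"--{kind}", tmp], check=True)


def report(path):
    """Pipe one decompressed message to 'spamassassin -r' on stdin."""
    subprocess.run(["spamassassin", "-r"], input=read_message(path), check=True)
```

A nightly job would then call something like `learn("/path/to/missed-spam", "spam")` and `learn("/path/to/missed-ham", "ham")`, where both paths are placeholders. The original compressed Maildir is never modified, since only the temporary copy is decompressed.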