You can use `zcat -f` or `gunzip -c -f`, which write to stdout and don't need a 
.gz extension, so you can skip the rename step.
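
As a rough sketch of how that could look in a script (the function name 
`decompress_maildir` and both directory paths are made up for illustration, 
and this assumes the messages are gzip-compressed or plain):

```shell
#!/bin/sh
# Hypothetical helper: copy every message from SRC into DST, decompressing
# gzip-compressed ones on the way. Nothing in SRC is renamed or modified.
decompress_maildir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        # -c writes to stdout, so the missing .gz suffix does not matter;
        # -f makes gzip pass data it does not recognise through unchanged,
        # so uncompressed messages are simply copied.
        gzip -dcf -- "$f" > "$dst/${f##*/}" || return 1
    done
}

# Example call (paths are assumptions, adjust to your setup):
#   decompress_maildir "$HOME/Maildir/.Junk/cur" /tmp/spam-plain
#   sa-learn --spam /tmp/spam-plain
```

Working on a copy keeps Dovecot's own files untouched, which seems safer than 
decompressing in place.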

Best Regards,
Lucas Rolff

From: Clive Jacques <westriverp...@gmail.com>
Date: Friday, 21 May 2021 at 21.04
To: "users@spamassassin.apache.org" <users@spamassassin.apache.org>
Subject: Re: spamassassin and *compressed* Maildir

That's confirmed.  sa-learn doesn't like compressed files.  I don't know if it 
will dine on compressed files with the correct extension (i.e., .gz).  
Unfortunately, when using compression with Maildir format, Dovecot doesn't seem 
to like to use extensions.  So, I copied the directory to a temporary location, 
decompressed the files and then set sa-learn on them.  Even getting gunzip to 
operate on the files was a pain because it only wants files with the .gz 
extension (so I had to rename all 6,000 of them first - using a utility like 
'rename').  I then did the same thing with about 9,000 hams.

There was much good news.  Learning proceeded at about the same pace, but syncing 
the journal to the database was much faster.  Maybe the tokens were smaller?  I 
verified that it seemed to work with --dump magic.

Then, all by itself, SpamAssassin's Bayes filtering was instantly much better.  
Stuff that was tripping BAYES_00 was suddenly popping BAYES_99.

Now, I just need to update my nightly learning/reporting script.

Still, a very nice result.

On Fri, May 21, 2021 at 11:30 AM Henrik K <h...@hege.li> wrote:
On Fri, May 21, 2021 at 10:54:54AM -0400, Clive Jacques wrote:
> Do spamassassin or sa-learn understand compressed files or compressed Maildir?

I believe sa-learn will automatically decompress if the files have .gz or
.bz2 extension, but yes Maildir files without extension will not work.

Should be easy to detect compressed Maildir files, perhaps file an enhancement
request in Bugzilla.
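
To illustrate why detection should be easy: gzip data starts with the bytes
1f 8b and bzip2 data with "BZh", so the first three bytes of a file are enough
to tell. A minimal sketch in shell (the function name `compression_kind` is
hypothetical, and this is not how sa-learn itself decides anything):

```shell
#!/bin/sh
# Hypothetical helper: print "gzip", "bzip2" or "plain" for the given file,
# judging only by its leading magic bytes and not by its name.
compression_kind() {
    # Dump the first 3 bytes as hex without offsets, then strip whitespace.
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)  echo gzip ;;   # gzip magic: 0x1f 0x8b
        425a68) echo bzip2 ;;  # bzip2 magic: "BZh"
        *)      echo plain ;;
    esac
}
```

A check like this works regardless of the file name, which is the case that
matters for Maildir files stored without an extension.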
