Re: ExtractText and docx

2021-05-07 Thread Benny Pedersen
On 2021-05-07 06:58, Henrik K wrote: > Which is why I'm debating if the whole plugin is useful at all or just feeding Bayes crap. Oh dear :=) Bayes can only be fooled by providing poison data in autolearn; if it is manually trained as spam, then the poison data loses. Maybe there is another problem, YMM

Re: ExtractText and docx

2021-05-06 Thread Henrik K
On Thu, May 06, 2021 at 09:20:28PM -0400, Alex wrote: > > Also, has anyone written any meta rules for use with ExtractText that > they'd like to share? I'd like to block all PDF files that contain any > type of javascript - malicious or otherwise. I'd also like to block > all PDFs that's a single p

Re: ExtractText and docx

2021-05-06 Thread Olivier
Peter West writes: > If you have a JVM lying around, you can extract docx text with Apache Tika. I use LibreOffice for that purpose. Not the most efficient, but I am sure it covers it all and will update each time I update Libre

Re: ExtractText and docx

2021-05-06 Thread Peter West
If you have a JVM lying around, you can extract docx text with Apache Tika. — Peter West p...@ehealth.id.au “I am the vine; you are the branches.” > On 7 May 2021, at 2:30 pm, John Hardin wrote: > > On Thu, 6 May 2021, Alex wrote: > >> Hi, >> >> I'm trying to use the latest ExtractText plugi

Re: ExtractText and docx

2021-05-06 Thread John Hardin
On Thu, 6 May 2021, Alex wrote: > Hi, > > I'm trying to use the latest ExtractText plugin, but the docx2txt program the plugin references is no longer available from http://docx2txt.sourceforge.net > > Do you have any recommendations for an alternative...? Perhaps one of (from Stack Overflow): unz

Re: ExtractText and docx

2021-05-06 Thread Loren Wilton
> I'm trying to use the latest ExtractText plugin, but the docx2txt program the plugin references is no longer available from http://docx2txt.sourceforge.net The latest version appears to be 1.4 from several years ago. I just tried downloading the 1.4 version and the CVS version, and in both case

ExtractText and docx

2021-05-06 Thread Alex
Hi, I'm trying to use the latest ExtractText plugin, but the docx2txt program the plugin references is no longer available from http://docx2txt.sourceforge.net. I've located a working replacement at https://github.com/ankushshah89/python-docx2txt/ (although it's written in Python and I don't have
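
For anyone wanting a stopgap with no extra dependencies: a .docx file is a ZIP container whose main body lives in word/document.xml, so the text can be pulled out with the Python standard library alone. What follows is only a rough sketch under that assumption. The function name docx_to_text is made up for illustration, it is not how docx2txt, python-docx2txt, or the ExtractText plugin work internally, and it ignores headers, footers, footnotes, and embedded objects.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a filesystem path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph; the visible text sits in <w:t> elements inside it
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Usage: python docx_to_text.py file.docx
    print(docx_to_text(sys.argv[1]))
```

Saved as a script (the file name here is arbitrary), it takes the .docx path as its only argument and writes the text to stdout. How to hook an external command like this into ExtractText is a matter for the plugin's own documentation.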