On 16.10.18 18:42, RW wrote:
Bayes might work, but I wouldn't like to see it added to body text
because corrupted text could look like obfuscation.
On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
It should be pushed back to the body text just for filters like Bayes.
The same could/should be done for attached .doc, .pdf files etc.
On 17.10.18 07:56, John Hardin wrote:
...which would be much more reliable than OCR.
If it was a resource-allocation decision for pulling text from doc/pdf
vs. updating OCR, I'd push for the former.
This could easily be configured by installing the relevant modules or loading them.
Btw, both PDF and Word documents can contain images too ...
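
To make the idea concrete, here is a rough sketch (in Python, not actual SpamAssassin code) of feeding text from attachments to a Bayes-style filter only, with extraction enabled per content type depending on which extractor is loaded. The names EXTRACTORS, text_for_bayes and build_sample are made up for illustration, and the only extractor registered is a trivial text/plain one; a real .doc or .pdf extractor is assumed, not shown.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Callable, Dict

# Hypothetical registry: content type -> function turning raw bytes into text.
# Extractors for .doc/.pdf would be registered here only if installed/loaded.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": lambda raw: raw.decode("utf-8", errors="replace"),
}


def text_for_bayes(msg: Message) -> str:
    """Collect text from every part that has a loaded extractor.

    The result is meant for a Bayes-style tokenizer only, not for body rules.
    """
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        extractor = EXTRACTORS.get(part.get_content_type())
        if extractor is None:
            continue  # no extractor loaded for this type: skip the part
        payload = part.get_payload(decode=True)  # bytes after transfer decoding
        if payload:
            chunks.append(extractor(payload))
    return "\n".join(chunks)


def build_sample() -> Message:
    """Build a small message with a body and one text attachment."""
    msg = MIMEMultipart()
    msg.attach(MIMEText("visible body", "plain"))
    att = MIMEText("text inside attachment", "plain")
    att.add_header("Content-Disposition", "attachment", filename="note.txt")
    msg.attach(att)
    return msg


if __name__ == "__main__":
    print(text_for_bayes(build_sample()))
```

Removing an entry from EXTRACTORS corresponds to not loading that module: parts of that type are then simply ignored. Images embedded inside a PDF or Word document would still need OCR and are not covered by this sketch.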
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
99 percent of lawyers give the rest a bad name.