Re: [tesseract-ocr] Customize TIFF file with OpenCV filters.

2020-01-29 Thread Thad Guidry
What about Dilate and Erode in OpenCV? https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#dilate I mention my experiments here on the Wiki (which includes a link about the Dilation and Erosion algorithms in general, as used in lots of image processing software):
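As a rough illustration of the two operations named above (not the Wiki experiments, and not OpenCV's own implementation), here is a minimal pure-Python sketch of dilation and erosion on a binary image stored as a list of rows. The helper names `dilate`, `erode`, and `_morph`, the square kernel, and the choice to ignore pixels outside the image are all assumptions made for this example. In OpenCV itself the corresponding functions are `cv2.dilate` and `cv2.erode`.

```python
def _morph(image, size, pick):
    """Apply `pick` (max or min) over a size x size window around each pixel.

    Pixels that fall outside the image are ignored (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    radius = size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(pick(window))
        result.append(row)
    return result


def dilate(image, size=3):
    """Grow foreground (1) regions: a pixel becomes 1 if any neighbour is 1."""
    return _morph(image, size, max)


def erode(image, size=3):
    """Shrink foreground (1) regions: a pixel stays 1 only if all neighbours are 1."""
    return _morph(image, size, min)


if __name__ == "__main__":
    # A single foreground pixel in the middle of a 5x5 image.
    speck = [[0] * 5 for _ in range(5)]
    speck[2][2] = 1
    for row in dilate(speck):
        print(row)
```

Eroding first and then dilating removes small specks, while dilating first and then eroding fills small gaps inside strokes. Whether either one helps Tesseract on a particular scan is something to check by experiment.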

Re: [tesseract-ocr] Pros and cons of .tiff vs .png

2020-01-28 Thread Thad Guidry
There are a few Wiki pages that cover some of this. You can see the pages that mention "png" by doing a search on GitHub and then filtering on Wiki (instead of the default, Code). Here are the filtered result pages from the Wiki that talk about "png".

Re: [tesseract-ocr] Re: Why is there no selectable text in the PDF output file?

2020-01-27 Thread Thad Guidry
I use it all the time on my Windows 10 PC. You can save the PDF it creates and compare to see if it works better. If so, then it might be a configuration issue.

Re: [tesseract-ocr] Re: Why is there no selectable text in the PDF output file?

2020-01-27 Thread Thad Guidry
Have you tried using gImageReader (it uses Tesseract4) with the hOCR/PDF dropdown option, and inspecting the output panel? You can also highlight and select text on the image and then see which rows are affected in the output panel.

[tesseract-ocr] How to control the recognition scale allowed?

2020-01-24 Thread Thad Guidry
I am using gImageReader to capture old statistical tables from old books. I have noticed that long rows of periods are often used in the table rows of the images: Cattle ... No .. Horses . No .. etc. What I am seeing is that Tesseract4 is extracting the names just fine...but the