Re: [tesseract-ocr] Converting colored background and colored characters to text with the Tesseract library

2024-08-04 Thread Zdenko Podobny
Captcha was created to fool OCR. Zdenko On Mon, Aug 5, 2024 at 7:27, Emre Batu wrote: > [image: 20240804211345.png] Hello everyone. I am using the Tesseract > library in a C# application to analyze images. However, the image I want to > convert to text contains colored characters and a colored

[tesseract-ocr] Converting colored background and colored characters to text with the Tesseract library

2024-08-04 Thread Emre Batu
[image: 20240804211345.png] Hello everyone. I am using the Tesseract library in a C# application to analyze images. However, the image I want to convert to text contains colored characters and a colored background. As a result, the output is not accurate. How can I convert this image to text c

[tesseract-ocr] Re: Tesseract not working for some single examples.

2024-08-04 Thread 'Danny' via tesseract-ocr
If you can, try pre-processing and inverting the image so it is black text on a white background. I found that recognition works much better with the preprocessing (probably because the models were trained on that kind of input). On Tuesday, July 30, 2024 at 10:45:56 PM UTC+8 allelu...@gmail.co
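
The inversion step suggested above could be sketched roughly as follows. This is a minimal illustration in plain Python, not Danny's actual code: the helper names `to_gray` and `dark_on_light`, the luminance weights, and the rule "invert when the mean intensity is below 128" are all assumptions made for the example. In a real pipeline an imaging library would do the pixel work.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values.

    The weights are the common ITU-R BT.601 luma coefficients (an assumption;
    the thread does not say how the image was converted).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def dark_on_light(gray_rows, threshold=128):
    """Return the image with dark text on a light background.

    Assumption: the background covers most of the image, so a mean
    intensity below `threshold` means the background is dark and the
    image should be inverted. Otherwise the rows are returned unchanged.
    """
    count = sum(len(row) for row in gray_rows)
    if count == 0:
        return gray_rows
    mean = sum(sum(row) for row in gray_rows) / count
    if mean < threshold:
        return [[255 - v for v in row] for row in gray_rows]
    return gray_rows


# Hypothetical 1x3 image: one white "text" pixel on a black background.
light_on_dark = [[(0, 0, 0), (255, 255, 255), (0, 0, 0)]]
prepared = dark_on_light(to_gray(light_on_dark))
```

The mean-intensity test is only a heuristic; it can misfire on images where text covers more area than the background.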

Re: [tesseract-ocr] Re: How to prevent Tesseract from interpreting noise as characters

2024-08-04 Thread Zdenko Podobny
tesseract unnamed.jpg - Estimating resolution as 182 i.e. no recognized word... So the problem could be in the parameters you used for OCR... Before OCR I suggest image preprocessing and maybe the detection of empty pages. Have a look at the leptonica example for Normalize for uneven illumination (p

[tesseract-ocr] Re: How to prevent Tesseract from interpreting noise as characters

2024-08-04 Thread Iain Downs
In the event that anyone else has a similar issue, this is how I approached it. Firstly, make a histogram of the number of pixels with each intensity (so an array of 256 numbers). When you inspect this you get results like the below. [image: Finding empty pages.png] This is after a little smo
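
The histogram step described above can be sketched as follows, assuming the page is already available as rows of 8-bit grayscale values. The function names and the moving-average smoothing are assumptions for illustration; the preview is cut off before it says how the smoothing was done or how the histogram was used afterwards.

```python
def intensity_histogram(gray_rows):
    """Count how many pixels have each intensity 0..255 (an array of 256 numbers)."""
    counts = [0] * 256
    for row in gray_rows:
        for value in row:
            counts[value] += 1
    return counts


def smooth(counts, radius=2):
    """Moving-average smoothing (an assumed method, not taken from the thread).

    Each bin becomes the mean of itself and up to `radius` neighbours on
    each side; the window simply shrinks at the two ends of the array.
    """
    smoothed = []
    for i in range(len(counts)):
        window = counts[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical tiny "page": mostly white with one dark pixel.
page = [[255, 255, 250], [0, 255, 255]]
hist = intensity_histogram(page)
hist_smooth = smooth(hist, radius=1)
```

Plotting or inspecting `hist_smooth` would then give the kind of curve the post refers to.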