Re: [tesseract-ocr] Tesseract OCR Failing to Read Cleaned Numbers. Suggestions Please?

tristan gordon Thu, 30 Apr 2020 07:54:30 -0700

Now that I know the resolution and headers were the issue for Tesseract OCR PHP, 
the following should help anyone looking for a solution in future:


   1. Create your Imagick instance, i.e. $image = new Imagick('image.jpg');
   2. Then set the resolution using two calls, 
   first: $image->setImageUnits(Imagick::RESOLUTION_PIXELSPERINCH); 
   then: $image->setImageResolution(300, 300); 
   3. The resolution is then set, ready for Tesseract to read.
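
The steps above can be sketched as one small helper. This is only a minimal 
sketch, assuming the Imagick PHP extension is installed; the function name 
setOcrResolution(), its parameters, and the file names are hypothetical and 
chosen for illustration. As far as I know, setImageResolution() only writes 
the density metadata and does not resample the pixels.

```php
<?php
// Sketch only: assumes the Imagick PHP extension is available.
// setOcrResolution() is a hypothetical helper, not part of any library.

function setOcrResolution(string $inputPath, string $outputPath, int $dpi = 300): void
{
    // Step 1: load the image.
    $image = new Imagick($inputPath);

    // Step 2a: declare the unit first, so the values below mean pixels per inch.
    $image->setImageUnits(Imagick::RESOLUTION_PIXELSPERINCH);

    // Step 2b: set the horizontal and vertical resolution.
    $image->setImageResolution($dpi, $dpi);

    // Step 3: write the file that Tesseract will read, then free resources.
    $image->writeImage($outputPath);
    $image->clear();
}
```

For example, setOcrResolution('image.jpg', 'image-300dpi.jpg') would write a 
copy tagged as 300 dpi. If changing the image is not an option, passing 
--dpi 300 on the tesseract command line (as shree suggested) is the 
alternative.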

I hope that helps.

On Thursday, 30 April 2020 15:29:07 UTC+1, tristan gordon wrote:
>
> Thank you.
> Now to look at imagick to set the resolution!
>
> On Thursday, 30 April 2020 10:36:56 UTC+1, shree wrote:
>>
>> Looks like the image resolution is not set correctly. You can specify dpi 
>> while processing.
>>
>> ubuntu@tesseract-ocr:~/TEST$ tesseract 82.png -  --dpi 300
>> 82
>> ubuntu@tesseract-ocr:~/TEST$ tesseract 81.png -  --dpi 300
>> 81
>>
>>
>> On Thu, Apr 30, 2020 at 2:57 PM tristan gordon <[email protected]> 
>> wrote:
>>
>>> Hello all,
>>>
>>> Could you help?
>>>
>>> Attached are two images containing two numbers, 81 and 82, which I am 
>>> attempting to get Tesseract OCR to read.
>>>
>>> Each time, Tesseract OCR returns an empty page and produces an empty 
>>> text.txt document.
>>>
>>> The error is displaying as follows:
>>>
>>> # tesseract 82.png out
>>> Tesseract Open Source OCR Engine v4.1.1-rc2-20-g01fb with Leptonica
>>> Warning: Invalid resolution 0 dpi. Using 70 instead.
>>> Estimating resolution as 1622
>>> Empty page!!
>>> Estimating resolution as 1622
>>> Empty page!!
>>>
>>> How can I get the numbers to output? Are any changes required to the 
>>> images or to Tesseract?
>>>
>>> These images were produced on CentOS 7 with Apache, PHP and Imagick: the 
>>> image is retrieved from an external server, then processed with Imagick to 
>>> crop, grayscale, trim to the focus area, resize, smooth edges, remove the 
>>> background, convert to black and white, flatten, and set a resolution and 
>>> image format.
>>> The images were then saved (for development purposes) and tested 
>>> using the command above. 
>>>
>>> Once these errors are sorted and it's running, tesseract-ocr-php will 
>>> complete the process on the fly (as there are around 6000 images to read).
>>>
>>> Let me know.
>>>
>>> Thank you (in advance).
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/2314316b-1b5c-4a44-b9bb-8e65a901a688%40googlegroups.com
>>>
>>
>>
>> -- 
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>

