This is more interesting than I thought (but I have no time for further
testing :-( ):

tesseract twowishes.png stdout get.image[1] => "empty page" as ocr output
tesseract tessinput.tif stdout  => there is ocr output

tessinput.tif is the internally thresholded image that tesseract writes
out [2]. So why is there no OCR output from the first command, when the
thresholded image itself can be OCRed? Somebody with time should
check/debug this ;-)
Suggestion for the moment: binarize the input image yourself before
OCRing it.
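
To illustrate what "binarize it yourself" could look like, here is a minimal
sketch in plain Python. It uses a global Otsu threshold as a stand-in; I am
not claiming this matches what tesseract does internally. The function names
(otsu_threshold, binarize, write_pgm) are made up for this example, and it
assumes you already have the image as a flat list of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return a global Otsu threshold for a flat list of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # all pixels are in the lower class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to 255 (white), the rest to 0 (black)."""
    return [255 if p > threshold else 0 for p in pixels]


def write_pgm(path, width, height, pixels):
    """Write the pixels as a binary PGM (P5) file."""
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        f.write(bytes(pixels))
```

Usage would be something like binarize(gray, otsu_threshold(gray)), then
write_pgm(...) and running tesseract on the resulting file. A single global
threshold may not work well on images with uneven lighting.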

[1]
https://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/configs/get.image?spec=svn867&r=826

[2]
https://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h?r=856#354


Zdenko


On Thu, Sep 19, 2013 at 6:24 PM, Remon Georgy <[email protected]> wrote:

> Hi there,
>
> I'm using tesseract 3.02.02 command-line to extract text from the
> following image, but I'm getting "Empty page!!".
>
> <https://lh3.googleusercontent.com/-FVkDMSqQbrE/Ujsjhz7ULRI/AAAAAAAACO4/aRgT2M12VR0/s1600/twowishes.png>
>
> tesseract twowishes.png output && cat output.txt
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
> Empty page!!
> Empty page!!
>
> However when I convert the image to grayscale via Imagemagick I do get
> some results.
> I know it's recommended to convert the image to grayscale before
> processing it with tesseract. But my concern is that images resulting from
> color conversion routines could yield significantly different results,
> based on conversion routine (e.g. -type Grayscale vs. -set colorspace
> Gray -separate -average). Also, I understand that tesseract converts the
> image to grayscale internally as a preprocessing step.
> Why does tesseract find that image empty?
> Thanks.
>
> Regards,
> Remon
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>

