It is more interesting than I thought (but I have no time for further testing :-( ):

    tesseract twowishes.png stdout get.image [1]
    => "empty page" as OCR output

    tesseract tessinput.tif stdout
    => there is OCR output

tessinput.tif is the output of the internal thresholded image [2]. So why is there no OCR output from the first command? Somebody with time should check/debug this ;-)

Suggestion for the moment: binarize the input image yourself before OCR-ing it.

[1] https://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/configs/get.image?spec=svn867&r=826
[2] https://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h?r=856#354

Zdenko

On Thu, Sep 19, 2013 at 6:24 PM, Remon Georgy <[email protected]> wrote:
> Hi there,
>
> I'm using tesseract 3.02.02 command-line to extract text from the
> following image, but I'm getting "Empty page!!".
>
> <https://lh3.googleusercontent.com/-FVkDMSqQbrE/Ujsjhz7ULRI/AAAAAAAACO4/aRgT2M12VR0/s1600/twowishes.png>
>
> tesseract twowishes.png output && cat output.txt
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
> Empty page!!
> Empty page!!
>
> However when I convert the image to grayscale via Imagemagick I do get
> some results.
> I know it's recommended to convert the image to grayscale before
> processing it with tesseract. But my concern is that images resulting from
> color conversion routines could yield significantly different results,
> based on conversion routine (e.g. -type Grayscale vs. -set colorspace
> Gray -separate -average). Also, I'm that tesseract converts the image to
> grayscale internally as a preprocessing step.
> Why does tesseract find that image empty?
> Thanks.
>
> Regards,
> Remon
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
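To make the "binarize the input image yourself" suggestion concrete, here is a minimal sketch of global Otsu thresholding in plain Python. It assumes the image has already been converted to 8-bit grayscale and is held as a list of rows of integers in 0..255. The function names `otsu_threshold` and `binarize` are hypothetical helpers for illustration, not part of tesseract or Leptonica, and this is not claimed to match tesseract's internal thresholding exactly.

```python
# Sketch only: global Otsu thresholding on an 8-bit grayscale image.
# otsu_threshold and binarize are hypothetical helper names.

def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue              # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                 # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(rows):
    """Map each pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    flat = [value for row in rows for value in row]
    t = otsu_threshold(flat)
    return [[255 if value > t else 0 for value in row] for row in rows]


if __name__ == "__main__":
    # Tiny synthetic "image": dark text pixels (30) on a light background (220).
    image = [
        [220, 220, 30, 220],
        [220, 30, 30, 220],
        [220, 220, 30, 220],
    ]
    for row in binarize(image):
        print(row)
```

In practice the pixel rows would come from an image library (for example Pillow's `Image.open(...).convert("L")`, an assumption about your setup), and the binarized result would be saved to a file and passed to `tesseract <file> stdout` as in the commands above.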

