Hi,
I need to build tesseract-ocr (https://code.google.com/p/tesseract-ocr/) 
from source in order to OCR some PDF files.  Many people use ImageMagick's 
"convert" to turn a PDF into a TIFF first, then use Tesseract to OCR the 
TIFF into a text file.

Since Tesseract depends on the Leptonica Image Processing Library 
(http://leptonica.com/), I had to build that from source as well.  Our OS 
distro is the old RHEL 6.2.  In our computing environment, most 
utilities/tools are not installed at the typical locations (/usr/bin, 
/usr/local, etc.).  According to one of the README files, I don't need the 
JPEG/JPG & PNG headers/libs unless I need to write to a PDF, so I did not 
pull them in (from our non-standard locations) while building Leptonica.

When I fired off Tesseract as in
   /path/to/somewhere/install/tesseract-ocr_3.02.02/bin/tesseract  t.tiff  output
I got the following error message:
   Tesseract Open Source OCR Engine v3.02.03 With Leptonica
   Error in findTiffCompression: function not present
   Error in pixReadStreamTiff: function not present
   Error in pixReadStream: tiff: no pix returned
   Error in pixRead: pix not read
   Unsupported image type

I am puzzled since the 2 missing functions are present in the shared lib 
according to my investigation below ...

From ldd of the Tesseract ELF binary:
   %  ldd  /path/to/somewhere/install/tesseract-ocr_3.02.02/bin/tesseract
   ...
   liblept.so.4 => /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4 (0x00007fb10b4e6000)

And also the LD_LIBRARY_PATH setting (I know LD_LIBRARY_PATH is frowned 
upon, but I only used it here as a temporary hack):
   %   echo $LD_LIBRARY_PATH
   /path/to/somewhere/install/tesseract-ocr_3.02.02/lib:/path/to/somewhere/install/Leptonica_1.71/lib

The 2 functions that appeared in the error output above, namely 
findTiffCompression & pixReadStreamTiff, DO EXIST in the shared lib:
   %  nm -D /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4  |  grep findTiffCompression
   00000000001a0140  T  findTiffCompression

   %  nm -D /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4  |  grep pixReadStreamTiff
   00000000001a03e0  T  pixReadStreamTiff
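
An nm listing only shows that a symbol is defined in the library, not what 
the function does when called, so one more check that may be worth making is 
whether liblept.so.4 itself depends on libtiff.  A minimal sketch, assuming 
libtiff would appear as a shared-library dependency in ldd output (a 
statically linked libtiff would not show up this way); links_libtiff is a 
hypothetical helper name, not part of Leptonica or Tesseract:

```shell
# Hypothetical helper (not part of Leptonica or Tesseract): reads ldd output
# on stdin and succeeds only if some line mentions a libtiff shared object.
links_libtiff() {
    grep -q 'libtiff'
}

# Same library path as in the nm commands above.
LIB=/path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4

if [ -r "$LIB" ]; then
    if ldd "$LIB" | links_libtiff; then
        echo "liblept lists libtiff as a dependency"
    else
        echo "liblept does not list libtiff as a dependency"
    fi
else
    echo "cannot read $LIB"
fi
```

If the second message is printed, the library was probably built without 
finding the TIFF headers/libs, which would be worth ruling out before 
looking elsewhere.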

What am I missing here?

Thanks for reading.
