Hi, I need to build tesseract-ocr (https://code.google.com/p/tesseract-ocr/) from source in order to OCR some PDF files. A common approach is to use ImageMagick's "convert" to turn the PDF into a TIFF first, then run Tesseract on the TIFF to produce a text file.
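For readers unfamiliar with that workflow, here is a minimal sketch of it as a shell function. The function name pdf_to_text, the 300 dpi density and the 8-bit depth are illustrative assumptions, not settings taken from this post, and both convert and tesseract are assumed to be on PATH.

```shell
#!/bin/sh
# Hypothetical helper sketching the PDF -> TIFF -> text workflow.
# Assumptions: ImageMagick's `convert` and `tesseract` are on PATH;
# the density/depth values are illustrative only.
pdf_to_text() {
    pdf=$1            # input PDF, e.g. scan.pdf
    base=$2           # output base name; tesseract appends .txt
    tiff="$base.tiff"
    # Rasterize the PDF pages into a (multi-page) TIFF at 300 dpi.
    convert -density 300 "$pdf" -depth 8 "$tiff" || return 1
    # OCR the TIFF; tesseract writes its text result to "$base.txt".
    tesseract "$tiff" "$base"
}
# Example: pdf_to_text scan.pdf scan   (produces scan.tiff and scan.txt)
```

Rasterizing with -density placed before the input file matters: it sets the resolution at which the PDF is read, which in turn affects OCR quality.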
Since Tesseract depends on the Leptonica Image Processing Library (http://leptonica.com/), I had to build that from source as well. Our OS distro is the old RHEL 6.2, and in our computing environment most utilities/tools are not installed at the typical locations (/usr/bin, /usr/local, etc.). According to one of the README files, I don't need the JPEG/JPG and PNG headers/libs unless I need to write to a PDF, so I did not pull them in (from our non-standard locations) while building Leptonica.

When I ran Tesseract as in

    % /path/to/somewhere/install/tesseract-ocr_3.02.02/bin/tesseract t.tiff output

I got the following error message:

    Tesseract Open Source OCR Engine v3.02.03 With Leptonica
    Error in findTiffCompression: function not present
    Error in pixReadStreamTiff: function not present
    Error in pixReadStream: tiff: no pix returned
    Error in pixRead: pix not read
    Unsupported image type

I am puzzled, since the 2 functions reported as missing are present in the shared lib according to my investigation below.

From ldd of the Tesseract ELF binary:

    % ldd /path/to/somewhere/install/tesseract-ocr_3.02.02/bin/tesseract
    ...
    liblept.so.4 => /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4 (0x00007fb10b4e6000)

And also the LD_LIBRARY_PATH setting (I know LD_LIBRARY_PATH is frowned upon, but I only used it here as a temporary hack):

    % echo $LD_LIBRARY_PATH
    /path/to/somewhere/install/tesseract-ocr_3.02.02/lib:/path/to/somewhere/install/Leptonica_1.71/lib

The 2 functions that appeared in the error output above, namely findTiffCompression and pixReadStreamTiff, DO EXIST in the shared lib:

    % nm -D /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4 | grep findTiffCompression
    00000000001a0140 T findTiffCompression
    % nm -D /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4 | grep pixReadStreamTiff
    00000000001a03e0 T pixReadStreamTiff

What am I missing here? Thanks for reading.
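One thing the ldd and nm checks above do not show is whether liblept itself was linked against libtiff. If libtiff, like the JPEG and PNG libraries, was not pointed to during the Leptonica build, it may be worth ruling this out: Leptonica can be compiled with placeholder (stub) versions of its TIFF I/O functions when libtiff is not available, and such stubs would still appear as defined "T" symbols in nm output. The helper below is a minimal sketch of that check; the name lib_links_tiff is hypothetical, and a statically linked libtiff would not show up in ldd, so a negative result is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the shared library's dynamic
# dependencies (as listed by ldd) include libtiff.
# Assumption: ldd is available (it is on RHEL 6).
lib_links_tiff() {
    # ldd errors (missing file, no ldd) are discarded; grep then sees
    # no input and the function returns non-zero.
    ldd "$1" 2>/dev/null | grep -q 'libtiff'
}
# Example:
# lib_links_tiff /path/to/somewhere/install/Leptonica_1.71/lib/liblept.so.4 \
#     && echo "liblept links libtiff" || echo "no libtiff dependency found"
```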
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.