@Shree Thanks for the tip. Just 2 quick questions. 1) From https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, it says that "osd" and "equ" traineddata files are compatible between Tesseract 3 and 4. In the GitHub tessdata_fast repo (https://github.com/tesseract-ocr/tessdata_fast), "osd" is there with the commit "Use legacy Orientation Script Detector (OSD) because that is the only thing that currently works." However, "equ" is not in the repo. Was this simply a small mistake where the maintainer forgot to include the "equ" data file?
2) Also, with tessdata_fast, Tesseract 4 runs faster for me than it does with tessdata. However, is Tesseract 4 supposed to be slower than Tesseract 3? That's what I'm experiencing.

# Here are the updated instructions to download tessdata_fast, which in my tests indeed performs faster than tessdata.
# However, when calling Tesseract from the command line, the argument "--oem 2" no longer works.
# Use "--oem 1" instead, since tessdata_fast only contains the neural net (LSTM) models.
wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/osd.traineddata
wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/chi_sim.traineddata

On Monday, April 23, 2018 at 2:37:09 PM UTC-4, shree wrote:
>
> Thanks for the script to install tesseract on CentOS.
>
> I would suggest using traineddata files from tessdata_fast or
> tessdata_best repos for better accuracy and speed.
>
> On Mon 23 Apr, 2018, 11:52 PM Eugene Huang, <eugen...@gmail.com> wrote:
>
>> Hello! Most people are probably running Tesseract 4 on Ubuntu, MacOS, and
>> Windows. Unfortunately, there are no clear instructions for installing
>> Tesseract 4 on other flavors of Linux, most notably CentOS and Red Hat.
>>
>> After going through dependency hell, I successfully installed Tesseract 4
>> onto CentOS 7. I presume that the installation script should also work for
>> Red Hat. I want to give credit to EisenVault because this script is
>> essentially a modified version of his script. This is my first contribution
>> to open source software, so any tips will be highly appreciated!
>>
>> When running this script line by line, you probably have to prefix "sudo"
>> to each line, or you can copy and paste it into a bash script and then run
>> that script with sudo. I have tested both to work on a fresh image of
>> CentOS 7 on VirtualBox.
>>
>> Cheers!
>>
>> # (Estimated Time of Completion: 45 minutes)
>> # Instructions taken (and slightly modified) from https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
>> cd /opt
>> # The following line will take 30 minutes to install.
>> yum -y update
>> yum -y install libstdc++ autoconf automake libtool autoconf-archive pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
>> yum group install -y "Development Tools"
>>
>>
>> # Install Leptonica from Source
>> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
>> tar -zxvf leptonica-1.75.3.tar.gz
>> cd leptonica-1.75.3
>> ./autobuild
>> ./configure
>> make -j
>> make install
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Sanity checks
>> # check if libpng is installed: type "whereis libpng" and expect to see a directory; a blank line is not good
>> # check if leptonica is installed: type "ls /usr/local/include" and expect to see "leptonica"
>>
>>
>> # Install Tesseract from Source
>> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
>> tar -zxvf 4.0.0-beta.1.tar.gz
>> cd tesseract-4.0.0-beta.1/
>> ./autogen.sh
>> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
>> make install
>> ldconfig
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Download and install tesseract language files (Tesseract 4 traineddata files)
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
>> # download any other languages you like
>> mv *.traineddata /usr/local/share/tessdata
>>
>>
>> # Sanity check
>> # check if tesseract is installed: type "tesseract --version" and expect to see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries for images)
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d41ebcc5-b3b1-4e66-af8a-c7896814a7cc%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
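The tessdata_fast downloads from question 2 could also be wrapped in a small reusable helper. This is only a sketch. The function names fast_url and fetch_fast_models and the TESSDATA_DIR variable are hypothetical and do not come from Tesseract. The default directory is assumed to be /usr/local/share/tessdata, the same path the install script uses. The sketch uses the /raw/master/ URL form plus an explicit wget -O so the saved file is named exactly <lang>.traineddata. Downloading a .../blob/master/...?raw=true URL with plain wget can leave the "?raw=true" suffix in the file name, and then "mv *.traineddata" will not match it.

```shell
#!/usr/bin/env bash
# Sketch only: fast_url, fetch_fast_models and TESSDATA_DIR are hypothetical
# helper names, not part of Tesseract. The default target directory is the
# one used by the CentOS install script in this thread.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Print the direct download URL of one tessdata_fast model,
# e.g. "eng" -> https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
fast_url() {
  printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/%s.traineddata\n' "$1"
}

# Download each named model (osd, eng, chi_sim, ...) into TESSDATA_DIR.
# -O fixes the output file name, so no "?raw=true" suffix can end up in it.
# Stops at the first failed download and returns non-zero.
fetch_fast_models() {
  local lang
  mkdir -p "$TESSDATA_DIR" || return 1
  for lang in "$@"; do
    wget -O "$TESSDATA_DIR/$lang.traineddata" "$(fast_url "$lang")" || return 1
  done
}
```

To use it, run as root because the target is under /usr/local. Source the file, call `fetch_fast_models osd eng chi_sim`, and then OCR with the LSTM engine, for example `tesseract image.png out --oem 1 -l eng`. Here image.png and out are placeholders. Afterwards, `tesseract --list-langs` should include the downloaded models.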