@Shree
Thanks for the tip. I have two quick questions. 
1) According to https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, 
the "osd" and "equ" traineddata files are compatible between Tesseract 3 
and 4. The GitHub tessdata_fast repo 
(https://github.com/tesseract-ocr/tessdata_fast) includes "osd", with the 
commit message "Use legacy Orientation Script Detector (OSD) because that is 
the only thing that currently works." However, "equ" is not in the repo. Did 
the maintainer simply forget to include the "equ" data file?

2) Also, with tessdata_fast, Tesseract 4 runs faster for me than it does 
with tessdata. However, Tesseract 4 is slower than Tesseract 3 in my 
experience. Is that expected?
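
To put numbers on question 2, both engines could be timed on the same image. This is only a rough sketch: the `elapsed` helper and the `sample.png` default are hypothetical names, and `--oem 0` needs traineddata that still contains the legacy model (the tessdata repo, not tessdata_fast).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the wall-clock seconds a command takes.
elapsed() {
    local start end
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}

# Assumed sample image; replace it with one of your own.
IMAGE=${IMAGE:-sample.png}

# Only run the comparison when tesseract and the image are both present.
if command -v tesseract >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    echo "legacy (--oem 0): $(elapsed tesseract "$IMAGE" stdout --oem 0 -l eng) s"
    echo "LSTM   (--oem 1): $(elapsed tesseract "$IMAGE" stdout --oem 1 -l eng) s"
fi
```

A single run is noisy, so repeating the measurement a few times on several images gives a fairer comparison.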




# Here are the updated instructions to download tessdata_fast, which in my 
# tests does run faster than tessdata.
# Note that passing "--oem 2" to Tesseract on the command line no longer 
# works with these files.
# Use "--oem 1" instead, since tessdata_fast contains only the neural net 
# LSTM model.
# The -O option saves each file without "?raw=true" appended to its name.
wget -O osd.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/osd.traineddata?raw=true"
wget -O eng.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata?raw=true"
wget -O chi_sim.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/chi_sim.traineddata?raw=true"
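
The download steps above can also be wrapped in a loop so that adding a language is a one-word change. This is a minimal sketch, not a tested installer: the `fast_url` helper, the `DRY_RUN` switch, and the `TESSDATA_DIR` default (taken from the install script quoted below) are my own assumptions.

```shell
#!/usr/bin/env bash
# Assumed values: adjust the language list and the tessdata directory.
LANGS="osd eng chi_sim"
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata}
BASE_URL=https://github.com/tesseract-ocr/tessdata_fast/blob/master

# Hypothetical helper: print the download URL for one language code.
fast_url() {
    printf '%s/%s.traineddata?raw=true\n' "$BASE_URL" "$1"
}

# By default only print the commands; set DRY_RUN=0 to actually download.
for lang in $LANGS; do
    if [ "${DRY_RUN:-1}" = 0 ]; then
        wget -O "$TESSDATA_DIR/$lang.traineddata" "$(fast_url "$lang")"
    else
        echo "wget -O $TESSDATA_DIR/$lang.traineddata $(fast_url "$lang")"
    fi
done
```

Once the files are in place, a run with the LSTM engine would look like `tesseract image.png out --oem 1 -l eng`, where image.png and out are placeholder names.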


On Monday, April 23, 2018 at 2:37:09 PM UTC-4, shree wrote:
>
> Thanks for the script to install tesseract on CentOS.
>
> I would suggest using traineddata files from tessdata_fast or 
> tessdata_best repos for better accuracy and speed.
>
> On Mon 23 Apr, 2018, 11:52 PM Eugene Huang, <eugen...@gmail.com> wrote:
>
>> Hello! Most people are probably running Tesseract 4 on Ubuntu, macOS, and 
>> Windows. Unfortunately, there are no clear instructions for installing 
>> Tesseract 4 on other flavors of Linux, most notably CentOS and 
>> Red Hat.
>>
>> After going through dependency hell, I successfully installed Tesseract 4 
>> onto CentOS 7. I presume that the installation script should also work for 
>> Red Hat. I want to give credit to EisenVault because this script is 
>> essentially a modified version of his script. This is my first contribution 
>> to open source software, so any tips will be highly appreciated!
>>
>> If you run this script line by line, you will probably have to prefix 
>> each line with "sudo". Alternatively, paste it into a bash script and run 
>> that script with sudo. I have tested both approaches on a fresh image of 
>> CentOS 7 on VirtualBox.
>>
>> Cheers!
>>
>> # (Estimated Time of Completion: 45 minutes)
>> # Instructions taken (and slightly modified) from 
>> # https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
>> cd /opt
>> # The following line will take 30 minutes to install.
>> yum -y update 
>> yum -y install libstdc++ autoconf automake libtool autoconf-archive \
>>     pkg-config \
>>     gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
>> yum group install -y "Development Tools"
>>
>>
>> # Install Leptonica from Source
>> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
>> tar -zxvf leptonica-1.75.3.tar.gz
>> cd leptonica-1.75.3
>> ./autobuild
>> ./configure
>> make -j
>> make install
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Sanity checks
>> # check if libpng is installed: type "whereis libpng" and expect to see a 
>> # directory; a blank line is not good
>> # check if leptonica is installed: type "ls /usr/local/include" and 
>> # expect to see "leptonica"
>>
>>
>> # Install Tesseract from Source
>> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
>> tar -zxvf 4.0.0-beta.1.tar.gz
>> cd tesseract-4.0.0-beta.1/
>> ./autogen.sh
>> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig \
>> LIBLEPT_HEADERSDIR=/usr/local/include \
>> ./configure --with-extra-includes=/usr/local/include \
>>     --with-extra-libraries=/usr/local/lib
>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
>> make install
>> ldconfig
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Download and install tesseract language files (Tesseract 4 traineddata 
>> # files)
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
>> # download any other languages you like
>> mv *.traineddata /usr/local/share/tessdata
>>
>>
>> # Sanity check
>> # check if tesseract is installed: type "tesseract --version" and expect 
>> # to see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries 
>> # for images)
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/d41ebcc5-b3b1-4e66-af8a-c7896814a7cc%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
