[tesseract-ocr] Training a language not in tesseract but almost similar script/ letters with vietnam language

haruo195k Thu, 28 Mar 2019 11:33:03 -0700

I am trying to train a language currently not present in Tesseract.

I am working with Python on Ubuntu 16.04 LTS and Tesseract version 3.04.01
(installed with sudo apt install tesseract-ocr; it works perfectly for
English).

I tested with the following command:

tesseract procssed_image.png stdout -l vie

The output is 90% correct, except for some characters that are not in the
Vietnamese alphabet.

Then I created the bazaar config file (in
/usr/share/tesseract-ocr/tessdata/configs/) with the following contents:

load_system_dawg F
load_freq_dawg F
user_words_suffix user-words

I created a text file with my custom list of words (around 150 words, one
word per line) and named it vie.user-words.
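
For reference, the two files described above can be written from Python. This is only a minimal sketch: the function name write_user_words_setup is hypothetical, and placing vie.user-words directly in the tessdata directory is an assumption about where Tesseract looks for it. Writing under /usr/share normally needs root, so a scratch directory may be easier for experiments.

```python
from pathlib import Path


def write_user_words_setup(tessdata_dir, lang, words, config_name="bazaar"):
    """Write the config file and the user-words list described above.

    Hypothetical helper: the layout (configs/<config_name> and
    <lang>.user-words inside tessdata_dir) is an assumption.
    """
    tessdata = Path(tessdata_dir)

    # Config file: one "name value" pair per line.
    config_path = tessdata / "configs" / config_name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        "load_system_dawg F\n"
        "load_freq_dawg F\n"
        "user_words_suffix user-words\n",
        encoding="utf-8",
    )

    # Word list: one word per line, UTF-8 so accented letters survive.
    words_path = tessdata / "{}.user-words".format(lang)
    words_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    return config_path, words_path
```

Calling write_user_words_setup("/tmp/tessdata_test", "vie", ["word1", "word2"]) would create both files under that scratch directory.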

Then I ran the following command:

tesseract procssed_image.png stdout -l vie bazaar

The result was the same.

Then I tried:

tesseract procssed_image.png stdout -l vie bazaar -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî

In tessedit_char_whitelist, I am trying to list all the characters that are
present in my language, plus the other symbols present in the image file.

It shows the following errors and also prints the output (the result is the
same as before):

read_params_file: Can't open c
read_params_file: Can't open tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
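
One possible reading of these messages, not confirmed on this setup: everything after the config file name bazaar seems to have been treated as another config file name, which would match Tesseract's usage line listing options before config files. A minimal Python sketch that builds the command with every -c option placed before the config file is below. The helper names build_tesseract_cmd and run_tesseract are hypothetical, and whether this ordering resolves the error here is an assumption.

```python
import subprocess


def build_tesseract_cmd(image, lang, config_vars=None, config_files=()):
    """Build an argument list with -c options before any config file names.

    Hypothetical helper; the ordering is the only point being illustrated.
    """
    cmd = ["tesseract", image, "stdout", "-l", lang]
    # Each variable becomes its own "-c name=value" pair.
    for name, value in (config_vars or {}).items():
        cmd += ["-c", "{}={}".format(name, value)]
    # Config file names (such as bazaar) go last.
    cmd += list(config_files)
    return cmd


def run_tesseract(cmd):
    """Run the command and return the recognized text from stdout."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

For example, build_tesseract_cmd("procssed_image.png", "vie", {"tessedit_char_whitelist": "abc"}, ["bazaar"]) puts the -c pair between "-l vie" and "bazaar". Passing the arguments as a list also avoids shell quoting problems with the accented characters in the whitelist.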

Could you please tell me how to fix this issue? Thank you for your time.

