Thank you for the response. I tried keeping bazaar at the end, and the 
command now runs without any error. However, Tesseract is still not able to 
recognize the extra letters that I provided in *tessedit_char_whitelist*; 
the output is the same. The words/text in the image are already in the 
*vie.user-words* file. 
1. Is there anything wrong with the way I created that file? 
2. How should I approach this issue? Do I need to provide any other extra 
files?
3. Or do I need to retrain it separately for the language from scratch?
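
As far as I understand, tessedit_char_whitelist can only restrict Tesseract to characters that the loaded traineddata already knows; it does not add new ones. Below is a minimal sketch for checking which whitelisted characters are missing from the vie character set. It makes two assumptions. First, the unicharset has been extracted from vie.traineddata, for example with combine_tessdata -u /usr/share/tesseract-ocr/tessdata/vie.traineddata vie. Second, that file uses the plain-text layout of a count line followed by one entry per line, starting with the character. The helper names parse_unicharset and missing_from_unicharset are hypothetical and not part of Tesseract.

```python
import unicodedata


def parse_unicharset(text):
    """Return the set of entries in the text of a Tesseract unicharset file.

    Assumed layout: the first line holds the entry count, and every
    following line starts with the character itself ("NULL" stands for space).
    """
    chars = set()
    for line in text.splitlines()[1:]:  # skip the count line
        fields = line.split()
        if fields:
            # NFC so precomposed letters like "â" compare equal however they were typed
            chars.add(unicodedata.normalize("NFC", fields[0]))
    return chars


def missing_from_unicharset(whitelist, known):
    """Whitelist characters (compared one code point at a time) absent from the unicharset."""
    wanted = set(unicodedata.normalize("NFC", whitelist))
    return sorted(ch for ch in wanted if ch not in known)


# Hypothetical usage, after extracting vie.unicharset:
#   with open("vie.unicharset", encoding="utf-8") as f:
#       known = parse_unicharset(f.read())
#   print(missing_from_unicharset("àâêî", known))
```

If this reports any characters, the whitelist alone will not make Tesseract output them.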

Thanks.

On Friday, March 29, 2019 at 10:20:19 AM UTC+5:30, shree wrote:
>
> tesseract procssed_image.png stdout -l vie bazaar -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
>
> Bazaar should be listed last; see tesseract --help.
>
> Check your command syntax.
>
> On Fri, 29 Mar 2019, 00:02, <[email protected]> wrote:
>
>> I am trying to train a language currently not present in Tesseract.
>>
>> I am working with Python on Ubuntu 16.04 LTS, with Tesseract 3.04.01 
>> (installed with sudo apt install tesseract-ocr; it works perfectly for 
>> English).
>>
>> I tested with the following command:
>>
>> tesseract procssed_image.png stdout -l vie
>>
>> The output is 90% correct, except for some characters that are not part 
>> of the Vietnamese language.
>>
>> Then I created the *bazaar* config file in 
>> /usr/share/tesseract-ocr/tessdata/configs/ with these contents:
>>
>> load_system_dawg     F
>> load_freq_dawg          F
>> user_words_suffix      user-words
>>
>> Next, I created a text file with my custom list of words (around 150 
>> words, one word per line) and named it *vie.user-words*.
>>
>> Then I ran the following command:
>>
>> tesseract procssed_image.png stdout -l vie bazaar
>>
>> The result was the same.
>>
>> Then I tried:
>>
>> tesseract procssed_image.png stdout -l vie bazaar -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
>>
>> With tessedit_char_whitelist I am trying to list all the characters 
>> present in my language, plus the other symbols that appear in the image.
>>
>> It shows the following errors and also prints the output (the result is 
>> the same as before):
>>
>>
>> read_params_file: Can't open c
>> read_params_file: Can't open tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
>>
>> How can I fix this issue? Thank you for your time.
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/55c9df9a-762f-43c3-9538-ba7d0c55dd20%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
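
The setup and commands described above can be scripted as a minimal Python sketch. It rests on three assumptions. First, it uses the Ubuntu package's tessdata directory mentioned in this thread. Second, it assumes Tesseract 3.04 reads <lang>.user-words from that directory when user_words_suffix is set to user-words. Third, it assumes option flags such as -l and -c must come before config names like bazaar, which is what the reply above and the "Can't open" errors suggest. All function names are hypothetical helpers. The code avoids f-strings so it also runs on the Python 3.5 that ships with Ubuntu 16.04.

```python
import subprocess
from pathlib import Path


def bazaar_config_text():
    """The three settings of the bazaar config file, one "name value" pair per line."""
    return (
        "load_system_dawg     F\n"
        "load_freq_dawg       F\n"
        "user_words_suffix    user-words\n"
    )


def user_words_text(words):
    """One word per line, blank entries dropped, with a trailing newline."""
    cleaned = [w.strip() for w in words if w.strip()]
    return "\n".join(cleaned) + "\n"


def write_setup(tessdata, lang, words):
    """Write configs/bazaar and <lang>.user-words under the given tessdata dir.

    Assumption: Tesseract 3.04 looks for <lang>.user-words directly in tessdata.
    Writing under /usr/share needs root, so run this with sudo.
    """
    tessdata = Path(tessdata)
    (tessdata / "configs" / "bazaar").write_text(bazaar_config_text(), encoding="utf-8")
    (tessdata / (lang + ".user-words")).write_text(user_words_text(words), encoding="utf-8")


def build_tesseract_cmd(image, lang, configs=(), variables=None):
    """Build argv with -l and -c options first and config names last.

    Anything placed after a config name appears to be read as another
    config file name, hence the "Can't open" errors above.
    """
    cmd = ["tesseract", image, "stdout", "-l", lang]
    for name, value in (variables or {}).items():
        cmd += ["-c", "{}={}".format(name, value)]
    cmd += list(configs)  # config names such as "bazaar" go at the end
    return cmd


def run_ocr(cmd):
    """Run tesseract and return what it printed to stdout (needs tesseract on PATH)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

For example, build_tesseract_cmd("procssed_image.png", "vie", configs=["bazaar"], variables={"tessedit_char_whitelist": whitelist}) yields the same options as the original command, but with bazaar moved to the end. run_ocr() then executes it.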
