Dear All, 

I’m looking for advice because I am stuck. I’m training Tesseract to do 
optical character recognition of texts in Lushootseed, an Indigenous 
language of Washington State with no living speakers. The language has some 
special characters and many diacritics, and I do not know what the font is, 
because the texts were typed (on a typewriter?) or printed a long time ago.

I finished editing my box files for Lushootseed, but I got stuck on the 
step in Section 7 of Isabell Hubert's manual (written for a previous 
version), which is "extracting the character set" with 
unicharset_extractor. When I enter this command with the information for my 
model in Terminal, the response is that unicharset_extractor is "command 
not found". 

Isabell says that this manual may not be entirely applicable to the new 4.0 
version, and that the best course is to ask this group for advice. 

From what I see online, this means that I have not installed the training 
tools, which include the character set extractor. My computer does appear 
to have downloaded unicharset_extractor.exe, but that is the Windows 
version, and I have Mac OS X. I cannot figure out how to install (or 
reinstall) the training tools, if that is what is needed. 
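
In case it helps with diagnosis, here is a minimal Python sketch of the 
check that seems relevant: whether the executables are visible on the PATH 
at all. The helper name find_tools and the list of tools are my own 
assumptions for illustration; unicharset_extractor is the only training 
tool named in the manual step I am on.

```python
import shutil

# Executable names to look for. "tesseract" is the main OCR program and
# "unicharset_extractor" is the tool from the manual step; whether a given
# install ships the second one is exactly what is in question.
TOOLS = ["tesseract", "unicharset_extractor"]


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If tesseract is found but unicharset_extractor is not, that would suggest 
the main program is installed without the training tools, rather than a 
problem with how I typed the command.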

Any advice anyone has on this would be much appreciated! I am doing this 
for a paid contract, not my own research, so I really would like to get 
past this roadblock. 

Thanks so much. 


Dr. Josh Holden
Postdoctoral researcher
ALT Lab, University of Alberta 
