[tesseract-ocr] large char set language training

2017-06-16 Thread Richard Foo
Dear all, I am new to Tesseract. When training a language with a large character set, such as Chinese, I am not sure at which step I should use the character set (over 7,000 characters) that I prepared. Currently, I treat it as a training set by converting all_char.txt to TIFF files. Therefore, I have image training data of a

Re: [tesseract-ocr] large char set language training

2017-06-16 Thread ShreeDevi Kumar
Yes, there is a method for rendering synthetic training data from a training_text file and fonts, using the text2image program and the tesstrain.sh script. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh Wh
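
As a rough illustration of the rendering step mentioned above, here is a minimal Python sketch that only assembles a text2image command line; it does not run it. The output base name, font name, and fonts directory are placeholder assumptions, not values from this thread; only all_char.txt comes from the original question. The flags used (--text, --outputbase, --font, --fonts_dir) are standard text2image options, but check `text2image --help` for your installed version.

```python
import shlex


def build_text2image_command(text_path, output_base, font_name, fonts_dir):
    """Return the argument list for one text2image call (nothing is executed)."""
    return [
        "text2image",
        f"--text={text_path}",          # UTF-8 text to render
        f"--outputbase={output_base}",  # prefix for the generated .tif/.box pair
        f"--font={font_name}",          # font name as known to the system
        f"--fonts_dir={fonts_dir}",     # directory that is searched for fonts
    ]


# Hypothetical values for illustration only.
cmd = build_text2image_command(
    text_path="all_char.txt",
    output_base="chi_sim.ExampleFont.exp0",
    font_name="Example CJK Font",
    fonts_dir="/usr/share/fonts",
)

# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(cmd))
```

Building the command as a list keeps font names with spaces intact if it is later passed to subprocess.run; tesstrain.sh wraps this step for a list of fonts, so a sketch like this is mainly useful for trying a single font by hand.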