On Fri, May 20, 2011 at 4:44 PM, Holm Dressler <[email protected]> wrote:
> Hi there,
>
> I want to create tessdata files on a given tiff on my Linux system. My
> tiff is called k05.tif
>
> I used the description on
>
> http://aravindavk.in/view/tesseract_ocr_initial_setup
>
> .... which means I do the following step by step:
>
>   1. tesseract k05.tif k05 batch.nochop makebox
>   2. I clean up the box file with jTessBoxEditor.jar (still have
>      problems with special characters like the German ö,ä,ü ...)
>

You can try [1] or other box editors [2] (jTessBoxEditor will be included
there in the next wiki update).

Zdenko

[1] https://github.com/zdenop/qt-box-editor
[2] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Box_File_Editors

>   3. tesseract k05.tif k05 nobatch box.train
>   4. unicharset_extractor k05.box
>   5. cp unicharset k05.unicharset
>   6. echo k05 0 0 0 0 0 > font_properties
>   7. mftraining -F font_properties -U unicharset k05.tr
>   8. mftraining -F font_properties -U unicharset -O k05.unicharset k05.tr
>   9. cntraining k05.tr
>   10. mv Microfeat k05.Microfeat
>   11. mv normproto k05.normproto
>   12. mv pffmtable k05.pffmtable
>   13. mv mfunicharset k05.mfunicharset
>   14. mv inttemp k05.inttemp
>   15. wordlist2dawg frequent_words_list k05.freq-dawg k05.unicharset
>
> Everything works, but combining all the files with
>
> combine_tessdata k05
>
> results in
>
> Error opening unicharset file
>
> The file unicharset exists in my directory (in /home/test/training) I
> also renamed the file to k05.unicharset. THE FILE IS NOT EMPTY.
>
> Somebody knows what I am doing wrong?
>
> Thanks for any advice,
>
> Holm
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
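
One thing that may be worth checking for the "Error opening unicharset file" message, offered as an assumption rather than a confirmed diagnosis: the Tesseract 3 training notes invoke the tool as `combine_tessdata lang.` with a trailing dot, which suggests the argument is a path prefix that each component name is appended to directly. If that holds, `combine_tessdata k05` would look for `k05unicharset` rather than `k05.unicharset`. The sketch below only tests which non-empty files a given prefix would resolve to; the function name `check_prefix` is hypothetical, and the suffix list is just a subset of the files renamed in the quoted steps, not an authoritative list of what combine_tessdata reads.

```shell
#!/bin/sh
# Sketch only: show which files a prefix resolves to when each suffix is
# appended directly to it (ASSUMED to mirror how combine_tessdata builds names).
# check_prefix is a hypothetical helper, not part of Tesseract.

check_prefix() {
    prefix=$1
    missing=0
    # Suffixes taken from the files produced in the quoted training steps.
    for suffix in unicharset inttemp pffmtable normproto freq-dawg; do
        # -s is true only if the file exists and is not empty.
        if [ -s "${prefix}${suffix}" ]; then
            echo "found:   ${prefix}${suffix}"
        else
            echo "missing: ${prefix}${suffix}"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file could not be found.
    return "$missing"
}
```

Run from /home/test/training, `check_prefix k05` and `check_prefix k05.` would show which of the two spellings matches the files on disk. If only the dotted prefix finds them, retrying with `combine_tessdata k05.` would be the natural next test.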

