Yes, I use this in Windows. I created a batch file to automate the training. This is the code:
:: Usage: pass the base name of the training image as the first argument.
:: The script expects %1.tif in the current directory.

:: Generate BOX files from the TIF image
:: (Tesseract appends .txt to the output base name, hence the rename)
tesseract %1.tif %1.box batch.nochop makebox
del %1.box
ren %1.box.txt %1.box

:: Train Tesseract OCR engine
tesseract %1.tif junk nobatch box.train

:: Perform clustering
mftraining %1.tr
del %1.inttemp
ren inttemp %1.inttemp
del %1.pffmtable
ren pffmtable %1.pffmtable
cntraining %1.tr
del %1.normproto
ren normproto %1.normproto

:: Compute the character set
unicharset_extractor %1.box
del %1.unicharset
ren unicharset %1.unicharset

:: Prepare dictionary data (empty word lists)
echo. > frequent_words_list.txt
wordlist2dawg frequent_words_list.txt %1.freq-dawg
del frequent_words_list.txt
echo. > words_list.txt
wordlist2dawg words_list.txt %1.word-dawg
del words_list.txt
echo. > %1.user-words
echo. > %1.DangAmbigs

:: Perform cleanup
del %1.box
del %1.tr
del junk.txt
del Microfeat
del tesseract.log

If you do not understand the code above, set it aside and follow the articles from my previous email.

Cheers,
Svetlin Nakov
Managing Partner
Consulting and Information Technology Agency
http://www.citagency.eu

_____

From: tesseract-ocr@googlegroups.com [mailto:tesseract-...@googlegroups.com] On Behalf Of nicdnepr
Sent: Thursday, September 24, 2009 9:23 PM
To: tesseract-ocr@googlegroups.com
Subject: Re: teaching new letters forms

Can it be used in Windows? I use tessnet2 in .NET, and nothing is clear.

2009/9/24 Svetlin Nakov <svet...@nakov.com>

Please read this document about Tesseract training:
http://crblp.bracu.ac.bd/presentation/Integrating%20Bangla2OCRopus.pdf

The training instructions in this document are better than the official documentation:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
(my personal opinion).
Svetlin Nakov
Managing Partner
Consulting and Information Technology Agency
http://www.citagency.eu

-----Original Message-----
From: tesseract-ocr@googlegroups.com [mailto:tesseract-...@googlegroups.com] On Behalf Of nicdnepr
Sent: Thursday, September 24, 2009 6:35 PM
To: tesseract-ocr
Subject: teaching new letters forms

Is it possible to teach tessnet2 new letter forms? I run OCR on images like this (always upper case). For example, it always confuses the letters I, T, and J. How can I teach tessnet2 these letter forms? Or can it be made to analyze the letters more deeply?