Yes, I use this on Windows. I created a batch file to automate the training.
This is the code:

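:: Usage: pass the language prefix as the only argument;
:: the training image %1.tif must be in the current folder

:: Generate the BOX file from the TIF image
:: (makebox writes %1.box.txt, which is then renamed to %1.box)
tesseract %1.tif %1.box batch.nochop makebox
if exist %1.box del %1.box
ren %1.box.txt %1.box

:: makebox fills %1.box with Tesseract's own guesses, so correct any
:: wrong characters in it by hand before continuing; otherwise the
:: engine is trained on its own recognition errors
pause

:: Train Tesseract OCR engine (produces %1.tr)
tesseract %1.tif junk nobatch box.train

:: Perform clustering
mftraining %1.tr
if exist %1.inttemp del %1.inttemp
ren inttemp %1.inttemp
if exist %1.pffmtable del %1.pffmtable
ren pffmtable %1.pffmtable
cntraining %1.tr
if exist %1.normproto del %1.normproto
ren normproto %1.normproto

:: Compute the character set
unicharset_extractor %1.box
if exist %1.unicharset del %1.unicharset
ren unicharset %1.unicharset

:: Prepare dictionary data (empty word lists)
echo. > frequent_words_list.txt
wordlist2dawg frequent_words_list.txt %1.freq-dawg
del frequent_words_list.txt
echo. > words_list.txt
wordlist2dawg words_list.txt %1.word-dawg
del words_list.txt
echo. > %1.user-words
echo. > %1.DangAmbigs

:: Perform cleanup
del %1.box
del %1.tr
del junk.txt
del Microfeat
del tesseract.log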
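
The batch file leaves eight files named after the language prefix (%1), matching the per-language files that Tesseract 2.x (which tessnet2 wraps) loads from its tessdata folder. Before copying them there, a quick check can confirm that no step failed silently. This is a minimal Python sketch; the function name missing_language_files is hypothetical and not part of Tesseract or tessnet2, and the suffix list is taken from the script above.

```python
from pathlib import Path

# Suffixes of the per-language files the batch file leaves behind
# (assumed from the script above: clustering, charset and dictionary steps).
SUFFIXES = [
    "inttemp", "pffmtable", "normproto", "unicharset",
    "freq-dawg", "word-dawg", "user-words", "DangAmbigs",
]


def missing_language_files(lang, folder="."):
    """Return the expected <lang>.<suffix> names that are not files in folder.

    Hypothetical helper: an empty list means every file the batch file
    should have produced is present and can be copied into tessdata.
    """
    base = Path(folder)
    return [
        f"{lang}.{suffix}"
        for suffix in SUFFIXES
        if not (base / f"{lang}.{suffix}").is_file()
    ]
```

For example, after running the batch file with the prefix mylang, missing_language_files("mylang") should return an empty list in that folder; any name it returns points to the step that did not complete.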

If the batch file above is unclear, skip it and follow the documents
linked in my previous email.

Cheers,

Svetlin Nakov
Managing Partner
Consulting and Information Technology Agency
http://www.citagency.eu

  _____  

From: tesseract-ocr@googlegroups.com [mailto:tesseract-...@googlegroups.com]
On Behalf Of nicdnepr
Sent: Thursday, September 24, 2009 9:23 PM
To: tesseract-ocr@googlegroups.com
Subject: Re: teaching new letters forms

Can it be used on Windows? I use tessnet2 in .NET, and none of this is clear to me.

2009/9/24 Svetlin Nakov <svet...@nakov.com>


Please read this document about Tesseract training:
http://crblp.bracu.ac.bd/presentation/Integrating%20Bangla2OCRopus.pdf

In my opinion, its training instructions are better than the official
documentation:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

Svetlin Nakov
Managing Partner
Consulting and Information Technology Agency
http://www.citagency.eu

-----Original Message-----
From: tesseract-ocr@googlegroups.com [mailto:tesseract-...@googlegroups.com]
On Behalf Of nicdnepr
Sent: Thursday, September 24, 2009 6:35 PM
To: tesseract-ocr
Subject: teaching new letters forms


Is it possible to teach tessnet2 new letter forms?
I OCR images like this one (always upper case).

For example, it always confuses the letters I, T and J.
How can I teach tessnet2 these letter forms, or perhaps make it analyze
the letters more thoroughly?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to 
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---
