I'm not sure what the case is for Tamil, but in general the images used for training are not available. So basically you would have to start from scratch.
Paul

On Monday, 14 July 2014 10:07:59 UTC+2, sibi kanagaraj wrote:
>
> Hi all,
>
> This is Sibi from Chennai, India. I wanted to improve the Tesseract OCR
> engine for recognizing Tamil fonts. Hence I started with Ray Smith's
> paper "An Overview of the Tesseract OCR Engine" and mailed him for
> further information.
>
> He directed me to
>
> https://drive.google.com/file/d/0B7l10Bj_LprhbUlIUFlCdGtDYkE/edit?usp=sharing
>
> and also informed me that font recognition is already present for the
> Tamil language:
>
> https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.tam.tar.gz&can=2&q=
>
> But I feel that the Tamil training is not sufficient and could be
> streamlined. Hence I went to see whether there are sufficient training
> documents for Tamil. This search led me to this page
> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>, and
> subsequently I found *"Things I would NOT recommend working on"* here
> <http://code.google.com/p/tesseract-ocr/wiki/TesseractProjects>.
>
> I am a little bit stuck here. I wanted to do this project as part of my
> Master's degree. Isn't Tamil training an independent module that
> could be worked on?
>
> -Sibi
>

