Namaste,

The TrainingTesseract section on the home page is quite thorough, and you can also search the archives for more info.
There are at least two other people (Blavatsky + R. Kantan) trying to train for Tamil. Please search the archives:
https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/tamil

--Sven

On Tue, Sep 18, 2012 at 12:33 PM, R. Sivaramakrishna Sharma <[email protected]> wrote:
> Namaste,
>
> I am new to this group; just joined in. I recently downloaded
> tesseract software and tried using the Hindi files provided. I am
> amazed at the level of accuracy, even with scans of very old
> letterpress printed books.
>
> Nevertheless, I currently have many (hundreds) of old (pre-1940)
> books in Tamil, and I can see that Tamil datafiles are not available.
>
> I am willing to contribute in this area (develop data/recognition
> files for Tamil), if someone could guide me to the correct resources
> (manuals etc. on building these {high-quality} datafiles).
>
> Looking forward to your response.
>
> Thank you.
>
> R. Sivaramakrishna
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

