Hi Bryan,

I badly need your help with Tesseract-OCR. I saw your video below and wanted to ask a few questions:
https://www.youtube.com/watch?v=K1wyzuyOLdk

I want to know how to add newly trained data to tessdata so that it is used by default, with no need to pass "-l lang", e.g.:

tesseract test.png test -l eng2

I followed the guide below to create the trained data:
http://www.joyofdata.de/blog/a-guide-on-ocr-with-tesseract-3-03/

Now I want to add it to tessdata and have Tesseract use it by default. Could you please let me know the steps to do this?

Thank you,
Sushma

On Saturday, December 7, 2013 at 1:40:56 AM UTC+5:30, matthew christy wrote:
>
> Hi All,
>
> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas
> A&M University, as part of its Early Modern OCR Project (eMOP
> <http://emop.tamu.edu/>) has created a new tool, called Franken+, that
> provides a way to create font training for the Tesseract OCR engine using
> page images. This is in contrast to Tesseract's documented method
> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of font
> training which involves using a word processing program with a modern font.
> Franken+ has now been released for beta testing and we invite anyone who's
> interested to give it a try and to please provide feedback.
>
> Franken+ works in conjunction with PRImA's open source Aletheia tool
> <http://www.primaresearch.org/tools.php> and allows users to easily and
> quickly identify one or more idealized forms of each glyph found on a set
> of page images. These identified forms are then used to generate a set of
> Franken-page images matching the page characteristics documented in
> Tesseract's training instructions, but with a font used in an actual early
> modern printed document. Franken+ allows you to create Tesseract box files,
> but will also guide you through the entire Tesseract training process,
> producing a .traineddata file, and even allow you to identify and OCR
> documents using that training.
> In addition, Franken+ makes it easy to
> combine training from multiple fonts into one training set.
>
> For eMOP we are using Franken+ to create training for Tesseract from page
> images of early modern printed works, but we also think it can be used just
> as effectively to train Tesseract using images of any kind of font that's
> not readily available via a word processor. For example, I've seen posts in
> this group about wanting to train Tesseract to read the signs on the front
> of buses.
>
> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and
> http://dh-emopweb.tamu.edu/Franken+/. The code is also available open
> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>
> Thanks,
> Matt Christy
>

