[tesseract-ocr] Re: Franken+ Released -- New Tool For Training Tesseract on Fonts from Page Images

sushma ms Mon, 01 Feb 2016 23:25:14 -0800

Hi Bryan,

I badly need your help with Tesseract-OCR.

I saw your video below and have a few questions:
https://www.youtube.com/watch?v=K1wyzuyOLdk

I would like to know how to add my newly trained data to tessdata so that it 
is used by default, without having to pass "-l lang" every time, as in:

tesseract test.png test -l eng2

I followed the guide below to create the trained data:
http://www.joyofdata.de/blog/a-guide-on-ocr-with-tesseract-3-03/

Now I want to add it to tessdata and have Tesseract use it by default. 
Could you please let me know the steps to do this?

Thank you,
Sushma

On Saturday, December 7, 2013 at 1:40:56 AM UTC+5:30, matthew christy wrote:
>
> Hi All,
>
> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas 
> A&M University, as part of its Early Modern OCR Project (eMOP 
> <http://emop.tamu.edu/>) has created a new tool, called Franken+, that 
> provides a way to create font training for the Tesseract OCR engine using 
> page images. This is in contrast to Tesseract's documented method 
> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of font 
> training which involves using a word processing program with a modern font. 
> Franken+ has now been released for beta testing and we invite anyone who's 
> interested to give it a try and to please provide feedback.
>
> Franken+ works in conjunction with PRImA's open source Aletheia tool 
> <http://www.primaresearch.org/tools.php> and allows users to easily and 
> quickly identify one or more idealized forms of each glyph found on a set 
> of page images. These identified forms are then used to generate a set of 
> Franken-page images matching the page characteristics documented in 
> Tesseract's training instructions, but with a font used in an actual early 
> modern printed document. Franken+ allows you to create Tesseract box files, 
> but will also guide you through the entire Tesseract training process, 
> producing a .traineddata file, and even allow you to identify and OCR 
> documents using that training. In addition, Franken+ makes it easy to 
> combine training from multiple fonts into one training set.
>
> For eMOP we are using Franken+ to create training for Tesseract from page 
> images of early modern printed works, but we also think it can be used just 
> as effectively to train Tesseract using images of any kind of font that's 
> not readily available via a word processor. For example, I've seen posts in 
> this group about wanting to train Tesseract to read the signs on the front 
> of buses.
>
> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and 
> http://dh-emopweb.tamu.edu/Franken+/. The code is also available open 
> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>
> Thanks,
> Matt Christy
>
