Not from inside Tesseract. A review of the API shows that Tesseract does not expose a public method that enumerates the font names in a particular .traineddata file. You will therefore have to inspect the glyphs visually, identify a matching font, install it if necessary, and substitute that font manually in your program.
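A minimal sketch of that manual substitution, written in Python only for brevity. Everything in it is an assumption for illustration: the table entries are pairings picked by eye, not anything Tesseract provides, and `SUBSTITUTES`, `DEFAULT_FONT`, `normalize` and `substitute_font` are hypothetical names. The idea is simply to keep a hand-maintained table from the names Tesseract reports to fonts that are actually installed, with a fallback for names you have not mapped yet.

```python
# Hand-maintained table (hypothetical entries): keys are font names as
# Tesseract might report them, values are fonts installed on this machine.
SUBSTITUTES = {
    "times": "Times New Roman",
    "arial": "Arial",
    "dejavu sans": "Verdana",         # assumed visual match, check by eye
    "century schoolbook": "Georgia",  # assumed visual match, check by eye
}
DEFAULT_FONT = "Arial"  # used when a reported name has no entry yet


def normalize(name):
    """Lower-case the name and turn '_' and '-' into single spaces."""
    return " ".join(name.replace("_", " ").replace("-", " ").lower().split())


def substitute_font(tesseract_name, table=SUBSTITUTES, default=DEFAULT_FONT):
    """Return an installed font name for the name reported by Tesseract."""
    key = normalize(tesseract_name)
    if key in table:
        return table[key]
    # Reported names may be longer than the table key (for example a style
    # suffix), so fall back to a prefix match before giving up.
    for known, installed in table.items():
        if key.startswith(known):
            return installed
    return default
```

With this table, `substitute_font("DejaVu Sans")` returns `"Verdana"`, and a name with no entry falls back to `DEFAULT_FONT`. You would still need to fill in the table yourself by comparing glyphs, as described above.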
On Saturday, September 21, 2013 10:19:07 AM UTC-5, [email protected] wrote:
>
> When you say that I will need to "map the font returned by Tesseract to
> some font available on your system that has similar glyph characteristics",
> you have restated my original question.
>
> So maybe this rephrasing will help you understand my question:
>
> How can I map the font returned by Tesseract to some font available on
> Windows that has similar glyph characteristics?
>
>
> On Saturday, September 21, 2013 9:07:11 AM UTC-6, Quan Nguyen wrote:
>>
>> I don't think Tesseract has any knowledge about system fonts. It gets the
>> font info from the .traineddata file which includes information defined in
>> the font_properties file used during training. So it means the fonts used
>> in training may not exist on the machine it's being run on. Moreover, the
>> font name specified in font_properties may not reflect the actual font
>> name; e.g., "Times New Roman" may be shortened to "times".
>>
>> As such, you will need to map the font returned by Tesseract to some font
>> available on your system that has similar glyph characteristics.
>>
>> On Saturday, September 21, 2013 7:39:04 AM UTC-5, [email protected] wrote:
>>>
>>> Thanks for the quick response, but I already know about those APIs - let
>>> me try to explain with an example.
>>>
>>> Let's say that ResultIterator says that it found the word "hello" in the
>>> image at position (100, 100), and TessResultIteratorWordFontAttributes says
>>> it's in font "Arial" with a height of 16. In my Windows application, I can
>>> construct a 16-high Arial font and draw the word "hello" at (100, 100) and
>>> I am doing a good job of showing the user the OCR output.
>>>
>>> But now let's say that ResultIterator continues and says that it found
>>> the word "goodbye" in the image at position (100, 300), and
>>> TessResultIteratorWordFontAttributes says it's in font "DejaVu Sans" with a
>>> height of 16.
>>> If I tell Windows to construct a font named "DejaVu Sans",
>>> Windows won't have any idea what that is, and it will pick some random font
>>> from its list. When I then have my Windows application draw the word
>>> "goodbye" at (100, 300), it's highly likely that the character widths in
>>> the font that Windows is using are very different from the character widths
>>> in the actual DejaVu Sans font, so the word "goodbye" will take up the
>>> wrong amount of space and I'll either end up with lots of white space or
>>> (more often) the words all run over each other.
>>>
>>> Does that make more sense?
>>>
>>> Thanks,
>>> Chris
>>>
>>>
>>> On Friday, September 20, 2013 5:39:07 PM UTC-6, Quan Nguyen wrote:
>>>>
>>>> You'll need to access the Tesseract API for such information, specifically
>>>> ResultIterator and ResultIteratorWordFontAttributes. Check out the API
>>>> Example <http://code.google.com/p/tesseract-ocr/wiki/APIExample> page.
>>>>
>>>> Quan
>>>>
>>>>
>>>> On Friday, September 20, 2013 3:42:14 PM UTC-5, [email protected] wrote:
>>>>>
>>>>> I would like to show the user the OCR output in my Windows application
>>>>> in a graphical form (the OCR'd characters, in the specified font, in the
>>>>> right location). In order to do that I need to pick a font to draw the
>>>>> OCR output text in, and it seems like I have two choices -
>>>>> 1) Map the Tesseract font to something Windows can understand
>>>>> 2) Use the actual Tesseract font
>>>>>
>>>>> For #1, Tesseract uses a lot of fonts that I've got on my Windows box
>>>>> (Times New Roman, Arial, etc.) but then it also comes up with some I
>>>>> don't have (Century Schoolbook). Is there a way to enumerate all the
>>>>> names of the fonts that Tesseract might return? I can then decide whether
>>>>> it's easier to find Windows equivalents for all the fonts, or to download
>>>>> fonts (if they are free and have nice licensing).
>>>>>
>>>>> For #2, it's not enough to just display the selected portion of the
>>>>> source image; that doesn't tell the user anything. I would need a way to
>>>>> ask Tesseract, "what is the glyph for an uppercase G in an Arial font of
>>>>> height 34". Does that exist?
>>>>>
>>>>> Thanks,
>>>>> Chris

