Not from inside Tesseract. A review of the API shows that Tesseract does not
expose a public method that enumerates the font names stored in a particular
.traineddata file. Therefore, you will have to visually inspect the
glyphs, identify a matching font, install it if necessary, and manually
substitute that font in your program.
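The manual substitution step could look roughly like the sketch below. It is written in Python for brevity and uses only the standard library. Everything in it is hypothetical: `FONT_SUBSTITUTES`, `substitute_font`, `horizontal_scale` and the specific font pairings are illustrative assumptions, not part of Tesseract, and the pairings are guesses that you would replace after comparing glyphs yourself. The sketch normalizes abbreviated names (e.g. "times", as noted in the quoted reply below) before looking them up, and it falls back to a known installed font instead of letting Windows pick one at random. It also computes a per-word horizontal scale factor. This is one possible way, not something suggested in the thread, to stop words from overrunning each other when the substitute font's widths differ from the original's.

```python
from typing import Set

# Hypothetical substitution table, keyed by normalized Tesseract font names.
# These pairings are illustrative guesses, not verified metric matches.
FONT_SUBSTITUTES = {
    "times": "Times New Roman",          # font_properties may abbreviate names
    "timesnewroman": "Times New Roman",
    "arial": "Arial",
    "dejavusans": "Verdana",             # assumed visually close; verify widths
}

FALLBACK_FONT = "Arial"  # assumption: a font known to be installed on Windows


def normalize(name: str) -> str:
    """Lowercase and drop spaces/punctuation so 'Times New Roman' == 'times_new_roman'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def substitute_font(tesseract_name: str, installed: Set[str]) -> str:
    """Pick an installed font family for a font name reported by Tesseract."""
    key = normalize(tesseract_name)
    # 1. Explicit entry in the hand-built table, if that font is installed.
    mapped = FONT_SUBSTITUTES.get(key)
    if mapped is not None and mapped in installed:
        return mapped
    # 2. The reported name may already match an installed family directly.
    for family in installed:
        if normalize(family) == key:
            return family
    # 3. Nothing matched: use a known fallback rather than let the OS guess.
    return FALLBACK_FONT


def horizontal_scale(box_width: int, rendered_width: int) -> float:
    """Stretch factor that makes the substitute font fill the OCR word's box.

    rendered_width is the width of the word drawn in the substitute font at
    the reported height (on Windows, e.g. measured with GetTextExtentPoint32).
    """
    if box_width <= 0 or rendered_width <= 0:
        return 1.0  # nothing sensible to scale against
    return box_width / rendered_width


if __name__ == "__main__":
    installed = {"Arial", "Times New Roman", "Verdana"}
    print(substitute_font("times", installed))               # Times New Roman
    print(substitute_font("DejaVu Sans", installed))         # Verdana
    print(substitute_font("Century Schoolbook", installed))  # Arial
    print(horizontal_scale(120, 150))                        # 0.8
```

The scale factor is a compromise. The letter shapes still come from the substitute font, but each word occupies the same horizontal extent as in the source image. How you apply the factor (for example through a narrower `lfWidth` in `LOGFONT`, or a GDI world transform) depends on your drawing code and is not shown here.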

On Saturday, September 21, 2013 10:19:07 AM UTC-5, [email protected] wrote:
>
> When you say that I will need to "map the font returned by Tesseract to 
> some font available on your system that has similar glyph characteristics", 
> you have restated my original question.
>
> So maybe this rephrasing will help you understand my question:
>
> How can I map the font returned by Tesseract to some font available on 
> Windows that has similar glyph characteristics?
>
>
> On Saturday, September 21, 2013 9:07:11 AM UTC-6, Quan Nguyen wrote:
>>
>> I don't think Tesseract has any knowledge about system fonts. It gets the 
>> font info from the .traineddata file which includes information defined in 
>> the font_properties file used during training. So it means the fonts used 
>> in training may not exist on the machine it's being run on. Moreover, the 
>> font name specified in font_properties may not reflect the actual font 
>> name; e.g., "Times New Roman" may be shortened to "times".
>>
>> As such, you will need to map the font returned by Tesseract to some font 
>> available on your system that has similar glyph characteristics.
>>
>> On Saturday, September 21, 2013 7:39:04 AM UTC-5, [email protected] wrote:
>>>
>>> Thanks for the quick response, but I already know about those APIs - let 
>>> me try to explain with an example.
>>>
>>> Let's say that ResultIterator says that it found the word "hello" in the 
>>> image at position (100, 100), and TessResultIteratorWordFontAttributes says 
>>> it's in font "Arial" with a height of 16.  In my Windows application, I can 
>>> construct a 16-high Arial font and draw the word "hello" at (100, 100) and 
>>> I am doing a good job of showing the user the OCR output.
>>>
>>> But now let's say that ResultIterator continues and says that it found 
>>> the word "goodbye" in the image at position (100, 300), and 
>>> TessResultIteratorWordFontAttributes says it's in font "DejaVu Sans" with a 
>>> height of 16.  If I tell Windows to construct a font named "DejaVu Sans", 
>>> Windows won't have any idea what that is, and it will pick some random font 
>>> from its list.  When I then have my Windows application draw the word 
>>> "goodbye" at (100, 300), it's highly likely that the character widths in 
>>> the font that Windows is using are very different from the character widths 
>>> in the actual DejaVu Sans font, so the word "goodbye" will take up the 
>>> wrong amount of space and I'll either end up with lots of white space or 
>>> (more often) the words all run over each other.
>>>
>>> Does that make more sense?
>>>
>>> Thanks,
>>> Chris
>>>
>>>
>>> On Friday, September 20, 2013 5:39:07 PM UTC-6, Quan Nguyen wrote:
>>>>
>>>> You'll need to access the Tesseract API for such information, specifically, 
>>>> ResultIterator and ResultIteratorWordFontAttributes. Check out the API 
>>>> Example <http://code.google.com/p/tesseract-ocr/wiki/APIExample> page.
>>>>
>>>> Quan
>>>>
>>>>
>>>> On Friday, September 20, 2013 3:42:14 PM UTC-5, [email protected] wrote:
>>>>>
>>>>> I would like to show the user the OCR output in my Windows application 
>>>>> in a graphical form (the OCR'd characters, in the specified font, in the 
>>>>> right location). In order to do that, I need to pick a font to draw the 
>>>>> OCR output text in, and it seems like I have two choices:
>>>>> 1) Map the Tesseract font to something Windows can understand
>>>>> 2) Use the actual Tesseract font
>>>>>
>>>>> For #1, Tesseract uses a lot of fonts that I've got on my Windows box 
>>>>> (Times New Roman, Arial, etc.), but it also comes up with some I don't 
>>>>> have (Century Schoolbook). Is there a way to enumerate all the names of 
>>>>> the fonts that Tesseract might return? I can then decide whether it's 
>>>>> easier to find Windows equivalents for all the fonts, or to download fonts 
>>>>> (if they are free and have nice licensing).
>>>>>
>>>>> For #2, it's not enough to just display the selected portion of the 
>>>>> source image; that doesn't tell the user anything. I would need a way to 
>>>>> ask Tesseract, "what is the glyph for an uppercase G in an Arial font of 
>>>>> height 34".  Does that exist?
>>>>>
>>>>> Thanks,
>>>>> Chris
>>>>>
>>>>>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
