I've attached the files.

Here's the HTML content:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
>> "http://www.w3.org/TR/html4/loose.dtd">
>> <html>
>> <head>
>> <title></title>
>> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
>> <meta name='ocr-system' content='tesseract'/>
>> </head>
>> <body>
>> <div class='ocr_page' id='page_1' title='image "c:/dev/g.jpg"; bbox 0 0 
>> 1008 630'>
>> <div class='ocr_carea' id='block_1_1' title="bbox 107 81 398 453">
>> <p class='ocr_par'>
>> <span class='ocr_line' id='line_1_1' title="bbox 107 81 398 136"><span 
>> class='ocr_word' id='word_1_1' title="bbox 107 81 262 136"><span 
>> class='ocrx_word' id='xword_1_1' title="x_wconf -1">apple,</span></span> 
>> <span class='ocr_word' id='word_1_2' title="bbox 281 83 398 125"><span 
>> class='ocrx_word' id='xword_1_2' title="x_wconf 
>> -1">train</span></span></span>
>> </p>
>> <p class='ocr_par'>
>> <span class='ocr_line' id='line_1_2' title="bbox 110 437 214 453"><span 
>> class='ocr_word' id='word_1_3' title="bbox 110 437 214 453"><span 
>> class='ocrx_word' id='xword_1_3' title="x_wconf 
>> -2">tesseract</span></span></span>
>> </p>
>> </div>
>> <div class='ocr_carea' id='block_1_2' title="bbox 445 333 503 358">
>> <p class='ocr_par'>
>> <span class='ocr_line' id='line_1_3' title="bbox 445 333 503 358"><span 
>> class='ocr_word' id='word_1_4' title="bbox 445 333 503 358"><span 
>> class='ocrx_word' id='xword_1_4' title="x_wconf 
>> -15"><strong><em>%</em></strong></span></span></span>
>> </p>
>> </div>
>> </div>
>> </body>
>> </html>
>>
>
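
In case it helps to check the numbers, here is a minimal sketch (not part of Tesseract) that pulls the word boxes out of the hOCR above so they can be compared with the image. It assumes only the layout shown in this output: each `ocr_word` span carries a `title` of the form `bbox x0 y0 x1 y1`, with the recognized text nested inside it. The names `WordBoxParser` and `word_boxes` are made up for this example.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocr_word span."""

    def __init__(self):
        super().__init__()
        self.words = []     # finished (text, bbox) pairs
        self._bbox = None   # bbox of the ocr_word currently open, if any
        self._depth = 0     # span nesting depth inside that ocr_word
        self._text = []     # text fragments seen inside that ocr_word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocr_word":
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._depth = 1
                self._text = []
                return
        if tag == "span" and self._bbox is not None:
            # A nested span such as ocrx_word; track it so we know
            # which closing </span> ends the word itself.
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._text).strip()
                self.words.append((text, self._bbox))
                self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def word_boxes(hocr):
    """Return a list of (text, bbox) pairs found in an hOCR string."""
    parser = WordBoxParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

For each returned pair, `x1 - x0` and `y1 - y0` give the reported width and height of the word, which can be checked against the word's size measured in the image.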

On Tuesday, October 9, 2012 at 19:26:48 UTC+2, zdenop wrote:
>
> Please provide input image and tesseract hocr output.
>
> --
> Zdenko
>
> On Mon, Oct 8, 2012 at 4:20 PM, Attila Somogyi 
> <[email protected]> wrote:
>
>>
>> <https://lh3.googleusercontent.com/-fUGeRmOieDE/UHLhJ8DV2MI/AAAAAAAAAPk/-_VkbAunDOU/s1600/bbox.jpg>
>> Hi!
>>
>> I'm using Tesseract 3.01. I use the HTML file to get the box information 
>> (hocr config). It seems that all of the boxes are bigger than the actual 
>> words. Only the top and left edges of the words match the bbox. Is that 
>> normal? Is there a way to fix this?
>>
>>  -- 
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>
>
>
> -- 
> Zdenko
>  

apple, train

tesseract

%

<<attachment: g.jpg>>
