I've attached the files. Here's the html content:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "c:/dev/g.jpg"; bbox 0 0 1008 630'>
<div class='ocr_carea' id='block_1_1' title="bbox 107 81 398 453">
<p class='ocr_par'>
<span class='ocr_line' id='line_1_1' title="bbox 107 81 398 136"><span class='ocr_word' id='word_1_1' title="bbox 107 81 262 136"><span class='ocrx_word' id='xword_1_1' title="x_wconf -1">apple,</span></span> <span class='ocr_word' id='word_1_2' title="bbox 281 83 398 125"><span class='ocrx_word' id='xword_1_2' title="x_wconf -1">train</span></span></span>
</p>
<p class='ocr_par'>
<span class='ocr_line' id='line_1_2' title="bbox 110 437 214 453"><span class='ocr_word' id='word_1_3' title="bbox 110 437 214 453"><span class='ocrx_word' id='xword_1_3' title="x_wconf -2">tesseract</span></span></span>
</p>
</div>
<div class='ocr_carea' id='block_1_2' title="bbox 445 333 503 358">
<p class='ocr_par'>
<span class='ocr_line' id='line_1_3' title="bbox 445 333 503 358"><span class='ocr_word' id='word_1_4' title="bbox 445 333 503 358"><span class='ocrx_word' id='xword_1_4' title="x_wconf -15"><strong><em>%</em></strong></span></span></span>
</p>
</div>
</div>
</body>
</html>

On Tuesday, October 9, 2012 at 19:26:48 UTC+2, zdenop wrote:
> Please provide input image and tesseract hocr output.
>
> --
> Zdenko
>
> On Mon, Oct 8, 2012 at 4:20 PM, Attila Somogyi <[email protected]> wrote:
>> <https://lh3.googleusercontent.com/-fUGeRmOieDE/UHLhJ8DV2MI/AAAAAAAAAPk/-_VkbAunDOU/s1600/bbox.jpg>
>> Hi!
>>
>> I'm using 3.01. I use the HTML file to get the box information (hocr config).
>> It seems that all of the boxes are bigger than the actual words.
>> Only the top and the left edges of the words match the bbox. Is that
>> normal? Is there a way to fix this?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> Zdenko
<<attachment: g.jpg>>
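
For anyone who wants to check the reported boxes against the image, here is a minimal sketch of one way to read the word boxes out of hOCR output like the above. It relies on the hOCR convention that the four `bbox` numbers are the corner coordinates `x0 y0 x1 y1` (top-left and bottom-right), not `x y width height`. Reading the last two numbers as width and height would make every box look too large while the top and left edges still match; whether that is what happened here is only a guess. The names `HocrWordParser`, `word_boxes` and `SAMPLE` are made up for this example and are not part of tesseract.

```python
import re
from html.parser import HTMLParser

# hOCR stores boxes in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, x0, y0, x1, y1) for every span with class 'ocr_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # finished (text, x0, y0, x1, y1) tuples
        self._bbox = None  # bbox of the ocr_word span currently open
        self._depth = 0    # span nesting depth inside that ocr_word
        self._text = []    # text fragments seen inside that ocr_word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        if self._bbox is not None:
            # A nested span (e.g. ocrx_word) inside the current word.
            self._depth += 1
        elif attrs.get("class") == "ocr_word":
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if tag != "span" or self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The ocr_word span itself has closed: record it.
            self.words.append(("".join(self._text).strip(),) + self._bbox)
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def word_boxes(hocr):
    """Return a list of (text, x0, y0, x1, y1) tuples for an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


# First line of the hOCR output quoted in this thread.
SAMPLE = (
    "<span class='ocr_line' id='line_1_1' title=\"bbox 107 81 398 136\">"
    "<span class='ocr_word' id='word_1_1' title=\"bbox 107 81 262 136\">"
    "<span class='ocrx_word' id='xword_1_1' title=\"x_wconf -1\">apple,</span></span> "
    "<span class='ocr_word' id='word_1_2' title=\"bbox 281 83 398 125\">"
    "<span class='ocrx_word' id='xword_1_2' title=\"x_wconf -1\">train</span></span></span>"
)

if __name__ == "__main__":
    for text, x0, y0, x1, y1 in word_boxes(SAMPLE):
        # Width and height have to be derived from the two corners.
        print(f"{text!r}: left={x0} top={y0} width={x1 - x0} height={y1 - y0}")
```

If rectangles drawn from the derived width and height fit the words in g.jpg, the hOCR output itself is fine and the mismatch is in how the numbers were interpreted.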

