Hi all,

there is a request[1] to restore word confidence information (x_wconf)
in the hOCR output[2]. (The output format changed in version 3.02, and
x_wconf was removed.)

I want to implement it according to the hOCR spec[3], but I am not sure
I have understood it correctly. I tried to contact Thomas Breuel (editor
of the hOCR spec), but he has not responded yet.
I also checked the output of cuneiform-linux (1.1.0) and ocropus (0.6),
but neither provides word confidence information. So I implemented it
to the best of my knowledge (see the patch at issue 748).

There are 2 changes compared to the 3.01 hOCR output:

   1. x_wconf is no longer a "small negative amount" (from 0 to -7, as
   far as I saw), but an integer from 0 to 100 (%)
   2. x_wconf is included in the title attribute of class='ocrx_word'
   elements, together with the bbox info
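
To make the second change concrete, here is a minimal sketch of how a
consumer could read the proposed format. It assumes a title attribute of
the form 'bbox x0 y0 x1 y1; x_wconf N' on ocrx_word spans; the sample
markup, the ids, and the helper names (parse_title, extract_words) are
made up for illustration and are not taken from the patch.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title attribute into a {property: [values]} dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordCollector(HTMLParser):
    """Collect text, bbox and x_wconf for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            props = parse_title(attrs.get("title") or "")
            wconf = props.get("x_wconf")
            self._current = {
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                # float() also accepts integers such as "87"; None means
                # the tool did not report a confidence for this word.
                "x_wconf": float(wconf[0]) if wconf else None,
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def extract_words(hocr):
    """Return a list of word dicts found in an hOCR string."""
    collector = WordCollector()
    collector.feed(hocr)
    collector.close()
    return collector.words


# Hypothetical sample markup in the proposed format.
sample = (
    "<span class='ocrx_word' id='word_1' "
    "title='bbox 36 92 96 116; x_wconf 87'>This</span> "
    "<span class='ocrx_word' id='word_2' "
    "title='bbox 109 92 129 116'>is</span>"
)

for word in extract_words(sample):
    print(word["text"], word["bbox"], word["x_wconf"])
```

A tool that only looks for bbox in the title should keep working as long
as it splits the title on ';' instead of assuming bbox is the only
property present.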

I would like to know if:

   - somebody has a better idea or understanding of how the hOCR spec
   expects x_wconf to be implemented
   - this change breaks any tools (and, if so, how to fix it)

I have attached the hOCR output for phototest.tif from the
above-mentioned tools for comparison.

Thanks for your feedback.

[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=748
[2] http://en.wikipedia.org/wiki/HOCR
[3] http://docs.google.com/View?docid=dfxcv4vc_67g844kf

-- 
Zdenko


This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

Title: OCR Results
Thos ns a lot of12 polnt text to test the ocr code and see ifiworks on all types of hle format The qulck brown dog jumped over the azy fox. The quick brown dog pumped over the lazy fox. The qunck brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox

This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.

The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.

This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.

The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.
