[tesseract-ocr] How to extract bounding box only? If I do not need the word/characters classifier.

jinhuili Wed, 28 Oct 2015 01:46:25 -0700

Hi,

First, I should say that I have very little knowledge of OCR or Tesseract.


We use Tesseract OCR to detect the text area of a given image, which we use 
to calculate image quality (the smaller the text-area ratio, the better). We 
don't use the recognized text, only the bounding boxes of words.
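
For reference, the text-area ratio described above could be computed roughly
as follows. This is only a sketch: the function name text_area_ratio and the
(left, top, right, bottom) box format are assumptions made for illustration,
not something taken from Tesseract. Overlapping boxes are counted once.

```python
def text_area_ratio(boxes, image_width, image_height):
    """Fraction of the image covered by the union of the given boxes.

    boxes: iterable of (left, top, right, bottom) tuples in pixels
    (an assumed format; adapt it to whatever the OCR wrapper returns).
    """
    # Clip every box to the image and drop empty ones.
    clipped = []
    for left, top, right, bottom in boxes:
        l, t = max(0, left), max(0, top)
        r, b = min(image_width, right), min(image_height, bottom)
        if r > l and b > t:
            clipped.append((l, t, r, b))

    # Coordinate compression: box edges split the image into grid cells,
    # and each cell is either fully inside some box or fully outside all.
    xs = sorted({x for l, _, r, _ in clipped for x in (l, r)})
    ys = sorted({y for _, t, _, b in clipped for y in (t, b)})

    covered = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(l <= x0 and x1 <= r and t <= y0 and y1 <= b
                   for l, t, r, b in clipped):
                covered += (x1 - x0) * (y1 - y0)

    return covered / (image_width * image_height)
```

The nested loop is quadratic in the number of distinct box edges, which
should be fine for a few hundred word boxes per image.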

The problem is that some images contain a lot of Chinese or Russian 
characters. OCR on these often takes more than 20 seconds, which is 
unacceptable. As an online interactive service, we cannot make our 
customers wait that long.

Are there parameters I can tweak to speed up OCR, given that we only need 
the text box areas? Or can I just call a method that performs only page 
layout analysis?
Assume the text in the images is rarely rotated. The images come from 
customers' websites, and readability is not bad.

Please help.
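
One possible way to get boxes from layout analysis alone is sketched below,
assuming the third-party tesserocr Python binding is installed (the original
post does not say which binding or language is in use). The helper names
to_ltrb and layout_only_boxes are hypothetical. The page segmentation mode
PSM.AUTO_ONLY asks for segmentation without OCR, and GetComponentImages
returns component boxes without requiring recognition. Word boxes obtained
this way may be rougher than those from a full recognition pass, and the
speed gain on the images described here has not been measured.

```python
def to_ltrb(box):
    """Convert a tesserocr-style box dict (x, y, w, h) to (left, top, right, bottom)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def layout_only_boxes(image_path, lang="eng"):
    """Return word-level boxes from page layout analysis, without recognition."""
    # Imported lazily so the helper above stays usable without tesserocr.
    from tesserocr import PyTessBaseAPI, PSM, RIL

    # PSM.AUTO_ONLY: automatic page segmentation, no OSD and no OCR.
    with PyTessBaseAPI(lang=lang, psm=PSM.AUTO_ONLY) as api:
        api.SetImageFile(image_path)
        # Each component is a tuple (image, box, block_id, paragraph_id).
        components = api.GetComponentImages(RIL.WORD, True)
        return [to_ltrb(box) for _, box, _, _ in components]
```

Passing RIL.TEXTLINE instead of RIL.WORD would give line-level boxes, which
may be enough when only the covered area matters.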
