Hi All,
I am currently doing OCR line by line and getting word details from
ResultIterator, as shown below:
tessAPI->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
// These line boxes are calculated by our pre-processing and segmentation code.
tessAPI->SetRectangle(iXmin, iYmin, iW, iH);
tessAPI->Recognize(nullptr);
tesseract::ResultIterator* rst_iter = tessAPI->GetIterator();
tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
if (nullptr != rst_iter)
{
    do
    {
        const char* text = rst_iter->GetUTF8Text(level);
        rst_iter->WordFontAttributes(&is_bold, &is_italic,
            &is_underlined, &is_monospace, &is_serif, &is_smallcaps,
            &pointsize, &font_id);
        // Here I want to get the line and paragraph that the current
        // word belongs to from the Tesseract API.
        delete[] text;  // GetUTF8Text returns a string that must be freed with delete[]
    } while (rst_iter->Next(level));
    delete rst_iter;  // the iterator returned by GetIterator must be deleted after use
}
I can get paragraphs, lines, and words using the tessAPI->GetComponentImages()
function, but for words it gives me only the block and paragraph IDs, not the
line. I am currently mapping those words to lines myself, but I still get some
garbage.
Is there any way to get the line and paragraph that the current word belongs to?
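
To make the question concrete, here is a rough sketch of the kind of bookkeeping I have in mind: count line and paragraph starts while stepping through words, using the iterator's IsAtBeginningOf() and Next() methods. The names WordPos, TrackWordPositions, FakeIterator, and FakeLevel are made up for illustration and are not part of Tesseract; the fake iterator is only there so the sketch can be tried without the library. I am not sure this is the intended way to do it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record: where one word sits in the recognized region.
struct WordPos {
    int para;  // 0-based paragraph index
    int line;  // 0-based line index, counted across all paragraphs
    int word;  // 0-based word index within its line
};

// Steps through words on any iterator that offers IsAtBeginningOf(level)
// and Next(level), as tesseract::ResultIterator does. Assumes the iterator
// starts on a valid first word (i.e. the result is not empty).
template <typename Iter, typename Level>
std::vector<WordPos> TrackWordPositions(Iter& it, Level word, Level line,
                                        Level para) {
    std::vector<WordPos> out;
    int para_idx = -1, line_idx = -1, word_idx = -1;
    do {
        if (it.IsAtBeginningOf(para)) ++para_idx;  // word opens a new paragraph
        if (it.IsAtBeginningOf(line)) {            // word opens a new line
            ++line_idx;
            word_idx = -1;
        }
        ++word_idx;
        out.push_back({para_idx, line_idx, word_idx});
    } while (it.Next(word));
    return out;
}

// Hypothetical stand-in for the real iterator, for trying the sketch only.
enum FakeLevel { FAKE_PARA, FAKE_LINE, FAKE_WORD };

struct FakeIterator {
    struct Word { bool para_start; bool line_start; };
    std::vector<Word> words;
    std::size_t pos = 0;

    bool IsAtBeginningOf(FakeLevel level) const {
        if (level == FAKE_PARA) return words[pos].para_start;
        if (level == FAKE_LINE) return words[pos].line_start;
        return true;  // every position is the start of a word
    }
    bool Next(FakeLevel) { return ++pos < words.size(); }
};
```

With the real API, the call would presumably look like TrackWordPositions(*rst_iter, tesseract::RIL_WORD, tesseract::RIL_TEXTLINE, tesseract::RIL_PARA). Since I use PSM_SINGLE_LINE on one rectangle at a time, I expect each call to see only one line, so the indices would be relative to that rectangle and only become useful if a larger region is recognized in one pass.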
Thanks in advance,
Lakshman.