I'd like to perform OCR on subimages that I load using OpenCV.
How can I convert this data into a form Tesseract can work with?
So far I have tried two different approaches, but neither works. I hope
someone here can help me find a solution.
A.) Converting cv::Mat into Pix*
cv::Mat image = cv::imread("c:/image.png");
cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
int depth;
if(subImage.depth() == CV_8U)
    depth = 8;
//other cases not considered yet
PIX* pix = pixCreateHeader(subImage.size().width,
                           subImage.size().height, depth);
pix->data = (l_uint32*) subImage.data;
tesseract::TessBaseAPI tess;
STRING text;
if(tess.ProcessPage(pix, 0, 0, &text))
{
    std::cout << text.string();
}
However, OCR returns only unreadable characters.
B.) Using cv::Mat for TesseractRect()
cv::Mat image = cv::imread("c:/image.png");
cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
char* cr = tess.TesseractRect(
    subImage.data,
    subImage.channels(),
    subImage.channels() * subImage.size().width,
    0,
    0,
    subImage.size().width,
    subImage.size().height);
This code doesn't work either; it also returns only unreadable
characters, although different ones from the code above.
Does anyone know what the problem could be? cv::Mat stores pixel data
as an array of type uchar, so I would expect it to be usable in
TesseractRect without any conversion, since a UINT8* is required.
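
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: a subimage taken with image(cv::Rect(...)) is a view into the parent's buffer, so the distance between two rows is the parent's row size (cv::Mat::step), not channels * width of the subimage. The stdlib-only C++ below imitates that layout. View, makeView, byteAt and packTight are hypothetical names for illustration, not OpenCV or Tesseract types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a cv::Mat sub-rectangle: a window into a
// larger row-major pixel buffer that it does not own.
struct View {
    const std::uint8_t* data;  // first byte of the window's top-left pixel
    int width;                 // window width in pixels
    int height;                // window height in pixels
    int channels;              // bytes per pixel
    std::size_t step;          // bytes between row starts in the PARENT buffer
};

// Create a window (x, y, w, h) on a parent image that is parentWidth pixels wide.
View makeView(const std::vector<std::uint8_t>& parent, int parentWidth,
              int channels, int x, int y, int w, int h) {
    const std::size_t step = static_cast<std::size_t>(parentWidth) * channels;
    const std::uint8_t* origin =
        parent.data() + y * step + static_cast<std::size_t>(x) * channels;
    return View{origin, w, h, channels, step};
}

// Read one byte of the window, given the bytes-per-line the caller assumes.
// Passing v.step addresses the window correctly; passing
// v.channels * v.width reads from the wrong place for every row after the first.
std::uint8_t byteAt(const View& v, std::size_t bytesPerLine,
                    int row, int col, int ch) {
    return v.data[row * bytesPerLine +
                  static_cast<std::size_t>(col) * v.channels + ch];
}

// Copy the window into a tightly packed buffer, so that
// bytes per line == channels * width holds for the copy.
std::vector<std::uint8_t> packTight(const View& v) {
    const std::size_t rowBytes = static_cast<std::size_t>(v.width) * v.channels;
    std::vector<std::uint8_t> out;
    out.reserve(rowBytes * v.height);
    for (int r = 0; r < v.height; ++r) {
        const std::uint8_t* row = v.data + r * v.step;
        out.insert(out.end(), row, row + rowBytes);
    }
    return out;
}
```

If this is the cause, the OpenCV equivalent would be to pass subImage.step as the bytes-per-line argument, or to call subImage.clone() first so the pixels are stored contiguously. This does not address other possible causes, such as the TessBaseAPI object not having been initialised.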