Try having a look at this issue [1]: somebody posted a patch there (for the Python binding for tesseract) that passes an OpenCV image directly to tesseract.
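One thing that may be worth checking in variant B of the quoted code: a cv::Mat created as a sub-rectangle of another Mat normally shares the parent's buffer, so its rows are not tightly packed. The distance between row starts is then the parent's row stride (subImage.step in OpenCV), not subImage.channels() * subImage.size().width. The sketch below illustrates the idea with plain buffers only, so it needs neither OpenCV nor tesseract. The helper name copy_roi_packed is hypothetical and not part of either library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenCV or tesseract function): copies a
// sub-rectangle out of a larger interleaved image into a tightly packed
// buffer. `stride` is the number of bytes between row starts in the
// PARENT image, which is what a strided ROI view still uses.
std::vector<std::uint8_t> copy_roi_packed(const std::uint8_t* parent,
                                          std::size_t stride,
                                          int bytes_per_pixel,
                                          int left, int top,
                                          int width, int height) {
    std::vector<std::uint8_t> packed;
    packed.reserve(static_cast<std::size_t>(width) * height * bytes_per_pixel);
    for (int y = 0; y < height; ++y) {
        // Start of row (top + y) in the parent, shifted right by `left` pixels.
        const std::uint8_t* row =
            parent + (top + y) * stride + left * bytes_per_pixel;
        // Append exactly one ROI row; the rest of the parent row is skipped.
        packed.insert(packed.end(), row, row + width * bytes_per_pixel);
    }
    return packed;
}
```

With the packed copy, bytes_per_line is simply width * bytes_per_pixel. Alternatively, and assuming the tesseract version in use accepts an arbitrary stride, the original pointer could be passed together with the parent's stride (static_cast<int>(subImage.step)) as the bytes_per_line argument of TesseractRect() or SetImage(). Note also that cv::imread() returns 3-channel BGR data by default, so the channel order may need attention as well.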
Zdenko

[1] http://code.google.com/p/python-tesseract/issues/detail?id=8

On Mon, Nov 14, 2011 at 7:33 PM, cyrt <[email protected]> wrote:
> I'd like to perform OCR on subimages that I am loading using OpenCV.
> How can I convert this data into a form tesseract can work with?
>
> So far I tried two different codes, but no version is working. I hope
> someone here can help me finding a solution.
>
> A.) converting cv::Mat into Pix*
>
> cv::Mat image = cv::imread("c:/image.png");
> cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
>
> int depth;
> if(subImage.depth() == CV_8U)
>     depth = 8;
> //other cases not considered yet
>
> PIX* pix = pixCreateHeader(subImage.size().width,
>     subImage.size().height, depth);
> pix->data = (l_uint32*) subImage.data;
>
> tesseract::TessBaseAPI tess;
> STRING text;
> if(tess.ProcessPage(pix, 0, 0, &text))
> {
>     std::cout << text.string();
> }
>
> OCR returns only non-readable characters however.
>
> B.) Using cv::Mat for TesseractRect()
>
> cv::Mat image = cv::imread("c:/image.png");
> cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
> char* cr = tess.TesseractRect(
>     subImage.data,
>     subImage.channels(),
>     subImage.channels() * subImage.size().width,
>     0,
>     0,
>     subImage.size().width,
>     subImage.size().height);
>
> This code doesn't either and also returns only non-readable
> characters, although different ones than from the code above.
>
> Does anyone know what the problem could be? cv::Mat stores pixel data
> as an array of type uchar, so it should be fine to use in
> TesseractRect without any conversion, as UINT8* are required.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

