Have a look at this issue [1] in python-tesseract (the Python binding
for Tesseract): somebody posted a patch there that passes an OpenCV
image directly to Tesseract.

Zdenko

[1]  http://code.google.com/p/python-tesseract/issues/detail?id=8
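
For the C++ side, one possible cause of the garbage output (an assumption, not
verified against the code below) is the row stride: a cv::Mat created as a
sub-rectangle shares the parent's buffer, so consecutive rows are `step` bytes
apart rather than channels * width. A minimal sketch of copying such a region
into a tightly packed buffer follows; `packRows` is a hypothetical helper, not
part of OpenCV or Tesseract, and the usage in the trailing comment assumes the
`subImage` and `tess` objects from the quoted code, with tess.Init() already
called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy an image region whose rows start `stride` bytes
// apart into a contiguous buffer with no padding between rows.
std::vector<unsigned char> packRows(const unsigned char* data,
                                    int width, int height,
                                    int bytesPerPixel, std::size_t stride) {
    const std::size_t rowBytes =
        static_cast<std::size_t>(width) * bytesPerPixel;
    assert(stride >= rowBytes);  // rows may be padded, never shorter

    std::vector<unsigned char> packed(rowBytes * height);
    for (int y = 0; y < height; ++y) {
        // Source rows are `stride` apart; destination rows are `rowBytes` apart.
        std::memcpy(packed.data() + y * rowBytes, data + y * stride, rowBytes);
    }
    return packed;
}

// Assumed usage with the objects from the quoted code (not tested here):
//   std::vector<unsigned char> buf = packRows(
//       subImage.data, subImage.cols, subImage.rows,
//       subImage.channels(), subImage.step);
//   tess.SetImage(buf.data(), subImage.cols, subImage.rows,
//                 subImage.channels(), subImage.cols * subImage.channels());
//   char* text = tess.GetUTF8Text();  // caller releases with delete[]
```

Passing subImage.step as the bytes_per_line argument of TesseractRect(),
instead of channels * width, may achieve the same thing without the copy.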

On Mon, Nov 14, 2011 at 7:33 PM, cyrt <[email protected]> wrote:

> I'd like to perform OCR on subimages that I am loading using OpenCV.
> How can I convert this data into a form tesseract can work with?
>
> So far I have tried two different approaches, but neither works. I hope
> someone here can help me find a solution.
>
> A.) converting cv::Mat into Pix*
>
> cv::Mat image = cv::imread("c:/image.png");
>  cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
>
>  int depth;
>  if(subImage.depth() == CV_8U)
>    depth = 8;
>  //other cases not considered yet
>
>  PIX* pix = pixCreateHeader(subImage.size().width,
> subImage.size().height, depth);
>  pix->data = (l_uint32*) subImage.data;
>
>  tesseract::TessBaseAPI tess;
>  STRING text;
>  if(tess.ProcessPage(pix, 0, 0, &text))
>  {
>    std::cout << text.string();
>  }
>
> However, OCR returns only unreadable characters.
>
> B.) Using cv::Mat for TesseractRect()
>
> cv::Mat image = cv::imread("c:/image.png");
> cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
> char* cr = tess.TesseractRect(
>           subImage.data,
>           subImage.channels(),
>           subImage.channels() * subImage.size().width,
>           0,
>           0,
>           subImage.size().width,
>           subImage.size().height);
>
> This code doesn't work either and also returns only unreadable
> characters, although different ones from those produced by the code above.
>
> Does anyone know what the problem could be? cv::Mat stores pixel data
> as an array of type uchar, so it should be fine to use in
> TesseractRect without any conversion, since a UINT8* is required.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
