[tesseract-ocr] why tesseract gives junk value for japanese language?

mahendrag gajera Thu, 12 Jul 2018 06:15:54 -0700

Hello all

I am trying to OCR Japanese images with the code below, but it gives junk characters.
My Tesseract version is 4.0.


Please let me know what is missing here.

#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

void Test(char* imagePath)
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with Japanese ("jpn") data from the D:\tessdata folder
    if (api->Init("D:\\tessdata", "jpn",
                  tesseract::OcrEngineMode::OEM_TESSERACT_ONLY))
    {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead(imagePath);
    if (!image)
    {
        fprintf(stderr, "Could not read image %s\n", imagePath);
        exit(1);
    }
    api->SetImage(image);
    // Get OCR result (returned as UTF-8 encoded text)
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);
}

I am using the trained data from here:

https://github.com/tesseract-ocr/tessdata

Test image:

<https://lh3.googleusercontent.com/-nn1FgPUWwZA/W0S_PJ_D8UI/AAAAAAAACaY/Y9Y6uByvN3kP1vN8tKFP8VMKlIwPIPwyACLcBGAs/s1600/japan4.png>
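
One check that may help narrow this down, sketched under the assumption that the junk comes from the Windows console not displaying UTF-8 rather than from the recognition itself: write the bytes returned by GetUTF8Text() to a file and open it in a UTF-8-aware editor. WriteUtf8File and the output path are hypothetical names for illustration, not part of the Tesseract API.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Tesseract): writes the text to a file
// as raw bytes, so no console code page is involved in displaying it.
bool WriteUtf8File(const std::string& path, const char* utf8Text)
{
    assert(utf8Text != nullptr);

    // "wb" keeps the bytes untouched (no newline translation on Windows).
    FILE* fp = std::fopen(path.c_str(), "wb");
    if (!fp)
    {
        return false;
    }

    const size_t len = std::strlen(utf8Text);
    const size_t written = std::fwrite(utf8Text, 1, len, fp);
    std::fclose(fp);

    // Report success only if every byte was written.
    return written == len;
}
```

It could be called as WriteUtf8File("D:\\ocr_out.txt", outText) right after GetUTF8Text() (the path is only an example). If the file shows correct Japanese text, the problem is likely in how the console displays the output and not in the OCR result.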

Thanks,

