Re: [tesseract-ocr] Why do I get such poor results from Tesseract for simple single character recognizing?

Lorenzo Bolzani Mon, 15 Oct 2018 14:32:48 -0700

Try psm 7 or 13 (PSM_SINGLE_LINE and PSM_RAW_LINE) instead of single
character mode. In my case 7 works best.
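
One practical consequence of psm 7: the output is a whole line (typically
ending in a newline), not exactly one character. The quoted function returns
outText[0], which for "deu" text such as "ä" is only the first byte of a
multi-byte UTF-8 sequence. A minimal sketch of a post-processing step, in plain
C++ with no Tesseract calls; firstCodePoint is a hypothetical helper name, not
part of the library:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Tesseract): returns the first Unicode code
// point in a UTF-8 string, skipping leading whitespace. Returns -1 if the
// string is empty, whitespace only, or starts with a malformed sequence.
int firstCodePoint(const std::string& text) {
    std::size_t i = 0;
    while (i < text.size() &&
           (text[i] == ' ' || text[i] == '\t' || text[i] == '\r' || text[i] == '\n')) {
        ++i;
    }
    if (i == text.size()) return -1;

    const unsigned char lead = static_cast<unsigned char>(text[i]);
    std::size_t extra = 0;  // number of continuation bytes that follow
    int cp = 0;
    if (lead < 0x80)                { extra = 0; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return -1;  // stray continuation byte or invalid lead byte

    if (i + extra >= text.size()) return -1;  // sequence is cut off
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char byte = static_cast<unsigned char>(text[i + k]);
        if ((byte & 0xC0) != 0x80) return -1;  // not a continuation byte
        cp = (cp << 6) | (byte & 0x3F);
    }
    return cp;
}
```

You would pass it the string returned by GetUTF8Text() before freeing that
buffer, and treat -1 as "nothing recognized".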


I'm not 100% sure, but it may be easier for Tesseract to recognize full words
than single characters. I do not know whether this is just a test or what you
actually need to do.

The default oem mode (lstm) should be the best, but you can also try the
old (legacy) engine and see which works better in this case.
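
Trying each setting and seeing what works best can be done mechanically. A
minimal sketch, assuming you wrap your own Tesseract call in a callback that
returns the text and a confidence value (for example from MeanTextConf());
OcrResult and pickBestMode are hypothetical names, and the code itself makes
no Tesseract calls. Confidence is only a rough proxy, so checking against
images with known labels is more reliable if you have them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Result of one OCR attempt (hypothetical type, not part of Tesseract).
struct OcrResult {
    int mode;          // setting that produced this result, e.g. a psm number
    std::string text;  // recognized text
    int confidence;    // higher is better, e.g. 0-100
};

// Runs `recognize` once per candidate mode and returns the most confident
// result. On a tie the earlier mode in the list wins.
OcrResult pickBestMode(const std::vector<int>& modes,
                       const std::function<OcrResult(int)>& recognize) {
    assert(!modes.empty());  // at least one candidate is required
    OcrResult best = recognize(modes.front());
    for (std::size_t i = 1; i < modes.size(); ++i) {
        OcrResult candidate = recognize(modes[i]);
        if (candidate.confidence > best.confidence) {
            best = candidate;
        }
    }
    return best;
}
```

For the psm comparison you would call pickBestMode({7, 13}, yourCallback); the
same pattern applies to engine modes, with the callback re-initializing the
API for each one.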

You can train (fine-tune) the LSTM models, but it is not mandatory.


Bye

Lorenzo


On Mon, 15 Oct 2018 at 22:44, 'Yuliana Zigangirova' via
tesseract-ocr <[email protected]> wrote:

> Hi everyone,
>
> I am trying to use Tesseract for single-character recognition and the
> results are awful.
> "h" is recognized as "n", "4" as "/i", "O" as "()".
>
> [image: 1testtiff.png]
>
> [image: 6testtiff.png]
>
>
> [image: 2testtiff.png]
>
>
>
> Single character mode does not seem to take effect, as many characters are
> recognized as two characters, not just one. My images are simple bilevel
> black-and-white TIFF images of Latin characters. They come from a bitmap
> font, not from scans, so they are absolutely clean and need no improvement.
> Only about half of the characters are recognized correctly, which seems
> a very low rate for such a simple task.
>
> The Tesseract library version I am using is "4.0.0-beta.3".
> This is how I call Tesseract:
>
>  int CharRecognizer::recognizeTIFFData(char* data, int datalength){
>             char *outText;
>             TessBaseAPI* api = new TessBaseAPI();
>             // Initialize tesseract-ocr with German ("deu"), without specifying tessdata path
>             if (api->Init(NULL, "deu")) {
>                     fprintf(stderr, "Could not initialize tesseract.\n");
>                     exit(1);
>             }
>             api->SetPageSegMode(tesseract::PSM_SINGLE_CHAR);
>             Pix *image = pixReadMem(data,datalength);
>             api->SetImage(image);
>             // Get OCR result
>             outText = api->GetUTF8Text();
>             printf("\nOCR output:\n%s", outText);
>             // Keep the first byte of the result before freeing the buffer
>             int utf8 = outText[0];
>             // Destroy used objects and release memory
>             api->End();
>             delete api;
>             delete[] outText;
>             pixDestroy(&image);
>             return utf8;
>  }
>
> I am new to Tesseract, so I am probably missing something. Do I have to
> train the library first somehow? Maybe I should set another OcrEngineMode?
> I expected no problems recognizing a simple bitmap font and am quite at a
> loss now.
> Thank you very much in advance,
> Yuliana
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/f3cbddee-f620-4479-a967-97b52c98c64c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

