Hi,

I have two fairly similar images that I want to OCR.

[image: image_1758836719_box0_score0_87.jpg]
[image: image_1758836841_box0_score0_87.jpg]
I think both are of pretty good quality. To OCR the second one I'm using
this command:
tesseract image_1758836841_box0_score0_87.jpg stdout --dpi 600 --psm 7 -l eng

That returns exactly the text in the picture.
However, the same command on the first picture returns nothing.

Now, if I drop --psm 7 and run this on the first picture:
tesseract image_1758836719_box0_score0_87.jpg stdout --dpi 600 -l eng

I'm getting some output with a lot of noise:
Detected 6 diacritics
— sl O

a e any aS |
Lightning Greaves

But with that command the Aurochs file (the second image) gives
"Empty page!!". I have not been able to find a single command that works
for both.
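
To make it concrete, this is roughly the behaviour I am after, written as a
shell wrapper. It is only a sketch: ocr_with_fallback is a name I made up,
the default order "3 then 7" assumes that running without --psm is the same
as --psm 3 (which is what the help text lists as the default), and I am
assuming that diagnostics such as "Empty page!!" go to stderr so they can be
discarded.

```shell
# Hypothetical helper (not part of tesseract): try each page segmentation
# mode in turn and print the first result that contains any text.
ocr_with_fallback() {
    if [ "$#" -lt 1 ]; then
        echo "usage: ocr_with_fallback IMAGE [PSM...]" >&2
        return 2
    fi
    image=$1
    shift
    # No modes given: automatic segmentation (3), then single text line (7).
    [ "$#" -gt 0 ] || set -- 3 7
    for psm in "$@"; do
        # Assumption: diagnostics are written to stderr, so drop them here.
        text=$(tesseract "$image" stdout --dpi 600 --psm "$psm" -l eng 2>/dev/null)
        # Accept the result only if something other than whitespace came back.
        if [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
            printf '%s\n' "$text"
            return 0
        fi
    done
    # Every mode produced an empty page.
    return 1
}
```

It would be called as "ocr_with_fallback image_1758836841_box0_score0_87.jpg",
or with an explicit order such as "ocr_with_fallback FILE 7 3". The obvious
cost is that tesseract runs twice on images that come back empty the first
time.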

So I have a few questions here.

   - Is there a way to say something like "try without --psm, and if the
   page comes back empty, retry with --psm 7"?
   - Is it possible to provide my own list of candidate words? For example,
   can I provide "Aurochs, Greaves, Lightning" and force the OCR to output
   only those words?
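
For the second question, I have seen that tesseract accepts a --user-words
file, but my understanding (not verified) is that it only biases recognition
rather than restricting the output. The fallback I can think of is to filter
the OCR output afterwards. A rough sketch, where keep_known_lines is a
hypothetical helper and the word list is a plain file with one word per
line:

```shell
# Hypothetical post-filter (not a tesseract feature): read OCR text on stdin
# and print only the lines made up entirely of words from the given list.
# Matching is case-insensitive; punctuation attached to a word is not
# stripped, so "Greaves," would not match "Greaves".
keep_known_lines() {
    wordlist=$1
    awk -v list="$wordlist" '
        BEGIN {
            # Load the allowed words, one per line, lowercased.
            while ((getline w < list) > 0)
                if (w != "") known[tolower(w)] = 1
        }
        NF > 0 {
            ok = 1
            for (i = 1; i <= NF; i++)
                if (!(tolower($i) in known)) { ok = 0; break }
            if (ok) print
        }
    '
}
```

With a file words.txt containing Aurochs, Greaves and Lightning, running
"tesseract FILE stdout --dpi 600 -l eng | keep_known_lines words.txt" on the
noisy output above should leave only the "Lightning Greaves" line. This
removes noise lines but cannot correct a misread word.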


Thanks,

JM
