Thanks again for your reply.

Yeah, it seems page segmentation is the crucial issue. When the bounding
boxes are good, the recognition is usually very good.
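
A rough sketch of that idea, under some assumptions: pytesseract and Pillow
are installed, the boxes come from somewhere else as (x, y, w, h) tuples,
and each box holds a single line of text (hence --psm 7). The helper names
and the padding value are made up for illustration.

```python
def padded_crop_box(box, image_size, pad=4):
    """Turn an (x, y, w, h) box into a PIL crop tuple, padded and clipped.

    Returns (left, upper, right, lower), kept inside the image bounds.
    """
    x, y, w, h = box
    width, height = image_size
    left = max(x - pad, 0)
    upper = max(y - pad, 0)
    right = min(x + w + pad, width)
    lower = min(y + h + pad, height)
    return (left, upper, right, lower)


def ocr_regions(image_path, boxes, pad=4, config="--psm 7"):
    """Run Tesseract on each box separately instead of on the whole page.

    Assumption: every box contains one text line, so page segmentation
    mode 7 ("treat the image as a single text line") is used.
    """
    # Imported here so the geometry helper above works without OCR installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    results = []
    for box in boxes:
        crop = image.crop(padded_crop_box(box, image.size, pad))
        results.append(pytesseract.image_to_string(crop, config=config).strip())
    return results
```

Something like ocr_regions("drawing.png", [(120, 40, 80, 22)]) would then
return one string per box; the file name and coordinates are placeholders.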

I think I've more or less reached the limit of what I can do with base
Tesseract. The next step would be custom training / fine-tuning.
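
One thing that can still be tried with base Tesseract is the confidence
filtering suggested in the quoted reply. A minimal sketch, assuming
pytesseract and Pillow, and using the word-level confidences from
image_to_data as a stand-in for character-level scores. The function names,
the threshold of 60 and the --psm 11 setting are placeholders, not tuned
values.

```python
def keep_confident_words(data, min_conf=60.0):
    """Return (text, conf) pairs whose confidence is at least min_conf.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(..., output_type=Output.DICT): parallel lists
    under "text" and "conf". Layout rows have empty text and conf -1.
    """
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some pytesseract versions return strings or ints
        if text.strip() and conf >= min_conf:
            kept.append((text, conf))
    return kept


def ocr_confident_words(image_path, min_conf=60.0, config="--psm 11"):
    """OCR an image and drop low-confidence words (e.g. a stray symbol)."""
    # Imported here so the filter above can be used without OCR installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        config=config,
        output_type=pytesseract.Output.DICT,
    )
    return keep_confident_words(data, min_conf)
```

With this, a symbol that comes back with confidence 0 is dropped while the
digits next to it are kept, as long as the two end up as separate words.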

On Sun, Nov 12, 2023 at 01:52, Tom Morris <tfmor...@gmail.com> wrote:

> On Friday, November 10, 2023 at 3:03:42 AM UTC-5 olavs...@gmail.com wrote:
>
> > It isn't clear to me if OSD is meant for orientation of the whole page or
> > orientation of individual text elements on the page
>
> Sorry, I should have mentioned that earlier. I'm pretty sure it's page
> orientation and while I think it can handle vertical text, I don't think it
> can handle rotated text, so you'll probably have to run things twice.
>
> > For example I would prefer it didn't include the CL symbol because that
> > gave it a 0 confidence score, even though it did in fact recognize
> > correctly.
>
> This may be difficult for cases where the CL symbol is very close in size
> to your digits, but you might be able to do something based on character
> confidence scores.
>
> > I just don't know how to optimize it with the right config variables.
>
> I think your biggest problem is probably page segmentation and that's one
> of Tesseract's weakest areas. I'm not sure how much tweaking parameters is
> going to help, but perhaps someone else has some ideas.
>
> Tom

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CADVG04qozvgW5FzLsOFp5A%2BOT7PPAb10w0cu2SuX3D4jpzUMJg%40mail.gmail.com.
