Thanks again for your reply. Yeah, it seems page segmentation is the crucial issue. If the bounding boxes are good, the recognition is usually very good.
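A minimal sketch of what relying on good bounding boxes could look like, assuming the boxes come from some external step (manual annotation or a separate detector) and that pytesseract and Pillow are installed. The helper names `pad_box` and `ocr_regions` are hypothetical, and the padding value and `--psm 7` (treat the crop as a single text line) are guesses that would need tuning per document:

```python
def pad_box(box, pad, image_size):
    """Expand a (left, top, right, bottom) box by `pad` pixels, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


def ocr_regions(image, boxes, pad=4, psm=7):
    """Run Tesseract on each box separately instead of on the whole page.

    `image` is a PIL.Image; `boxes` is a list of (left, top, right, bottom)
    tuples from an external source (an assumption, not something Tesseract
    produced). Returns a list of (box, text) pairs.
    """
    import pytesseract  # imported lazily so pad_box works without it

    results = []
    for box in boxes:
        crop = image.crop(pad_box(box, pad, image.size))
        # --psm 7 tells Tesseract to skip page layout analysis and
        # treat the crop as one line of text.
        text = pytesseract.image_to_string(crop, config=f"--psm {psm}")
        results.append((box, text.strip()))
    return results
```

Calling `ocr_regions(Image.open("page.png"), boxes)` (with a hypothetical file name) would bypass Tesseract's own page segmentation entirely, which is the part that seems to be failing here.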
I think I've more or less reached the limit of what I can do with base Tesseract. I think the next step would be custom training / fine-tuning.

On Sun, Nov 12, 2023 at 01:52, Tom Morris <tfmor...@gmail.com> wrote:

> On Friday, November 10, 2023 at 3:03:42 AM UTC-5 olavs...@gmail.com wrote:
>
>> It isn't clear to me if OSD is meant for orientation of the whole page or
>> orientation of individual text elements on the page
>
> Sorry, I should have mentioned that earlier. I'm pretty sure it's page
> orientation and while I think it can handle vertical text, I don't think it
> can handle rotated text, so you'll probably have to run things twice.
>
>> For example I would prefer it didn't include the CL symbol because that
>> gave it a 0 confidence score, even though it did in fact recognize
>> correctly.
>
> This may be difficult for cases where the CL symbol is very close in size
> to your digits, but you might be able to do something based on character
> confidence scores.
>
>> I just don't know how to optimize it with the right config variables.
>
> I think your biggest problem is probably page segmentation and that's one
> of Tesseract's weakest areas. I'm not sure how much tweaking parameters is
> going to help, but perhaps someone else has some ideas.
>
> Tom

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CADVG04qozvgW5FzLsOFp5A%2BOT7PPAb10w0cu2SuX3D4jpzUMJg%40mail.gmail.com.