OK, here are the errors that the Danish model generates:
Error: failed to insert pattern '\d\d.\d\d.\d\d'
Error: failed to insert pattern '\d\d:\d\d'
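
Independent of the pattern file, the misreads described in the quoted message below could probably be repaired after recognition, because the fixed length of dd.mm.yy and hh:mm decides whether a 'U' stands for '0' or for '11'. Here is a minimal sketch of that idea. The confusion table and the names candidates, fix_date and fix_time are assumptions made up for illustration, not part of tesseract, and the patterns only check the shape of the string, not whether the date or time is valid.

```python
import itertools
import re

# Assumed confusion table, taken from the misreads reported in this thread:
# 'O' was read for '0', 'U' for '0' or '11', and 'H' for '11'.
CONFUSIONS = {"O": ("0",), "U": ("0", "11"), "H": ("11",)}

DATE_RE = re.compile(r"\d\d\.\d\d\.\d\d")  # dd.mm.yy
TIME_RE = re.compile(r"\d\d:\d\d")         # hh:mm


def candidates(text):
    """Yield every string obtained by expanding the confusable letters."""
    options = [CONFUSIONS.get(ch, (ch,)) for ch in text]
    for combo in itertools.product(*options):
        yield "".join(combo)


def fix_date(raw):
    """Return the unique dd.mm.yy reading of raw, or None if there is none."""
    text = "".join(raw.split())  # drop spaces inserted between the fields
    found = {c for c in candidates(text) if DATE_RE.fullmatch(c)}
    return found.pop() if len(found) == 1 else None


def fix_time(raw):
    """Return the unique hh:mm reading of raw, or None if there is none."""
    text = "".join(raw.split())
    found = set()
    for cand in candidates(text):
        # A ':' read as '1' leaves five characters with '1' in the middle,
        # so the separator can be restored by position.
        if len(cand) == 5 and cand[2] == "1":
            cand = cand[:2] + ":" + cand[3:]
        if TIME_RE.fullmatch(cand):
            found.add(cand)
    return found.pop() if len(found) == 1 else None
```

For example, fix_date("H.04.20") expands 'H' to '11' and accepts the result only because it then has the required length, while any string with no valid reading, or with more than one, comes back as None so it can be flagged instead of silently guessed.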

regards,
Karoly

On 4/8/20, Karoly Makonyi <[email protected]> wrote:
> Hello,
>
> I need to read fixed-format times and dates from images.
> The task is rather trivial, but tesseract performs weirdly.
> I am using the Danish trained model. The format of the date string is
> dd.mm.yy and the time is hh:mm.
> Very often the ':' in the time is recognized as '1', but this is not
> difficult to correct.
> In the date I have seen the letters 'U' and 'O' instead of the number '0'
> (this is not very difficult to postprocess either) and the letters 'U' and
> 'H' instead of the number '11'.
> This is harder ...
> The English pretrained model works perfectly on the examples I checked
> (but I can't use it because our embedded system does not have enough memory).
> I can build a whitelist of characters with digits and separators only, but
> the precision doesn't increase much ...
>
> Because the format is fixed, I tried to use patterns: \d\d.\d\d.\d\d for
> the date and \d\d:\d\d for the time.
> With the English model the pattern file is accepted and is evidently used, but
> the accuracy drops (it starts to confuse ':' with '1' and puts spaces
> between day, month and year ...).
> With the Danish model I get an error message (sorry, I can't quote it, as I am
> on another computer, but it can't recognize the format of the regexp, or
> something similar ...) with the _same_ pattern file.
>
> How does the pattern file depend on the language?
> What other ways can one imagine to improve my model ...
>
> I am _not_ using LSTM, but I am using tesseract 4.0.0 on Linux.
>
> Thanks in advance,
> Karoly
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/def598f3-4e33-4d73-b3a5-9615192b3ff3%40googlegroups.com.
>

