OK, here is the error that the Danish model generates:

Error: failed to insert pattern '\d\d.\d\d.\d\d'
Error: failed to insert pattern '\d\d:\d\d'

Regards,
Karoly

On 4/8/20, Karoly Makonyi <[email protected]> wrote:
> Hello,
>
> I need to read fixed-format times and dates from images.
> The task is rather trivial, but tesseract performs weirdly.
> I am using the Danish trained model. The format of the date string is
> dd.mm.yy and the time is hh:mm.
> Very often the ':' in the time is recognized as '1', but this is not
> difficult to correct.
> In the date I have seen the letters 'U' and 'O' instead of the number '0'
> (this is not very difficult to postprocess either) and the letters 'U' and
> 'H' instead of the number '11'.
> This is harder ...
> The English pretrained model works perfectly on the examples I checked
> (but I can't use it because our embedded system does not have enough
> memory).
> I can build a whitelist of characters with numbers and separators only. The
> precision doesn't increase much ...
>
> Because the format is fixed, I tried to use patterns: \d\d.\d\d.\d\d for
> the date and \d\d:\d\d for the time.
> With the English model the pattern file is accepted and obviously is used,
> but the accuracy drops (it starts to confuse ':' with '1', puts spaces
> between day, month and year ...)
> With the Danish model I get an error message (sorry, I can't quote it, I am
> on another computer, but it can't recognize the format of the regexp, or
> something similar ...) with the _same_ pattern file.
>
> How does the pattern file depend on the language?
> What other ways are there to improve my model ...
>
> I am _not_ using LSTM, but tesseract 4.0.0 on Linux.
>
> Thanks in advance,
> Karoly
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/def598f3-4e33-4d73-b3a5-9615192b3ff3%40googlegroups.com.
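
For reference, the postprocessing mentioned in the quoted message could look roughly like the sketch below. It only uses the Python standard library and does not touch tesseract at all. The substitution table, the function names `fix_time` and `fix_date`, and the rule "accept a result only if exactly one candidate is a valid date" are my own assumptions for illustration, built from the confusions listed above (':' read as '1', '0' read as 'O' or 'U', '11' read as 'U' or 'H').

```python
import itertools
import re
from datetime import datetime
from typing import Optional

# Confusions reported in the thread. The mapping is an assumption about how
# to undo them; it is not something tesseract provides.
SUBSTITUTIONS = {
    "O": ["0"],
    "U": ["0", "11"],  # 'U' was seen for both '0' and '11'
    "H": ["11"],
}

DATE_RE = re.compile(r"\d\d\.\d\d\.\d\d")
TIME_RE = re.compile(r"\d\d:\d\d")


def fix_time(raw: str) -> Optional[str]:
    """Return the text as hh:mm, or None if it cannot be repaired."""
    text = raw.replace(" ", "")
    # In hh:mm the separator sits at index 2; a '1' there is a misread ':'.
    if len(text) == 5 and text[2] == "1":
        text = text[:2] + ":" + text[3:]
    if not TIME_RE.fullmatch(text):
        return None
    try:
        datetime.strptime(text, "%H:%M")
    except ValueError:
        return None
    return text


def fix_date(raw: str) -> Optional[str]:
    """Return the text as dd.mm.yy, or None if no unique repair exists."""
    text = raw.replace(" ", "")
    # Every character contributes one or more possible replacements.
    options = [SUBSTITUTIONS.get(ch, [ch]) for ch in text]
    valid = set()
    for combo in itertools.product(*options):
        candidate = "".join(combo)
        if not DATE_RE.fullmatch(candidate):
            continue
        try:
            datetime.strptime(candidate, "%d.%m.%y")
        except ValueError:
            continue
        valid.add(candidate)
    # Accept only an unambiguous result.
    return valid.pop() if len(valid) == 1 else None


if __name__ == "__main__":
    print(fix_time("12130"))
    print(fix_date("O1.U.20"))
    print(fix_date("H.04.20"))
```

Because the fixed length of dd.mm.yy rules out most candidates, the ambiguous 'U' usually resolves itself: only one of the two replacements gives a string of the right shape. Strings that stay ambiguous or invalid come back as `None`, so they can be flagged instead of being guessed.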

