I suggest you fine-tune Latin.traineddata using text of the kind you
expect to read. The fine-tuned model will have a smaller unicharset, and
when you convert it to a fast integer model it should also be smaller in size.
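
As a rough way to see how small that unicharset could get, you could list
the distinct characters in your training text. Below is a minimal Python
sketch, assuming the four names from your post stand in for the real
training text. The `charset` helper is only an illustrative name and not
part of Tesseract, and the actual unicharset is produced by the Tesseract
training tools, so it may differ from this count.

```python
import unicodedata


def charset(lines):
    """Return the sorted distinct non-space characters found in lines."""
    found = set()
    for line in lines:
        # Normalize to NFC so a base letter plus a combining mark is
        # counted as one precomposed character where one exists.
        for ch in unicodedata.normalize("NFC", line):
            if not ch.isspace():
                found.add(ch)
    return sorted(found)


# Sample names from the original post (an assumption: a real run would
# read the full training text from a file instead).
names = ["Śerena Kovitch", "ŁAGUNA EVREIST", "Äna Optici", "Orğu Moninck"]

chars = charset(names)
non_ascii = [ch for ch in chars if ord(ch) > 127]

print(len(chars), "distinct characters in total")
for ch in non_ascii:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Anything that never shows up in this list (digits, %, the small-form
punctuation and so on) is a candidate for leaving out of the training text.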

On Wed, Apr 8, 2020, 20:39 O CR <[email protected]> wrote:

> Hi all,
>
> I am trying to read names on images with Tesseract LSTM. Names like:
>
> Śerena Kovitch
>
> ŁAGUNA EVREIST
>
> Äna Optici
>
> Orğu Moninck
>
>
> (I don't have to recognize words)
>
>
> Latin.traineddata (fast integer) does well with the diacritics, but it
> contains a lot of characters I don't need, such as numbers, %, ﹕, ﹖, ﹗, ﹙,
> ﹚, ﹛, ﹜, ﹝, ﹞, ﹟, ﹠, ﹡, ﹢, ﹣, ﹤, ﹥, ﹦, ﹨, ﹩, ﹪, ﹫, and many more. And so
> Latin.traineddata is too slow.
>
> So I thought I would take eng.traineddata (best float for LSTM) and train
> it on the diacritics. But there are almost 400 diacritics, so I don't know
> whether fine-tuning for that many characters is a good idea.
>
> I tried it anyway, but the quality is very poor.
>
> I trained with eng.training_text (an English text of 72 lines), to which I
> added all the diacritics several times. The character error rate reported
> by lstmeval is around 0.1. In a test with 80 documents, each containing one
> name, 30 names were read correctly. (Recognition time is similar to that of
> Latin.traineddata.)
>
>
> What can I do to get a model that is as good as Latin.traineddata on
> diacritics but much faster at recognition?
>
>
> Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/b9ddf333-1229-45d3-9a02-809973294a47%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/b9ddf333-1229-45d3-9a02-809973294a47%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
