Re: [tesseract-ocr] Tesseract not recognizing ancient language's code

2020-03-22 Thread Wincent Balin
training text from a range of characters/word > list, similar to > > The tool language_metrics runs Tesseract OCR over images of random word >> sequences, which are created out of the supplied wordlist, > > > On Mon, Mar 16, 2020 at 2:32 AM Wincent Balin > wrote: >

Re: [tesseract-ocr] Tesseract not recognizing ancient language's code

2020-03-15 Thread Wincent Balin
Maybe http://dasi.cnr.it does have something usable? Shree Devi Kumar wrote on Sun, 15 March 2020, 16:55: > There is no online corpus for xsa that I could find. > > Two of the fonts you sent are legacy fonts, that is they map English > letters to ancient Arabic characters. > > Are there any co

Re: [tesseract-ocr] Help for training Akkadian language for Tesseract 4 needed

2020-02-22 Thread Wincent Balin
wrong with this approach? On Monday, 17 February 2020 08:23:38 UTC+1, shree wrote: > > Try lstmtraining again for 1000 iterations with --debug_level -1 > > > > > On Mon, Feb 17, 2020, 01:46 Wincent Balin > wrote: > >> Hello all, >> >> after prepa

[tesseract-ocr] Help for training Akkadian language for Tesseract 4 needed

2020-02-16 Thread Wincent Balin
Hello all, after preparing ground truth files for the Akkadian language, I started the training using the *tesstrain* Makefile, but over 400 iterations later, the output looks like the following: At iteration 4437804/4478900/4478900, Mean rms=1.453%, delta=9.455%, char train=121.423%, word train=87.4

Re: [tesseract-ocr] Re: Announcement: Python package pytesstrain (Tesseract training helpers)

2020-02-09 Thread Wincent Balin
Hello Shree, I just uploaded a new version of the package. About the fixes: 1. --fonts_dir: I added a default value for the fonts directory on different platforms. 2. Number of threads: I also capped the maximum number of threads at the number of CPUs. Would you like to re-test it, please?
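The thread cap mentioned in fix 2 could look roughly like the sketch below. This is not the actual pytesstrain code; the function name `cap_threads` is hypothetical, and the only assumption is that the requested worker count is clamped to what `os.cpu_count()` reports.

```python
import os


def cap_threads(requested: int) -> int:
    """Clamp a requested worker count to the range 1..number of CPUs.

    Hypothetical helper for illustration; not taken from pytesstrain.
    """
    # os.cpu_count() may return None when the count cannot be determined,
    # so fall back to a single worker in that case.
    cpu_count = os.cpu_count() or 1
    return max(1, min(requested, cpu_count))


if __name__ == "__main__":
    # Asking for far more threads than the machine has yields the CPU count.
    print(cap_threads(64))
```

Clamping at the lower end as well keeps a zero or negative request from producing an empty worker pool.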

[tesseract-ocr] Re: Announcement: Python package pytesstrain (Tesseract training helpers)

2020-02-03 Thread Wincent Balin
t (*Range*: U+11600..U+1165F; > (96 code points)) and found that `ocrevalutf8 accuracy` does not work > well for it. Any suggestions ... > > Shree > > On Sunday, January 5, 2020 at 2:22:50 AM UTC+5:30, Wincent Balin wrote: >> >> Hi all, >> >> I would like t

Re: [tesseract-ocr] Announcement: Python package pytesstrain (Tesseract training helpers)

2020-02-03 Thread Wincent Balin
> On Sun, Jan 5, 2020, 02:22 Wincent Balin > wrote: > >> Hi all, >> >> I would like to announce pytesstrain, a collection of Tesseract training >> tools, as well as the underlying library. The tools were created while >> training Tesseract to recogni

[tesseract-ocr] Announcement: Python package pytesstrain (Tesseract training helpers)

2020-01-04 Thread Wincent Balin
Hi all, I would like to announce pytesstrain, a collection of Tesseract training tools, as well as the underlying library. The tools were created while training Tesseract to recognise the Akkadian language (stay tuned for more posts!), to solve the problems that emerged in the process. You can ins