Hi Wincent,

Thank you for sharing these tools. I find create-dictdata to be very useful.

I wanted to know whether you have modified any of the ocr-evaluation tools 
to handle characters in the high Unicode range, such as those used for Akkadian.

I was testing with Modi script (range: U+11600..U+1165F, 96 code points) 
and found that `ocrevalutf8 accuracy` does not work well for it. 
Do you have any suggestions?
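
To show what I mean by the high range, here is a minimal Python sketch. It is my own illustration and not part of pytesstrain or ocrevalutf8; the helper names `levenshtein` and `cer` and the sample strings are hypothetical. It shows that Modi code points lie outside the Basic Multilingual Plane (4 bytes each in UTF-8, a surrogate pair each in UTF-16) and computes a character error rate per code point. I have not confirmed that this is why `ocrevalutf8 accuracy` struggles; it is only my guess.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, compared element by element."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up sample: three code points from the Modi block (U+11600..U+1165F).
truth = "\U00011600\U00011601\U00011602"
# Hypothetical OCR output with the middle character misrecognised.
ocr = "\U00011600\U00011603\U00011602"

code_points = len(truth)                           # Python 3 counts code points
utf8_bytes = len(truth.encode("utf-8"))            # 4 bytes per code point
utf16_units = len(truth.encode("utf-16-le")) // 2  # 2 code units per code point

print(code_points, utf8_bytes, utf16_units)
print(cer(truth, ocr))
```

A tool that counts bytes or UTF-16 code units instead of code points would see 12 or 6 "characters" here rather than 3, which would distort the accuracy figures.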

Shree

On Sunday, January 5, 2020 at 2:22:50 AM UTC+5:30, Wincent Balin wrote:
>
> Hi all,
>
> I would like to announce pytesstrain, a collection of Tesseract training 
> tools, as well as the underlying library. The tools were created while 
> training Tesseract to recognise Akkadian language (stay tuned for more 
> posts!), to solve the problems that emerged in the process.
>
> You can install it with `pip install pytesstrain`.
>
> The PyPI page for the package is https://pypi.org/project/pytesstrain/. 
> The GitHub project page is https://github.com/wincentbalin/pytesstrain.
>
> This package contains the tools to create dictionary data (wordlist, bi- 
> and unigram lists, etc.), rewrap lines in text files to the specified 
> length, collect most frequent recognition errors and dump them into 
> unicharambigs file, and to perform recognition metrics (WER and CER). It 
> also contains the run_test() function, which creates an image file from 
> the given string and performs OCR on it afterwards, as well as its 
> parallelised version, run_tests(), which can be used in future tools.
>
> Feedback, suggestions, etc. would be most welcome.
>
> Yours truly,
>
> Wincent
>
>
