Thanks, Wincent.
I will try out the tools you added.

I found a Unicode version of the ISRI evaluation tools at
https://github.com/eddieantonio/ocreval which also handles code points in
the high Unicode range. See
https://github.com/Shreeshrii/tesstrain-modi/blob/master/reports/modi-eval-modiLayer_1.017_157724_324000/report_modiLayer_1.017_157724_324000-modi-ALL.txt
for an example.
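
For anyone following the thread, here is a minimal sketch of why counting by
code point matters for a script like Modi. It is not taken from ocreval or
pytesstrain; the function names (is_modi, edit_distance, cer) and the sample
strings are made up for illustration. It assumes a character error rate
defined as the Levenshtein distance over code points divided by the
reference length, and it does no Unicode normalization.

```python
# Modi block boundaries, per the range quoted in this thread.
MODI_START, MODI_END = 0x11600, 0x1165F


def is_modi(ch: str) -> bool:
    """Return True if the single code point ch lies in the Modi block."""
    return MODI_START <= ord(ch) <= MODI_END


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical sample: three Modi code points, one substituted in the output.
    ref = "\U00011600\U00011601\U00011602"
    hyp = "\U00011600\U00011603\U00011602"
    # Code points versus UTF-16 code units (each Modi character needs two).
    print(len(ref), len(ref.encode("utf-16-le")) // 2)
    print(all(is_modi(c) for c in ref))
    print(cer(ref, hyp))
```

Python strings are sequences of code points, so each Modi character counts
once here. A tool that counts UTF-16 code units or raw bytes instead would
see several units per character and could report skewed accuracy figures.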

Do you have a workflow for Tesseract training using your tools? If so, I
would like to add or refer to it in the Tesseract documentation.




On Tue, Feb 4, 2020 at 2:06 AM Wincent Balin <wincent.ba...@gmail.com>
wrote:

> Hi Shree,
>
> I am glad you find the package already useful :-) .
>
> As to your question: I did not use the ocr-evaluation tools, only the
> language_metrics utility. So, regrettably, I cannot help you here. But
> maybe you could try the same utility too?
>
> By the way, I added a create_ground_truth utility, which creates .gt.txt
> files as well as the associated .tif files for every specified font, to
> the package. I think it could be useful for anyone who does not have a
> ground truth collection yet.
>
> Kind regards,
>
> Wincent
>
>
> On Wednesday, 29 January 2020 at 06:47:01 UTC+1, shree wrote:
>>
>> Hi Wincent,
>>
>> Thank you for sharing these tools. I find create-dictdata to be very
>> useful.
>>
>> I wanted to know if you have modified any ocr-evaluation tools to handle
>> the high Unicode range, such as for the Akkadian language.
>>
>> I was testing with the Modi script (range: U+11600..U+1165F, 96 code
>> points) and found that `ocrevalutf8 accuracy` does not work well for it.
>> Any suggestions?
>>
>> Shree
>>
>> On Sunday, January 5, 2020 at 2:22:50 AM UTC+5:30, Wincent Balin wrote:
>>>
>>> Hi all,
>>>
>>> I would like to announce pytesstrain, a collection of Tesseract
>>> training tools, as well as the underlying library. The tools were created
>>> while training Tesseract to recognise the Akkadian language (stay tuned
>>> for more posts!), to solve the problems that emerged in the process.
>>>
>>> You can install it with pip install pytesstrain.
>>>
>>> The PyPI page for the package is https://pypi.org/project/pytesstrain/.
>>> The GitHub project page is https://github.com/wincentbalin/pytesstrain.
>>>
>>> This package contains tools to create dictionary data (wordlist, bi-
>>> and unigram lists, etc.), rewrap lines in text files to a specified
>>> length, collect the most frequent recognition errors and dump them into
>>> a unicharambigs file, and compute recognition metrics (WER and CER). It
>>> also contains the run_test() function, which creates an image file from
>>> a given string and then performs OCR on it, as well as its
>>> parallelised version, run_tests(), which can be used in future tools.
>>>
>>> Feedback, suggestions, etc. would be most welcome.
>>>
>>> Yours truly,
>>>
>>> Wincent
>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/3df5801b-7119-4451-9bb5-5fabc3e66bb1%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/3df5801b-7119-4451-9bb5-5fabc3e66bb1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
