Hi Zdenop,
Apologies. I got your name wrong in the thread.
Can you please help me resolve this issue? The make training command was
not creating the all-gt file, so I created it manually and placed it in
the MODEL_NAME directory.
I created it by copying the single line from each of the text files into
the all-gt file. I am not sure if this is the right approach. Please
correct me if I am wrong here.
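For reference, here is a minimal sketch of the manual step described above. It assumes the ground-truth transcriptions are `*.gt.txt` files (as in the `data/foo-ground-truth` sample) and that all-gt should contain one transcription per line. The function name `build_all_gt` and the example paths are hypothetical, and this is not necessarily how the Makefile itself builds the file.

```python
from pathlib import Path


def build_all_gt(ground_truth_dir, output_file):
    """Collect the text of every *.gt.txt file into one file, one entry per line.

    Assumption: each .gt.txt holds a single line of transcription.
    Returns the number of non-empty transcriptions written.
    """
    gt_dir = Path(ground_truth_dir)
    out_path = Path(output_file)

    lines = []
    # Sort so the output order is reproducible between runs.
    for gt_file in sorted(gt_dir.rglob("*.gt.txt")):
        text = gt_file.read_text(encoding="utf-8").strip()
        if text:  # skip empty ground-truth files
            lines.append(text)

    # Create the model directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

Something like `build_all_gt("data/Apex-ground-truth", "data/Apex/all-gt")` would then write the file. The `data/Apex-ground-truth` directory name is assumed from the `foo-ground-truth` pattern in the earlier log.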
After doing this, I am getting this error:
python3 shuffle.py 0 "data/Apex/all-lstmf"
Traceback (most recent call last):
  File "/Users/madpande/Code/git/tesseract_tutorial/tesstrain/shuffle.py", line 24, in <module>
    fd0 = open(sys.argv[2], 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'data/Apex/all-lstmf'
I am pretty sure I am missing something here. Please help!
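Since shuffle.py fails because the all-lstmf list file was never written, one way to narrow this down is to check whether the earlier `tesseract ... lstm.train` step produced a `.lstmf` file for each ground-truth line. The sketch below assumes each `.lstmf` sits next to its `.gt.txt` with the same base name, as the output base in the logged tesseract commands suggests. The function name `find_missing_lstmf` is hypothetical.

```python
from pathlib import Path


def find_missing_lstmf(ground_truth_dir):
    """Return the expected .lstmf paths that are missing or empty.

    Assumption: for every NAME.gt.txt there should be a NAME.lstmf
    in the same directory.
    """
    gt_dir = Path(ground_truth_dir)
    suffix = ".gt.txt"

    missing = []
    for gt_file in sorted(gt_dir.rglob("*" + suffix)):
        # NAME.gt.txt -> NAME.lstmf
        lstmf = gt_file.with_name(gt_file.name[: -len(suffix)] + ".lstmf")
        if not lstmf.is_file() or lstmf.stat().st_size == 0:
            missing.append(lstmf)
    return missing
```

If this returns an empty list, the per-line files exist and the problem is more likely in the step that writes the list file itself. If it returns paths, the tesseract step for those lines is worth re-running by hand to see its error output.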
Thanks!
On Thursday, 1 June 2023 at 23:39:01 UTC-6 Madhav Pandey wrote:
> Hi Zdenko,
>
> At what step in the Makefile is the all-gt file created? I am still
> unable to move forward with the custom model training.
>
> Any help would be greatly appreciated. Thanks!
>
> On Wednesday, 26 April 2023 at 09:47:55 UTC-6 zdenop wrote:
>
>> make training TESSDATA=./usr/local/share/tessdata
>> unicharset_extractor --output_unicharset "data/foo/unicharset"
>> --norm_mode 2 "data/foo/all-gt"
>>
>> Failed to read data from: data/foo/all-gt....
>>
>>
>> This indicates you already ran a training that failed...
>> Clean your training and start it once again. Pay attention to why
>> "data/foo/all-gt" is not created (there will be an error message).
>>
>> Zdenko
>>
>>
>> On Wed, 26 Apr 2023 at 2:07, Madhav Pandey <[email protected]> wrote:
>>
>>> @zdenop
>>>
>>> This is the entire training output:
>>>
>>> ```
>>> make training TESSDATA=./usr/local/share/tessdata
>>> unicharset_extractor --output_unicharset "data/foo/unicharset"
>>> --norm_mode 2 "data/foo/all-gt"
>>> Failed to read data from: data/foo/all-gt
>>> Wrote unicharset file data/foo/unicharset
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif" -t
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.gt.txt" >
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.box"
>>> set -x; \
>>> tesseract
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif"
>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0087_027.tif
>>> data/foo-ground-truth/alexis_ruhe01_1852_0087_027 --psm 13 lstm.train
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif" -t
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.gt.txt" >
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.box"
>>> set -x; \
>>> tesseract
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif"
>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0018_022.tif
>>> data/foo-ground-truth/alexis_ruhe01_1852_0018_022 --psm 13 lstm.train
>>> PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif" -t
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.gt.txt" >
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.box"
>>> set -x; \
>>> tesseract
>>> "data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif"
>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>> + tesseract data/foo-ground-truth/alexis_ruhe01_1852_0035_019.tif
>>> data/foo-ground-truth/alexis_ruhe01_1852_0035_019 --psm 13 lstm.train
>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>> Traceback (most recent call last):
>>> File "/Users/m/Code/git/tesstrain/shuffle.py", line 24, in <module>
>>> fd0 = open(sys.argv[2], 'r')
>>> FileNotFoundError: [Errno 2] No such file or directory:
>>> 'data/foo/all-lstmf'
>>> make: *** [data/foo/all-lstmf] Error 1
>>> ```
>>>
>>> For this run, I just have 3 text and tif files.
>>>
>>> I followed the macOS installation section from this page:
>>> https://tesseract-ocr.github.io/tessdoc/Compiling.html#macos and
>>> installed everything that is mentioned there.
>>>
>>> Do I have to install anything else before running the training?
>>>
>>> On Tuesday, 25 April 2023 at 00:27:28 UTC-6 zdenop wrote:
>>>
>>>> Did you install all the necessary dependencies?
>>>> Did you check and fix all errors (before this error) in the training output?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Tue, 25 Apr 2023 at 8:21, Madhav Pandey <[email protected]> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I am relatively new to Tesseract and OCR as a whole.
>>>>>
>>>>> I have been trying to set up model training locally
>>>>> using the guide
>>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/README.md
>>>>>
>>>>> I have copied the sample training data into the `data/foo` directory
>>>>> but when I run `make training`, I always end up getting this error:
>>>>>
>>>>> ```
>>>>> Failed to read data from: data/foo/all-gt
>>>>> Wrote unicharset file data/foo/unicharset
>>>>> python3 shuffle.py 0 "data/foo/all-lstmf"
>>>>> Traceback (most recent call last):
>>>>> File "shuffle.py", line 24, in <module>
>>>>> fd0 = open(sys.argv[2], 'r')
>>>>> FileNotFoundError: [Errno 2] No such file or directory:
>>>>> 'data/foo/all-lstmf'
>>>>> make: *** [data/foo/all-lstmf] Error 1
>>>>> ```
>>>>>
>>>>> Can someone please help resolve this error?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com
>>>>>
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/249216fc-70e5-4e40-a630-d4202fd24a36n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>