I use MobaXterm and WSL (Bash on Windows) on Windows 10. If you are training the legacy Tesseract engine (not LSTM), you can use jTessBoxEditor for training.
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Feb 23, 2018 at 7:00 PM, Jehan wrote:
> Again, thank you for posting it earlier than me :)
>
> Anyway, do you know how I could get past this problem? Is there any trick
> that could help me? Maybe using Git Bash or something?
>
> On Friday, February 23, 2018 12:04:53 UTC+1, shree wrote:
>>
>> Please open this as an issue in the GitHub repo:
>> https://github.com/tesseract-ocr/tesseract/issues
>>
>> > the "/" is added without checking whether the command is used on
>> > Windows or Linux.
>>
>> Found a couple of places in that file where this is the case:
>>
>>     // Load the unicharset for the script if available.
>>     string filename = script_dir + "/" +
>>         unicharset->get_script_from_script_id(s) + ".unicharset";
>>
>> and
>>
>>     // Load the xheights for the script if available.
>>     string filename = script_dir + "/" +
>>         unicharset.get_script_from_script_id(s) + ".xheights";
>>
>> On Fri, Feb 23, 2018 at 2:25 PM, Jehan wrote:
>>
>>> I'm training Tesseract on Windows for a new font and everything went
>>> pretty well until the set_unicharset_properties step:
>>>
>>>     set_unicharset_properties -U .\unicharset -O .\unicharset2 -F "C:\Windows\Fonts\Roman.tff" --script_dir='C:\Program Files (x86)\Tesseract-OCR\training'
>>>
>>>> Loaded unicharset of size 7 from file .\unicharset
>>>> Setting unichar properties
>>>> Other case c of C is not in unicharset
>>>> Other case f of F is not in unicharset
>>>> Setting script properties
>>>> Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
>>>> Warning: properties incomplete for index 3 = C
>>>> Warning: properties incomplete for index 4 = 0
>>>> Warning: properties incomplete for index 5 = 1
>>>> Warning: properties incomplete for index 6 = F
>>>> Writing unicharset to file .\unicharset2
>>>
>>> I've verified that Latin.unicharset is in the right directory.
>>>
>>> The problem (I'm pretty sure) is at the end of this line:
>>>
>>>> Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
>>>
>>> The thing is that the training software adds a "/" instead of a "\".
>>> I've looked at unicharset_training_utils.cpp, and at line 166 the "/"
>>> is added without checking whether the command is used on Windows or Linux.
>>>
>>> Is there a solution for Windows to load Latin.unicharset even with this
>>> "/"? If not, what is the easiest solution?
>>>
>>> For information, my unicharset2 file looks like this:
>>>
>>>> 7
>>>> NULL 0 Common 0
>>>> Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
>>>> |Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
>>>> C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A
>>>> 0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
>>>> ...
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/aa3a131c-51fe-42ea-9fba-336ef89737cd%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

