Can you please share the converted radical-stroke.txt file?

On Tuesday, August 14, 2018 at 3:12:43 PM UTC+6, zwwts...@gmail.com wrote:
>
> I've come across the same fault before, because I simply moved the
> langdata that I had cloned on Windows to a Linux server.
> As a consequence, the radical-stroke.txt file, which needs "LF" line
> endings, ended up with "CR LF" line endings.
> Everything went right after I converted this file.
>
> On Monday, August 6, 2018 at 12:11:33 PM UTC+8, Shandigutt wrote:
>>
>> Hi,
>>
>> I am trying to train Tesseract for the Sinhala language. I was following
>> the training guidelines
>> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
>> in the GitHub wiki. I get an error at the 4th step, "Creating Starter
>> Traineddata". This is the command I executed:
>>
>> training/combine_lang_model --input_unicharset
>> ../training/sin/sin.unicharset --script_dir ../langdata --words
>> ../langdata/sin/sin.wordlist --puncs ../langdata/sin/sin.punc --numbers
>> ../langdata/sin/sin.numbers --output_dir ../training/combined_sin
>> --version_str 1.0 --lang sin
>>
>> I get the following output:
>>
>> Loaded unicharset of size 94 from file ../training/sin/sin.unicharset
>> Setting unichar properties
>> Setting script properties
>> Warning: properties incomplete for index 4 = ී
>> Warning: properties incomplete for index 6 = ි
>> Warning: properties incomplete for index 11 = ු
>> Warning: properties incomplete for index 15 = ්
>> Warning: properties incomplete for index 33 = ූ
>> Warning: properties incomplete for index 52 = ්ර
>> Warning: properties incomplete for index 56 = ්ය
>> Warning: properties incomplete for index 87 = ක්
>> Warning: properties incomplete for index 93 = ර්
>> Config file is optional, continuing...
>> Null char=2
>> Invalid format in radical table at line 4: 3400 1.4
>> Creation of encoded unicharset failed!!
>> Error writing recoder!!
>> Reducing Trie to SquishedDawg
>> Reducing Trie to SquishedDawg
>> Reducing Trie to SquishedDawg
>>
>> For more information, I have attached my sin.unicharset and sin.config
>> files.
>>
>> I use the following Tesseract version:
>>
>> tesseract -v
>> tesseract 4.00.00dev-696-geba0ae3
>> leptonica-1.74.4
>> libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>
>> Found SSE
>>
>> I use the following OS:
>>
>> uname -a
>> Linux shandigutt-laptop-ubuntu 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> I would appreciate it if somebody could help me with this.
>>
>> Thanks
>>
>
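For anyone who would rather convert the file locally than wait for a shared copy, here is a minimal Python sketch of the CR LF to LF conversion described in the quoted reply. It is not taken from the thread: the helper names crlf_to_lf and convert_file are made up for illustration, and the path ../langdata/radical-stroke.txt in the usage note is an assumption based on the --script_dir ../langdata argument in the command above.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Return data with every CR LF pair replaced by a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file at `path` in place with LF line endings.

    Works on raw bytes so the file's text encoding is left untouched.
    Returns True if the file was changed, False if it contained no
    CR LF pairs and was left as it was.
    """
    p = Path(path)
    original = p.read_bytes()
    converted = crlf_to_lf(original)
    if converted == original:
        return False
    p.write_bytes(converted)
    return True
```

Usage, under the path assumption above, would be convert_file("../langdata/radical-stroke.txt"), followed by re-running combine_lang_model. Keep a backup of the original file first, since the rewrite is done in place.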