[this might be a repost; the first attempt didn't show up]

I'm using the C API of tesseract 4.0 on OS X, and I tried to add some more 
characters. (I should add that 4.0 seems much better than 3.x. Thanks to 
everyone who made this possible!)

I used this manual as a guide to construct the following script: 
https://pastebin.com/4n2mRSpq

Before running, I modified  langdata/eng/eng.training_text with the extra 
chars, maybe 15 instances of each, as instructed.

I'm using only a subset of the original training fonts, but I figure it is 
OK, since I'm adding only a few distinctive chars.   

The NN optimizer lstmtraining ran and produced a bunch of checkpoints and a 
final file, $train_output_dir/eng/eng.traineddata.

But this eng.traineddata was 5MB, whereas the original one was 15.4MB. And 
when I copied it over the pre-loaded 'best' eng.traineddata and ran 
tesseract, it failed in TessBaseAPIInit3 with a return value of -1.
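
In case it helps anyone reproduce this, here is a minimal sketch of the kind of 
check I can run on the file before handing its directory to TessBaseAPIInit3. 
It only confirms that the file can be opened and reports its size in bytes; it 
does not look inside the file. The helper name traineddata_size is my own and 
is not part of the tesseract C API.

```c
#include <stdio.h>

/* Hypothetical helper (not part of tesseract): returns the size of the
 * file at `path` in bytes, or -1 if it cannot be opened or measured. */
long traineddata_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL) {
        return -1;              /* missing file or no read permission */
    }
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;              /* could not seek to the end */
    }
    size = ftell(fp);           /* offset at end of file == size in bytes */
    fclose(fp);
    return size;
}
```

Calling traineddata_size on both the original and the newly trained 
eng.traineddata makes the size difference easy to log next to the init result, 
and a result of -1 would suggest the failure is a path or permission problem 
rather than something about the file contents.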

Does anyone know 1) why my eng.traineddata is so much smaller and 2) why it 
fails to even load in API init()?

Thanks for any tips!
