Did you ever look at incorporating the unicharambigs file into your
training?

http://www.resolveradiologic.com/blog/2013/01/16/more-on-training-tesseract/
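
As a rough illustration of what that could look like for the two pairs you mention, here is a minimal sketch that writes a "v1" format unicharambigs file. The class name UnicharAmbigsWriter and the output name qwe.unicharambigs are my own assumptions (the latter chosen to match your "qwe." language prefix so that combine_tessdata would pick it up). Each entry is tab-separated: number of source characters, the source, number of target characters, the target, and a type flag, where 0 marks an optional substitution and 1 a mandatory one. I have not tested whether this helps when all dictionaries are disabled, so treat it as a starting point only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tesseract or Tess4J.
class UnicharAmbigsWriter {

    // Builds the lines of a "v1" unicharambigs file. Each confusable pair
    // is listed in both directions as an optional (type 0) substitution.
    static List<String> buildLines(String[][] pairs) {
        List<String> lines = new ArrayList<>();
        lines.add("v1");
        for (String[] pair : pairs) {
            lines.add(ambig(pair[0], pair[1]));
            lines.add(ambig(pair[1], pair[0]));
        }
        return lines;
    }

    // One single-character entry, tab-separated:
    // source count, source, target count, target, type.
    static String ambig(String from, String to) {
        return String.join("\t", "1", from, "1", to, "0");
    }

    public static void main(String[] args) throws IOException {
        // The two confusions reported in the original message.
        String[][] pairs = { {"2", "Z"}, {"6", "G"} };
        // Assumed file name, following the "qwe." prefix of the training script.
        Path out = Paths.get("qwe.unicharambigs");
        Files.write(out, buildLines(pairs), StandardCharsets.UTF_8);
    }
}
```

The generated file would sit next to qwe.inttemp and the other qwe.* files before running combine_tessdata.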

On 26 June 2016 at 15:09, Timothy Korse <timothy.ko...@gmail.com> wrote:

> I'm trying to configure Tesseract to recognize *alphanumeric strings* that
> are 10 characters long (all uppercase).
>
>
> This works pretty well, except that it often confuses the following
> characters:
>
>    - 2 and Z
>    - 6 and G
>
>
> Examples of images are:
>
>
> <https://lh3.googleusercontent.com/-20dr7dBmT9c/V2_eMKE7TtI/AAAAAAAAAKw/ENcZMZogPws1elcz7BV0WRsE4B8M22IWgCKgB/s1600/X2JR6XK6VGMQP2L5.jpg>
>
>
> <https://lh3.googleusercontent.com/-MysZA6TlqI0/V2_eQyVCOzI/AAAAAAAAAKw/LgUKmhGzsvcfod1bHLEIRfBtKO7-dCodQCKgB/s1600/X2LHV6KHPJ5TFTDK.jpg>
>
>
> <https://lh3.googleusercontent.com/-s6QuiuY_GK8/V2_eUtSCvBI/AAAAAAAAAKw/nM-vnz9SCvQ2OWPuwytKJirJMCS4kIGqgCKgB/s1600/X3K9V5XKQV3Z5QT5.jpg>
>
>
> <https://lh3.googleusercontent.com/-QVLjGd9Lcik/V2_eYvEDsJI/AAAAAAAAAKw/c_s5sYdtE0AbFZX8OqNiEAAvrnooYD6pwCKgB/s1600/X3P92TR7Q93F2G9F.jpg>
>
>
> <https://lh3.googleusercontent.com/-wfH5bpBqC5E/V2_egk0Sj3I/AAAAAAAAAKw/-da1JPAT_hUF5CEn6c9FkkZqANu3TDtngCKgB/s1600/X4NT7CFMH2GR7HXZ.jpg>
>
>
> <https://lh3.googleusercontent.com/-KHssFqw1XyE/V2_emEmR4yI/AAAAAAAAAK0/kftsbb0E65os-rdIlkHxpqT8Ip7gkWWbwCKgB/s1600/X4QGN9XQ3KP69YZX.jpg>
>
> These images are already preprocessed. I think the preprocessing worked
> well, but I'd be glad to hear otherwise.
>
>
> This is how I run Tesseract:
>
>
> tesseract = new Tesseract();
> tesseract.setOcrEngineMode(TessAPI.TessOcrEngineMode.OEM_TESSERACT_ONLY);
> tesseract.setPageSegMode(7);
> tesseract.setTessVariable("load_system_dawg", "0");
> tesseract.setTessVariable("load_freq_dawg", "0");
> tesseract.setTessVariable("load_punc_dawg", "0");
> tesseract.setTessVariable("load_number_dawg", "0");
> tesseract.setTessVariable("load_unambig_dawg", "0");
> tesseract.setTessVariable("load_bigram_dawg", "0");
> tesseract.setTessVariable("load_fixed_length_dawgs", "0");
>
> tesseract.setTessVariable("classify_enable_learning", "0");
> tesseract.setTessVariable("classify_enable_adaptive_matcher", "0");
>
> tesseract.setTessVariable("segment_penalty_garbage", "0");
> tesseract.setTessVariable("segment_penalty_dict_nonword", "0");
> tesseract.setTessVariable("segment_penalty_dict_frequent_word", "0");
> tesseract.setTessVariable("segment_penalty_dict_case_ok", "0");
> tesseract.setTessVariable("segment_penalty_dict_case_bad", "0");
>
>
> *Note that this is Java code, but my question is not limited to Java.*
>
> I am not very experienced with Tesseract and find the documentation quite
> unclear. I hope someone can help me out.
> ------------------------------
>
> To give some more context:
>
>
> *How do I train Tesseract?*
>
>
> I train Tesseract by combining over 200 images into one image. Every image
> contains 10 alphanumeric characters. Also, I am sure the box file is
> correct.
>
>
> I build the final language by executing the following batch script:
>
> tesseract qwe.combined.jpg qwe.combined.box nobatch box.train
>
> echo combined 1 0 0 0 0 > font_properties
>
> unicharset_extractor qwe.combined.box
>
> shapeclustering -F font_properties -U unicharset qwe.combined.box.tr
>
> mftraining -F font_properties -U unicharset -O qwe.unicharset qwe.combined.box.tr
>
> cntraining qwe.combined.box.tr
>
> copy inttemp qwe.inttemp
> copy normproto qwe.normproto
> copy pffmtable qwe.pffmtable
> copy shapetable qwe.shapetable
>
> combine_tessdata qwe.
>
> ------------------------------
>
> How can I make Tesseract discriminate better between the 2, Z, 6 and G?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bba1f122-6bb2-43f6-9a7d-9daa75f5323e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
