Hi Juan Pablo,

The problem seems interesting. However, I'm not sure whether you can use
Tesseract for that. Could you show one or more example tickets?

Best regards,
Dmitri Silaev
www.CustomOCR.com





On Tue, Sep 22, 2015 at 2:17 AM, Juan Pablo Aveggio <jpaveg...@gmail.com>
wrote:

> Hello
> I'm trying to train Tesseract to recognize patterns present in
> tickets. Each ticket has a unique pattern in a predetermined place
> which determines its value. As these patterns are not Unicode
> characters, I assigned them the characters 'a' to 'f'.
> I created a .tif image with six patterns:
> bil.pat.exp0.tif
> <https://drive.google.com/file/d/0B7CfYFzWHQDAYWU4M3hIQXUyOWs/view?usp=sharing>
> and the corresponding box file:
> bil.pat.exp0.box
> <https://drive.google.com/file/d/0B7CfYFzWHQDAVkJlZ3lreEdpaXc/view?usp=sharing>
> a 32 692 165 958 0
> b 221 734 354 958 0
> c 32 446 165 628 0
> d 221 488 354 628 0
> e 32 275 165 373 0
> f 221 317 277 373 0
>
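A quick way to sanity-check the box file quoted above before training is sketched below. It is a minimal Python sketch, not part of the Tesseract tooling, and it assumes the Tesseract 3.x box format `char left bottom right top page` with the origin at the bottom-left corner of the image. The names `Box` and `parse_box_line` are made up for this illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One box-file entry (hypothetical helper type for this sketch)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one 'char left bottom right top page' line."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(parts[0], left, bottom, right, top, page)


# Box entries copied from the message above.
BOX_TEXT = """\
a 32 692 165 958 0
b 221 734 354 958 0
c 32 446 165 628 0
d 221 488 354 628 0
e 32 275 165 373 0
f 221 317 277 373 0
"""

boxes = [parse_box_line(line) for line in BOX_TEXT.splitlines() if line.strip()]

# Collect (char, width, height) and flag boxes whose corners are inverted.
sizes = []
for box in boxes:
    if not (box.left < box.right and box.bottom < box.top):
        raise ValueError(f"inverted box for {box.char!r}: {box}")
    sizes.append((box.char, box.right - box.left, box.top - box.bottom))

for char, width, height in sizes:
    print(f"{char}: {width} x {height} px")
```

All six entries pass this check, so the box file itself looks geometrically consistent. The pattern 'f' is much smaller (56 x 56 px) than the others.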
> Then I ran:
> tesseract bil.pat.exp0.tif bil.pat.exp0 box.train
> and the output was:
> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
> Page 1
> APPLY_BOXES:
>    Boxes read from boxfile:       6
> APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277)
> APPLY_BOXES: Unlabelled word at :Bounding box=(-628,221)->(-488,277)
> APPLY_BOXES: Unlabelled word at :Bounding box=(-958,32)->(-734,88)
> APPLY_BOXES: Unlabelled word at :Bounding box=(-628,32)->(-488,88)
> APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88)
>    Found 6 good blobs.
>    5 remaining unlabelled words deleted.
> Generated training data for 6 words
> Negative coordinates cannot be right. Despite this, I tried to keep going.
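One possible reading of the negative numbers, which is an assumption and not something the log confirms: the "Unlabelled word" boxes may be reported in a frame rotated by 90 degrees, with each point (x, y) mapped to (-y, x). The Python sketch below only checks the arithmetic of that guess against the numbers quoted in this message. `rotate_90` and `contains` are hypothetical helpers, not Tesseract functions.

```python
# (left, bottom, right, top) for each pattern, copied from the box file.
BOX_FILE = {
    "a": (32, 692, 165, 958),
    "b": (221, 734, 354, 958),
    "c": (32, 446, 165, 628),
    "d": (221, 488, 354, 628),
    "e": (32, 275, 165, 373),
    "f": (221, 317, 277, 373),
}

# (x0, y0, x1, y1) of the "Unlabelled word" boxes from the APPLY_BOXES log.
LOGGED = [
    (-958, 221, -734, 277),
    (-628, 221, -488, 277),
    (-958, 32, -734, 88),
    (-628, 32, -488, 88),
    (-373, 32, -317, 88),
]


def rotate_90(box):
    """Map every point (x, y) to (-y, x) and return normalised corners."""
    left, bottom, right, top = box
    return (-top, left, -bottom, right)


def contains(outer, inner):
    """True if the rectangle inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


# For each logged box, list the patterns whose rotated box contains it.
matches = [
    [char for char, box in BOX_FILE.items() if contains(rotate_90(box), logged)]
    for logged in LOGGED
]
print(matches)  # [['b'], ['d'], ['a'], ['c'], ['e']]
```

Under that assumed mapping, each logged box falls inside exactly one rotated pattern box, and each is a 56-pixel-wide strip along the left edge of that pattern. That would be consistent with the patterns being split into pieces during segmentation, but the log alone does not establish it.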
> My font_properties is:
> bil.pat.box 0 0 1 0 0
> bil.words_list is:
> a
> b
> c
> d
> e
> f
>
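The font_properties entry is worth a second look. As far as I know, the Tesseract 3.x training convention names files `lang.fontname.expN` and expects each font_properties line to be `fontname italic bold fixed serif fraktur`, with `fontname` equal to the middle part of the file name. The Python sketch below checks the quoted entry against that assumed convention. The helper names are made up, and whether a mismatch is actually behind the warnings in this thread is not established.

```python
def font_name_from_training_file(filename: str) -> str:
    """Return the fontname part of 'lang.fontname.expN.ext' (assumed convention)."""
    parts = filename.split(".")
    if len(parts) < 4:
        raise ValueError(f"not in lang.fontname.expN.ext form: {filename!r}")
    return parts[1]


def parse_font_properties(text: str) -> dict:
    """Parse 'fontname italic bold fixed serif fraktur' lines into a dict."""
    fonts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields: {line!r}")
        fonts[fields[0]] = tuple(int(flag) for flag in fields[1:])
    return fonts


# The entry exactly as quoted in the message above.
FONT_PROPERTIES = "bil.pat.box 0 0 1 0 0\n"

fonts = parse_font_properties(FONT_PROPERTIES)
expected_font = font_name_from_training_file("bil.pat.exp0.tr")

print("font expected from file name:", expected_font)
print("listed in font_properties:", expected_font in fonts)
```

With the quoted files this reports that `pat` is expected but only `bil.pat.box` is listed, so under the assumed convention the line would read `pat 0 0 1 0 0`.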
> then I ran:
> $ unicharset_extractor bil.pat.exp0.box
> Extracting unicharset from bil.pat.exp0.box
> Wrote unicharset file ./unicharset.
> but the unicharset file has:
> 9
> NULL 0 NULL 0
> Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ]
> |Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0        # Broken
> a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ]
> b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ]
> c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ]
> d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ]
> e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ]
> f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
> Then I ran:
> $ mftraining -F font_properties -U unicharset -O bil.unicharset bil.pat.exp0.tr
> Read shape table shapetable of 0 shapes
> Reading bil.pat.exp0.tr ...
> Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0
> Bad properties for index 4, char b: 0,255 0,255 0,0 0,0 0,0
> Bad properties for index 5, char c: 0,255 0,255 0,0 0,0 0,0
> Bad properties for index 6, char d: 0,255 0,255 0,0 0,0 0,0
> Bad properties for index 7, char e: 0,255 0,255 0,0 0,0 0,0
> Bad properties for index 8, char f: 0,255 0,255 0,0 0,0 0,0
> Warning: no protos/configs for Joined in CreateIntTemplates()
> Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
> Warning: no protos/configs for a in CreateIntTemplates()
> Warning: no protos/configs for b in CreateIntTemplates()
> Warning: no protos/configs for c in CreateIntTemplates()
> Warning: no protos/configs for d in CreateIntTemplates()
> Warning: no protos/configs for e in CreateIntTemplates()
> Warning: no protos/configs for f in CreateIntTemplates()
> Done!
> What am I doing wrong?
> I am on debian.
> tesseract 3.04.00
>  leptonica-1.72
>   libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.4.0) : libpng 1.2.50 :
> libtiff 4.0.5 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
> Thank you very much in advance!
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a619104a-79d5-40ec-8a08-a6a9941ec292%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/a619104a-79d5-40ec-8a08-a6a9941ec292%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
