Hello
I'm trying to train Tesseract to recognize patterns printed on tickets.
Each ticket has a unique pattern at a predetermined position, and that
pattern determines the ticket's value. Since these patterns do not
correspond to any Unicode characters, I assigned them the characters 'a' to 'f'.
I created a .tif image with six patterns:
bil.pat.exp0.tif 
<https://drive.google.com/file/d/0B7CfYFzWHQDAYWU4M3hIQXUyOWs/view?usp=sharing>
and the corresponding box file:
bil.pat.exp0.box 
<https://drive.google.com/file/d/0B7CfYFzWHQDAVkJlZ3lreEdpaXc/view?usp=sharing>
a 32 692 165 958 0 
b 221 734 354 958 0 
c 32 446 165 628 0 
d 221 488 354 628 0 
e 32 275 165 373 0 
f 221 317 277 373 0
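For reference, Tesseract 3.x box files have one line per symbol in the form `<char> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left corner of the image. The Python sketch below is not part of Tesseract: it parses the six lines above and reports any box whose coordinates are inverted or fall outside the image. The image size it uses is a hypothetical placeholder, because the real dimensions of bil.pat.exp0.tif are not given here.

```python
from typing import List, NamedTuple

# The six lines of bil.pat.exp0.box, copied from above.
BOX_TEXT = """\
a 32 692 165 958 0
b 221 734 354 958 0
c 32 446 165 628 0
d 221 488 354 628 0
e 32 275 165 373 0
f 221 317 277 373 0
"""


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse '<char> <left> <bottom> <right> <top> <page>' (origin at bottom-left)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    left, bottom, right, top, page = (int(v) for v in fields[1:])
    return Box(fields[0], left, bottom, right, top, page)


def check_box(box: Box, img_width: int, img_height: int) -> List[str]:
    """Return a list of problems; an empty list means the box looks sane."""
    problems = []
    if not 0 <= box.left < box.right <= img_width:
        problems.append(f"{box.char}: x {box.left}..{box.right} not inside 0..{img_width}")
    if not 0 <= box.bottom < box.top <= img_height:
        problems.append(f"{box.char}: y {box.bottom}..{box.top} not inside 0..{img_height}")
    return problems


if __name__ == "__main__":
    # Hypothetical image size: replace with the real size of the .tif.
    WIDTH, HEIGHT = 400, 1000
    for line in BOX_TEXT.splitlines():
        box = parse_box_line(line)
        size = f"{box.right - box.left}x{box.top - box.bottom}"
        print(box.char, size, check_box(box, WIDTH, HEIGHT) or "ok")
```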

Then I ran:
tesseract bil.pat.exp0.tif bil.pat.exp0 box.train
which printed:
Tesseract Open Source OCR Engine v3.04.00 with Leptonica 
Page 1 
APPLY_BOXES: 
   Boxes read from boxfile:       6 
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277) 
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,221)->(-488,277) 
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,32)->(-734,88) 
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,32)->(-488,88) 
APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88) 
   Found 6 good blobs. 
   5 remaining unlabelled words deleted. 
Generated training data for 6 words
I don't see how these coordinates can be negative. Despite this, I kept going.
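To list exactly which of the reported boxes have negative coordinates, the APPLY_BOXES lines can be parsed. This sketch only reads the `Bounding box=(x1,y1)->(x2,y2)` text as printed above; it makes no assumption about what Tesseract means by those coordinates.

```python
import re
from typing import List, Tuple

# The "Unlabelled word" lines from the box.train output above.
LOG = """\
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,221)->(-734,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,221)->(-488,277)
APPLY_BOXES: Unlabelled word at :Bounding box=(-958,32)->(-734,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-628,32)->(-488,88)
APPLY_BOXES: Unlabelled word at :Bounding box=(-373,32)->(-317,88)
"""

# Matches "(x1,y1)->(x2,y2)" where each value may be negative.
_BBOX_RE = re.compile(r"Bounding box=\((-?\d+),(-?\d+)\)->\((-?\d+),(-?\d+)\)")


def unlabelled_boxes(log: str) -> List[Tuple[int, ...]]:
    """Extract (x1, y1, x2, y2) from every 'Unlabelled word' line."""
    boxes = []
    for line in log.splitlines():
        if "Unlabelled word" not in line:
            continue
        match = _BBOX_RE.search(line)
        if match:
            boxes.append(tuple(int(v) for v in match.groups()))
    return boxes


if __name__ == "__main__":
    for rect in unlabelled_boxes(LOG):
        print(rect, "NEGATIVE" if min(rect) < 0 else "ok")
```

Note that the magnitudes of the negative values (958, 734, 628, 488, 373, 317) are the same numbers that appear as bottom/top coordinates in the box file.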
My font_properties file is:
bil.pat.box 0 0 1 0 0
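Each font_properties line has the fields `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, each flag being 0 or 1, so the line above declares a fixed-pitch font named `bil.pat.box`. The Tesseract 3.x training notes also describe a rule that the font name must match the font component of each `[lang].[fontname].exp[num].tr` file name. Treating that rule as an assumption, this sketch parses the line and compares the two names. `parse_font_properties` and `font_of_tr_file` are hypothetical helpers, not Tesseract tools.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    fontname: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties(line: str) -> FontProperties:
    """Parse '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    flags = []
    for value in fields[1:]:
        if value not in ("0", "1"):
            raise ValueError(f"flag must be 0 or 1, got {value!r}")
        flags.append(value == "1")
    return FontProperties(fields[0], *flags)


def font_of_tr_file(tr_name: str) -> str:
    """Return the [fontname] part of '[lang].[fontname].exp[num].tr'."""
    parts = tr_name.split(".")
    if len(parts) < 4 or not parts[2].startswith("exp"):
        raise ValueError(f"unexpected .tr file name: {tr_name!r}")
    return parts[1]


if __name__ == "__main__":
    props = parse_font_properties("bil.pat.box 0 0 1 0 0")
    font = font_of_tr_file("bil.pat.exp0.tr")
    print(props)
    # Assumed rule: these two names should be identical.
    print("font in .tr name:", font, "| matches:", props.fontname == font)
```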
bil.words_list is:
a 
b 
c 
d 
e 
f 

Then I ran:
$ unicharset_extractor bil.pat.exp0.box
Extracting unicharset from bil.pat.exp0.box 
Wrote unicharset file ./unicharset.
but the resulting unicharset file contains:
9 
NULL 0 NULL 0 
Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0     # Joined [4a 6f 69 6e 65 64 ]
|Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0        # Broken 
a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # a [61 ] 
b 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # b [62 ] 
c 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # c [63 ] 
d 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # d [64 ] 
e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # e [65 ] 
f 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # f [66 ]
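In this unicharset format, each entry starts with the character, a hexadecimal properties bitmask (0x1 alpha, 0x2 lower, 0x4 upper, 0x8 digit, 0x10 punctuation) and ten comma-separated glyph metrics (min/max bottom, top, width, bearing and advance), followed by the script and further fields. The sketch below decodes those first three fields. It assumes that metric order, and it assumes that `0,255,0,255,0,0,0,0,0,0` is the "nothing measured" default, which is what every entry above contains.

```python
from typing import Dict, List, NamedTuple

# Bit values of the properties field (written in hexadecimal).
PROPERTY_BITS = {"alpha": 0x1, "lower": 0x2, "upper": 0x4, "digit": 0x8, "punct": 0x10}

# Assumed order of the ten glyph-metric values.
METRIC_NAMES = [
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
]

# Assumed default values meaning "no glyph metrics recorded".
DEFAULT_METRICS = [0, 255, 0, 255, 0, 0, 0, 0, 0, 0]


class UnicharEntry(NamedTuple):
    unichar: str
    properties: List[str]
    metrics: Dict[str, int]
    metrics_missing: bool


def parse_unicharset_line(line: str) -> UnicharEntry:
    """Decode the character, properties and metrics fields of one entry."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    unichar, props_hex, metrics_csv = fields[0], fields[1], fields[2]
    props = int(props_hex, 16)
    names = [name for name, bit in PROPERTY_BITS.items() if props & bit]
    values = [int(v) for v in metrics_csv.split(",")]
    if len(values) != len(METRIC_NAMES):
        raise ValueError(f"expected {len(METRIC_NAMES)} metrics, got {len(values)}")
    return UnicharEntry(unichar, names, dict(zip(METRIC_NAMES, values)),
                        values == DEFAULT_METRICS)


if __name__ == "__main__":
    for ch, code in zip("abcdef", range(0x61, 0x67)):
        line = f"{ch} 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0  # {ch} [{code:x} ]"
        entry = parse_unicharset_line(line)
        print(entry.unichar, entry.properties, "metrics missing:", entry.metrics_missing)
```

Decoded this way, the entries for a to f have no property bits set and only the assumed default metrics, consistent with the `0,255 0,255 0,0 0,0 0,0` values that mftraining reports for the same characters.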
Then I ran:
$ mftraining -F font_properties -U unicharset -O bil.unicharset bil.pat.exp0.tr
Read shape table shapetable of 0 shapes 
Reading bil.pat.exp0.tr ... 
Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 4, char b: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 5, char c: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 6, char d: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 7, char e: 0,255 0,255 0,0 0,0 0,0 
Bad properties for index 8, char f: 0,255 0,255 0,0 0,0 0,0 
Warning: no protos/configs for Joined in CreateIntTemplates() 
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates() 
Warning: no protos/configs for a in CreateIntTemplates() 
Warning: no protos/configs for b in CreateIntTemplates() 
Warning: no protos/configs for c in CreateIntTemplates() 
Warning: no protos/configs for d in CreateIntTemplates() 
Warning: no protos/configs for e in CreateIntTemplates() 
Warning: no protos/configs for f in CreateIntTemplates() 
Done!
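A short sketch that groups the mftraining messages by character shows that every class, including the reserved Joined and |Broken|0|1 entries, ends up with no protos/configs. It only parses text in the two message formats printed above (an abridged copy is embedded); the regular expressions are tailored to those formats and are not a general parser for Tesseract logs.

```python
import re
from typing import Dict, List, Tuple

# Abridged copy of the mftraining output above.
OUTPUT = """\
Bad properties for index 3, char a: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char b: 0,255 0,255 0,0 0,0 0,0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Warning: no protos/configs for a in CreateIntTemplates()
Warning: no protos/configs for b in CreateIntTemplates()
"""

_BAD_RE = re.compile(r"^Bad properties for index \d+, char (\S+): (.*)$")
_NOPROTO_RE = re.compile(r"^Warning: no protos/configs for (\S+) in CreateIntTemplates\(\)$")


def summarize(output: str) -> Tuple[Dict[str, List[Tuple[int, ...]]], List[str]]:
    """Return ({char: [(min, max), ...]}, [chars reported without protos])."""
    bad: Dict[str, List[Tuple[int, ...]]] = {}
    no_protos: List[str] = []
    for line in output.splitlines():
        line = line.strip()
        match = _BAD_RE.match(line)
        if match:
            # Each "min,max" pair becomes a tuple of two ints.
            bad[match.group(1)] = [tuple(int(v) for v in pair.split(","))
                                   for pair in match.group(2).split()]
            continue
        match = _NOPROTO_RE.match(line)
        if match:
            no_protos.append(match.group(1))
    return bad, no_protos


if __name__ == "__main__":
    bad, no_protos = summarize(OUTPUT)
    for char, pairs in bad.items():
        print("bad properties:", char, pairs)
    print("no protos/configs:", no_protos)
```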
What am I doing wrong?
I am on Debian.
tesseract 3.04.00 
 leptonica-1.72 
  libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.4.0) : libpng 1.2.50 : libtiff 4.0.5 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
Thanks in advance!



-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/a619104a-79d5-40ec-8a08-a6a9941ec292%40googlegroups.com.