Needless to say, this is a difficult image. For a start, the picture is
taken at a skewed angle and the plastic is squished on the right. There
is god knows how much other text noise in and around the image, and then
there's just natural scene noise - edges, shading, lines etc. Tesseract
does not like this kind of image.

You have to whittle your input to Tesseract down to as clean an image as
possible. I have tried cropping your image right back to the white of the
areas you suggest and got at best:

0:669 S$29 i 1535 10.0991

This is probably better than you got, but it is not accurate enough. I think
you need to think hard about how best to extract the zone you are after first.
This design is fairly common on UK food packaging, so perhaps you can somehow
recognise this part of the input image and crop it out, then do a further
crop using line detection to get the individual pieces out.
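To make the two-stage crop a bit more concrete, here is a rough sketch in
plain Python. It is not real line detection: it swaps in a much simpler
column-brightness scan, and it assumes the image is already a grid of
grayscale values (in practice you would load the pixels with an imaging
library first). The function names and the thresholds (white_min,
min_bright_fraction) are made up for illustration and would need tuning.

```python
# Sketch only: a grayscale image is a list of rows, each a list of 0-255 ints.
# All names and threshold values here are illustrative assumptions.

def bright_bbox(gray, white_min=200):
    """Return (top, left, bottom, right) around all pixels >= white_min, or None."""
    if not gray or not gray[0]:
        return None
    rows = [y for y, row in enumerate(gray) if any(p >= white_min for p in row)]
    cols = [x for x in range(len(gray[0]))
            if any(row[x] >= white_min for row in gray)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(gray, box):
    """Cut the (top, left, bottom, right) box out of the pixel grid."""
    top, left, bottom, right = box
    return [row[left:right] for row in gray[top:bottom]]


def split_on_dark_columns(gray, white_min=200, min_bright_fraction=0.5):
    """Return (start, end) column ranges separated by mostly dark columns."""
    height = len(gray)
    segments, start = [], None
    for x in range(len(gray[0])):
        bright = sum(1 for row in gray if row[x] >= white_min)
        is_panel = bright / height >= min_bright_fraction
        if is_panel and start is None:
            start = x                      # a bright panel begins here
        elif not is_panel and start is not None:
            segments.append((start, x))    # a dark separator ends the panel
            start = None
    if start is not None:
        segments.append((start, len(gray[0])))
    return segments


# Tiny synthetic example: two white panels on a dark background.
D, W = 30, 255
image = [
    [D] * 9,
    [D, W, W, W, D, W, W, W, D],
    [D, W, D, W, D, W, W, W, D],   # the D inside the first panel is "text"
    [D, W, W, W, D, W, W, W, D],
    [D] * 9,
]
zone = crop(image, bright_bbox(image))
pieces = [crop(zone, (0, left, len(zone), right))
          for left, right in split_on_dark_columns(zone)]
```

Each entry in pieces would then go to Tesseract on its own. On a skewed
photo like yours a scan this naive will likely need deskewing first.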

Having said that, even a well-spaced, cropped first element:

[image: Inline images 1]

is returning 0.69, with the 'g' coming out as a 9. You might be able to fix
this by training on this font, though, since the lower-case g is unusually
tall.
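Short of training, a crude stopgap is to patch the result afterwards. The
sketch below assumes the token comes from a field that is always printed
with a trailing 'g', and that a purely numeric result ending in 9 means
the 'g' was misread. That assumption will corrupt any genuinely unitless
value ending in 9, so only apply it where you know the unit should be
there. The function name is made up for illustration.

```python
import re

# Assumption: the token is a single gram value whose trailing "g" was read as "9".
_TRAILING_NINE = re.compile(r"^(\d+(?:\.\d+)?)9$")


def fix_trailing_g(token):
    """Replace a final 9 with 'g' when the token is purely numeric."""
    token = token.strip()
    match = _TRAILING_NINE.match(token)
    if match:
        return match.group(1) + "g"
    return token  # already has a unit, or does not fit the pattern


fixed = fix_trailing_g("0.69")  # gives "0.6g"
```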

Cheers

On 28 July 2016 at 18:16, Douglas Millward <djm...@gmail.com> wrote:

> Hi
> I'm new to this forum and I've searched for a similar topic - excuse me if
> I've missed anything relevant.
> I want to OCR the 'traffic light' nutrition information on food packaging
> - it's basically numbers with a small g - an example is attached. I have
> processed it through Tesseract and I just get gobbledegook. Do I need to
> train it to read this format? And if so (here's hoping) has anyone done
> anything similar?
> Any pointers welcome
>
> kind regards
>
> Doug
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/4d16df2f-cf10-4550-bf89-6c568805ab4a%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/4d16df2f-cf10-4550-bf89-6c568805ab4a%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

