You're right... I've been testing out the psm flag in various situations
this whole time, but last night when I was trying out all of your
suggestions, it slipped my mind. The best solution I've found is to segment
the columns into "rows" of 1 or 2 digits each and use the "-psm 7" switch.
So far, it reads everything perfectly.
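In case it helps anyone else, here is a minimal sketch of the row-splitting idea. It is not code taken from any program in this thread. It assumes the screenshot is already binarized (black text on white) and loaded as a list of pixel rows, where 1 is ink and 0 is background. It also assumes at least one fully white pixel row separates consecutive numbers. The helper names, file names, and default margin are hypothetical. Rows are found with a horizontal projection profile: any pixel row containing ink belongs to a text line, and a fully white row ends it. Each crop gets a white margin before it is passed to tesseract with `-psm 7` (treat the image as a single text line). The single-dash `-psm` spelling matches the 3.01 command line; later releases (4.x) spell it `--psm`.

```python
# Sketch: split a binarized column of numbers into one crop per text row,
# pad each crop with white, and build a Tesseract 3.0x command for it.
# Assumption: img is a list of pixel rows, each a list of ints (1 = ink, 0 = white).

def split_rows(img):
    """Return (top, bottom) pairs, bottom exclusive, one per band of inked rows."""
    bands, start = [], None
    for y, row in enumerate(img):
        if any(row):
            if start is None:
                start = y  # first inked row of a new text line
        elif start is not None:
            bands.append((start, y))  # a fully white row closes the line
            start = None
    if start is not None:
        bands.append((start, len(img)))  # line runs to the bottom edge
    return bands


def pad(crop, margin):
    """Surround a crop with `margin` white pixels on every side."""
    width = len(crop[0]) + 2 * margin
    top = [[0] * width for _ in range(margin)]
    body = [[0] * margin + list(row) + [0] * margin for row in crop]
    bottom = [[0] * width for _ in range(margin)]
    return top + body + bottom


def row_crops(img, margin=4):
    """One padded crop per text row (hypothetical helper; the margin is a guess)."""
    return [pad(img[top:bottom], margin) for top, bottom in split_rows(img)]


def tesseract_cmd(image_path, out_base, lang=None):
    """Argument list for Tesseract 3.0x; -psm 7 = treat the image as one text line."""
    cmd = ["tesseract", image_path, out_base]
    if lang:
        cmd += ["-l", lang]  # e.g. the name of your own trained language data
    return cmd + ["-psm", "7"]


if __name__ == "__main__":
    # Toy 6x3 "screenshot": two text rows separated by white rows.
    img = [
        [0, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 0, 0],
        [0, 1, 1],
        [0, 0, 0],
    ]
    print(split_rows(img))  # [(1, 3), (4, 5)]
    for i, crop in enumerate(row_crops(img, margin=1)):
        print(tesseract_cmd(f"row{i}.tif", f"row{i}"))
```

Writing each crop to an image file is left to whatever imaging library the program already uses. The sketch only computes the crops and the argument list, which could then be handed to something like `subprocess.run`.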

On a semi-related note, I'm really impressed with Tesseract. In my
preliminary OCR research I read many posts saying that Tesseract's
recognition was fairly poor and that a different/commercial OCR package
should be used. I think these people didn't know about or hadn't used the
training feature of Tesseract, because it's working wonderfully for me,
which is great considering I had almost no expectations coming in :)

Thanks a lot to everyone for the help and to the developers who work on
this tool.

On Tue, Feb 14, 2012 at 1:33 AM, Dmitri Silaev <daemons2...@gmail.com> wrote:

> Did you try the "psm" switch (look for it in the forum)? Your own
> segmentation? Both combined?
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
> On Tue, Feb 14, 2012 at 1:55 AM, John Williams <jdwilliams1...@gmail.com>
> wrote:
> > If I duplicate the column 9 times, so that there are ten columns with the
> > same numbers, it reads it correctly. Running these results through the
> > training tools didn't help it recognize the original image, though.
> > Running tesseract on images with a single digit yielded nothing as well.
> >
> > In my program, do I have to programmatically duplicate my column of
> > numbers several times and then figure out what the result was supposed to
> > be... or can I train tesseract to recognize a single column? I suppose
> > duplicating it will work, but it seems like a bad hack.
> >
> > On Mon, Feb 13, 2012 at 10:42 AM, Chris <cmgreen...@gmail.com> wrote:
> >>
> >> I'd try segmenting the numbers out yourself and feeding them into
> >> tesseract as individual characters. Might work better than feeding it
> >> the whole image.
> >>
> >> Make sure you put some padding around each character.
> >>
> >> On Feb 13, 1:56 am, JD <jdwilliams1...@gmail.com> wrote:
> >> > I'm using v 3.01 on Windows 7 to perform OCR on another program. I
> >> > don't have access to the fonts the program is using, so I trained
> >> > tesseract using some screenshots, and so far the text recognition is
> >> > far better than I expected. However, when I try to process a
> >> > screenshot that contains only a few numbers, it doesn't match anything
> >> > at all. If it was matching garbage, or the wrong numbers, then I'd just
> >> > keep working on improving the training... but it doesn't find
> >> > anything. Does anyone have a suggestion about what I should try?
> >> >
> >> > It doesn't look like I can attach a screenshot, but the numbers are in
> >> > a column... something like this:
> >> >
> >> > 10
> >> > 13
> >> > 14
> >> > 15
> >> > 17
> >> >
> >> > I pre-process the screenshots so the text is black on white. I also
> >> > zoom in on the images, so they're slightly blurred (only very
> >> > slightly)... but the text recognition is near perfect, so I don't
> >> > think that's an issue. Plus, it seems like it should find SOMETHING.
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "tesseract-ocr" group.
> >> To post to this group, send email to tesseract-ocr@googlegroups.com
> >> To unsubscribe from this group, send email to
> >> tesseract-ocr+unsubscr...@googlegroups.com
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en
> >
>
>
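For comparison, Chris's suggestion quoted above (segment the individual characters yourself and pad them) can be sketched the same way, splitting one text row on fully white pixel columns with a vertical projection profile. This assumes digits in the screenshot don't touch; with the slight blur from zooming, adjacent digits may merge into a single crop. The function names are hypothetical. `-psm 10` is the single-character page segmentation mode in Tesseract 3.0x. It may help with the single-digit images that earlier returned nothing, but that is an assumption, not something reported in this thread.

```python
# Sketch of the per-character idea: split one text row into characters
# with a vertical projection profile and pad each character with white.
# Assumption: row_img is a list of pixel rows (1 = ink, 0 = white) holding a
# single line of digits, e.g. one row crop from a row-segmentation step.

def split_columns(row_img):
    """Return (left, right) pairs, right exclusive, one per run of inked columns."""
    width = len(row_img[0])
    inked = [any(r[x] for r in row_img) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x  # left edge of a character
        elif not has_ink and start is not None:
            spans.append((start, x))  # a fully white column ends it
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def char_crops(row_img, margin=4):
    """One padded crop per character, keeping the full row height.

    Touching digits have no white column between them and stay in one crop.
    """
    crops = []
    for left, right in split_columns(row_img):
        w = right - left + 2 * margin
        body = [[0] * margin + list(r[left:right]) + [0] * margin for r in row_img]
        crops.append([[0] * w for _ in range(margin)] + body
                     + [[0] * w for _ in range(margin)])
    return crops


def single_char_cmd(image_path, out_base):
    """Tesseract 3.0x argument list; -psm 10 = treat the image as one character."""
    return ["tesseract", image_path, out_base, "-psm", "10"]
```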
