You're right... I've been testing out the psm flag in various situations this whole time, but last night when I was trying out all of your suggestions, it slipped my mind. The best solution I've found is to segment the columns into "rows" of 1 or 2 digits each and use the "-psm 7" switch. So far, it reads everything perfectly.
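For anyone who wants to try the same approach, here is a rough sketch of the idea in Python, not the exact code from my program. It assumes the screenshot has already been converted to black text on a white background and loaded as a plain list of grayscale pixel rows. The function names, the padding size, the ink threshold, and the "eng" language name are all placeholders I made up for illustration. The "-psm 7" spelling is the one used by the 3.0x command line; I believe newer releases spell it "--psm", so check your version.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel values: 0 = black ink, 255 = white.
Image = List[List[int]]


def find_rows(img: Image, ink_threshold: int = 128) -> List[Tuple[int, int]]:
    """Return half-open (top, bottom) bands of consecutive pixel rows that contain ink."""
    bands = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(pixel < ink_threshold for pixel in row)
        if has_ink and start is None:
            start = y                      # a new band of text begins here
        elif not has_ink and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:                  # text ran to the bottom edge
        bands.append((start, len(img)))
    return bands


def crop_with_padding(img: Image, top: int, bottom: int,
                      pad: int = 4, white: int = 255) -> Image:
    """Cut out rows top..bottom and surround them with a white border."""
    width = len(img[0])
    blank = [white] * (width + 2 * pad)
    body = [[white] * pad + list(row) + [white] * pad for row in img[top:bottom]]
    border = [blank[:] for _ in range(pad)]
    return border + body + [blank[:] for _ in range(pad)]


def tesseract_command(image_path: str, output_base: str,
                      lang: str = "eng") -> List[str]:
    """Build the command line that treats the image as a single text line."""
    # Hypothetical language name; substitute your own trained data.
    return ["tesseract", image_path, output_base, "-l", lang, "-psm", "7"]
```

Each band from find_rows would be cropped with crop_with_padding, saved as its own image, and passed to the command from tesseract_command (for example through subprocess.run). The padding follows Chris's earlier advice to leave some margin around the characters.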
On a semi-related note, I'm really impressed with Tesseract. In my preliminary OCR research I read many posts saying that Tesseract's recognition was fairly poor and that a different/commercial OCR package should be used. I think these people didn't know about, or hadn't used, the training feature of Tesseract, because it's working wonderfully for me, which is great considering I had almost no expectations coming in :)

Thanks a lot to everyone for the help and to the developers who work on this tool.

On Tue, Feb 14, 2012 at 1:33 AM, Dmitri Silaev <daemons2...@gmail.com> wrote:

> Did you try the "psm" switch (look for it in the forum)? Your own
> segmentation? Both combined?
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Tue, Feb 14, 2012 at 1:55 AM, John Williams <jdwilliams1...@gmail.com>
> wrote:
> > If I duplicate the column 9 times, so that there's ten columns with the
> > same numbers, it reads it correctly. Running these results through the
> > training tools didn't help it recognize the original image, though.
> > Running tesseract on images with a single digit yielded nothing as well.
> >
> > In my program, do I have to programmatically duplicate my column of
> > numbers several times and then figure out what the result was supposed
> > to be... or can I train tesseract to recognize a single column? I
> > suppose duplicating it will work, but it seems like a bad hack.
> >
> > On Mon, Feb 13, 2012 at 10:42 AM, Chris <cmgreen...@gmail.com> wrote:
> >>
> >> I'd try segmenting the numbers out yourself and feeding them into
> >> tesseract as individual characters. Might work better than feeding it
> >> the whole image.
> >>
> >> Make sure you put some padding around each character.
> >>
> >> On Feb 13, 1:56 am, JD <jdwilliams1...@gmail.com> wrote:
> >> > I'm using v 3.01 on Windows 7 to perform OCR on another program. I
> >> > don't have access to the fonts the program is using, so I trained
> >> > tesseract using some screenshots, and so far the text recognition is
> >> > far better than I expected. However, when I try to process a
> >> > screenshot that contains only a few numbers, it doesn't match anything
> >> > at all. If it was matching garbage, or the wrong numbers, then I'd just
> >> > keep working on improving the training... but it doesn't find
> >> > anything. Does anyone have a suggestion about what I should try?
> >> >
> >> > It doesn't look like I can attach a screenshot, but the numbers are in
> >> > a column... something like this:
> >> >
> >> > 10
> >> > 13
> >> > 14
> >> > 15
> >> > 17
> >> >
> >> > I pre-process the screenshots so the text is black on white. I also
> >> > zoom in on the images, so they're slightly blurred (only very
> >> > slightly)... but the text recognition is near perfect, so I don't
> >> > think that's an issue. Plus, it seems like it should find SOMETHING.
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "tesseract-ocr" group.
> >> To post to this group, send email to tesseract-ocr@googlegroups.com
> >> To unsubscribe from this group, send email to
> >> tesseract-ocr+unsubscr...@googlegroups.com
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en