Thanks for the tips.
It seems that the biggest problem I face is font size. I have images with
text of different sizes, and it seems that if the size is good, tesseract
gives awesome results, and if it is not, the output is garbage. Is there any
workaround for this problem?
You can whitelist your required characters, i.e. numbers or letters, by doing
this in tesseractmain.cpp:
api.SetVariable("tessedit_char_whitelist", "0123456789. ");

Then tesseract will only choose numbers when performing recognition.

Regards,
Sandeep
On Thu, Oct 13, 2011 at 10:28 PM, Sven Ped
Yes, b&w tiff should be much better than JPEG. And you might be better
off using an existing trained English or other language (trained for
both letters and numbers) to recognize, then use regular expressions
to find numbers. Image correction with ImageMagick or the like can
help to improve results.
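As a rough illustration of the "recognize first, then filter with regular expressions" idea, here is a minimal Python sketch. The sample string, the function name extract_years, and the 1990-2012 bounds (taken from the range mentioned in the original question) are assumptions for illustration only; real OCR output would come from tesseract itself.

```python
import re

# Hypothetical OCR output from a general English model; not taken from the thread.
ocr_text = "Published 1998, revised 2O05 and 2011. Page 12345, ref 1850."

# Match standalone four-digit numbers only (word boundaries on both sides).
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def extract_years(text, first=1990, last=2012):
    """Return the four-digit numbers in text that fall within [first, last]."""
    candidates = (int(m) for m in YEAR_PATTERN.findall(text))
    return [year for year in candidates if first <= year <= last]


print(extract_years(ocr_text))
```

Note that a misread such as "2O05" (letter O instead of zero) is simply skipped here rather than repaired, so this kind of filter reduces false matches but cannot recover characters the OCR got wrong.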
Yes, but the problem is that I don't get correct results from images. My idea was
to train tesseract so that it knows what to look for. It is giving very poor
results. Is there any optimization, like black-and-white images, or maybe TIFF
format rather than JPEG or similar, so I can get a better match rate?
Thanks
You could post process with regular expressions. Use Perl or Python maybe.
Sven
On Thursday, October 13, 2011, Bojan Petkovski wrote:
> Hi,
> I trained tesseract for numbers only and trained it on the range 1990, 1991,
1992, ... , 2010, 2011, 2012. I put only those words as the most frequent words
and othe
Hi,
I trained tesseract for numbers only and trained it on the range 1990, 1991,
1992, ... , 2010, 2011, 2012. I put only those words as the most frequent words
and other words. I started testing, and what I get is characters translated
into some number. I need tesseract to recognize only the numbers on the