Re: Recognize only digits

2011-10-14 Thread Bojan Petkovski
Thanks for the tips. It seems the biggest problem I face is font size. My images contain text of different sizes: when the size is right, Tesseract gives awesome results, and when it isn't, the output is garbage. Is there any workaround for this problem?

Re: Recognize only digits

2011-10-13 Thread Sandeep Parmar
You can whitelist the characters you need (e.g. only digits, or only letters) by adding this to tesseractmain.cpp:
api.SetVariable("tessedit_char_whitelist", "0123456789. ");
Tesseract will then choose only from those characters (here the digits, the period and the space) when performing recognition.
Regards, Sandeep

Re: Recognize only digits

2011-10-13 Thread Sven Pedersen
Yes, a black-and-white TIFF should be much better than a JPEG. You might also be better off recognizing with an existing trained English (or other language) model, trained for both letters and numbers, and then using regular expressions to find the numbers. Image correction with ImageMagick or the like can help to improve results
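As a rough illustration of the black-and-white conversion, here is a minimal C++ sketch of global thresholding with Otsu's method on an 8-bit grayscale buffer. It swaps a hand-written algorithm in for the ImageMagick step Sven mentions; the function names, the flat std::vector<std::uint8_t> pixel layout and the convention that 0 is black are assumptions for illustration only. Loading and saving TIFF or JPEG files is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: choose a global threshold with Otsu's method,
// i.e. the gray level t that maximizes the between-class variance of
// the "ink" (<= t) and "paper" (> t) pixel populations.
int otsu_threshold(const std::vector<std::uint8_t>& gray) {
    std::array<long long, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i) sum_all += static_cast<double>(i) * hist[i];

    double sum_bg = 0.0;  // sum of gray levels of pixels <= t
    double w_bg = 0.0;    // number of pixels <= t
    double best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w_bg += hist[t];
        if (w_bg == 0) continue;           // no dark pixels yet
        const double w_fg = total - w_bg;
        if (w_fg == 0) break;              // no bright pixels left
        sum_bg += static_cast<double>(t) * hist[t];
        const double mean_bg = sum_bg / w_bg;
        const double mean_fg = (sum_all - sum_bg) / w_fg;
        const double diff = mean_bg - mean_fg;
        const double between = w_bg * w_fg * diff * diff;
        if (between > best_var) {          // keep the first maximum
            best_var = between;
            best_t = t;
        }
    }
    return best_t;
}

// Map every pixel to pure black (0) or pure white (255).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray) {
    const int t = otsu_threshold(gray);
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i)
        out[i] = static_cast<std::uint8_t>(gray[i] <= t ? 0 : 255);
    return out;
}
```

A single global threshold works best on evenly lit scans; unevenly lit photos may need a local (adaptive) method instead.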

Re: Recognize only digits

2011-10-13 Thread Bojan Petkovski
Yes, but the problem is that I don't get correct results from the images. My idea was to train Tesseract so it knows what to look for, but it gives very poor results. Is there any optimization, such as black-and-white images, or maybe TIFF being a better format than JPEG, or something similar, so I can get a better match rate? Thanks

Re: Recognize only digits

2011-10-13 Thread Sven Pedersen
You could post-process the output with regular expressions, maybe using Perl or Python. Sven
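The regular-expression post-processing can also be done in C++ (the language of the tesseractmain.cpp snippet in this thread) with std::regex instead of Perl or Python. This is a minimal sketch, assuming the OCR output is available as a plain string and that the wanted values are the years 1990 to 2012 from the original question; the function name extract_years is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical post-processing step: scan OCR output for stand-alone
// four-digit numbers and keep only those in [first_year, last_year].
std::vector<int> extract_years(const std::string& ocr_text,
                               int first_year = 1990, int last_year = 2012) {
    // Word boundaries on both sides, so "19991" or "A2001" yields no match.
    static const std::regex four_digits(R"(\b\d{4}\b)");
    std::vector<int> years;
    for (std::sregex_iterator it(ocr_text.begin(), ocr_text.end(), four_digits), end;
         it != end; ++it) {
        const int value = std::stoi(it->str());
        if (value >= first_year && value <= last_year) years.push_back(value);
    }
    return years;
}
```

For example, extract_years("Printed 1995, revised 2011; code 3004") keeps 1995 and 2011 and drops 3004 as out of range. Tokens with a misread digit, such as a letter O in place of a zero, are simply dropped rather than corrected.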

Recognize only digits

2011-10-13 Thread Bojan Petkovski
Hi, I trained Tesseract for numbers only, on the range 1990, 1991, 1992, ..., 2010, 2011, 2012. I put only those words in the frequent-words list and the other-words list. When I started testing, what I got was characters translated into some number. I need Tesseract to recognize only the numbers on the