Re: Scoreboard digits fail to recognize

2011-10-19 Thread Allen Cook
Oops, that one is corrupted; let me try the JPG. --Allen Cook On Wed, Oct 19, 2011 at 3:35 PM, Allen Cook wrote: > Can anybody give me pointers about how to best improve Tesseract's > accuracy in this instance? I've attached my current input image to > Tesseract. This image fails every time, but fre

Scoreboard digits fail to recognize

2011-10-19 Thread Allen Cook
Can anybody give me pointers about how to best improve Tesseract's accuracy in this instance? I've attached my current input image to Tesseract. This image fails every time, but frequently gives me back different garbage text. I've also tried training it with a similar font but that never gives me
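One common first step for LED-style scoreboards is to binarize the frame. If the digits are lit segments on a dark background (an assumption about the attached image, which is not reproduced here), the result can also be inverted so the digits come out as dark ink on white paper. The following is an untested sketch, not a known fix for this image. The helpers `otsu_threshold` and `binarize_for_ocr` are hypothetical names. Otsu's method picks the single global threshold that maximizes the between-class variance of the two pixel groups.

```python
from typing import List

Gray = List[List[int]]  # rows of 0..255 grayscale intensities


def otsu_threshold(img: Gray) -> int:
    """Return the threshold t maximizing between-class variance, with
    pixels <= t in the dark class and pixels > t in the bright class."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_for_ocr(img: Gray, light_text_on_dark: bool = True) -> Gray:
    """Map every pixel to 0 (ink) or 255 (paper).

    With light_text_on_dark=True (assumed for a lit scoreboard), pixels
    brighter than the Otsu threshold become black ink; otherwise the
    darker pixels do."""
    t = otsu_threshold(img)
    out = []
    for row in img:
        if light_text_on_dark:
            out.append([0 if p > t else 255 for p in row])
        else:
            out.append([0 if p <= t else 255 for p in row])
    return out
```

The binarized rows can then be saved as PNG or TIFF with any imaging library and passed to Tesseract. A single global threshold will struggle with uneven lighting or glare across the board, so this sketch is only a starting point.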

Re: setting up environment to develop app using Tesseract android tools?

2011-10-19 Thread Gautam
Hi, I was trying to set up the same environment on my Mac. I got some errors: Gautam-Mac:tesseract-android-tools Gautam$ ndk-build Install: libjpeg.so => libs/armeabi/libjpeg.so Compile++ thumb : lept <= box.cpp In file included from //Users/Gautam/Documents/workspace/tesseract-android-

Improving OCR for large batches of documents

2011-10-19 Thread Impix
Hey all, I use Tesseract to automatically OCR batches of TIFF files, but the accuracy is pretty much hit or miss. I've been using ImageMagick to convert from PDF to TIFF, and something like "convert -density 380" will produce great OCR results for one file, whereas the same will not work well for

Re: Pictures with numbers

2011-10-19 Thread Dmitri Silaev
Yes, it would certainly be better if you sent us one or more example images. Adding to what has already been said, one thing can be noted for sure: Tesseract tries to treat everything as a known character, even schematics or line art. These formations usually appear as garbage in the output. To

Re: Pictures with numbers

2011-10-19 Thread Sven Pedersen
Hi Joao, You probably need to pre-process the images with other software. By itself, Tesseract generally cannot ignore data. You can look through the list archives for some examples. If you provide an example image (or part of one) it might help us to make a suggestion. --Sven On Wed, Oct 19, 201
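Here is one possible pre-processing step, sketched under stated assumptions. It starts from an already binarized image (1 = ink, 0 = background). It labels 8-connected components and keeps only those whose bounding box is plausibly character-sized, so long schematic lines are erased before the image reaches Tesseract. The function names and the `max_w`/`max_h` limits are hypothetical and would need tuning for each drawing. Lines that touch the digits will still be merged with them.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def components(img: Binary) -> List[List[Tuple[int, int]]]:
    """Return the (row, col) pixels of each 8-connected ink component."""
    h, w = len(img), (len(img[0]) if img else 0)
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def keep_character_sized(img: Binary, max_w: int, max_h: int) -> Binary:
    """Erase every component whose bounding box is wider than max_w or
    taller than max_h (hypothetical limits for a single character)."""
    h, w = len(img), (len(img[0]) if img else 0)
    out = [[0] * w for _ in range(h)]
    for pixels in components(img):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        if max(xs) - min(xs) + 1 <= max_w and max(ys) - min(ys) + 1 <= max_h:
            for y, x in pixels:
                out[y][x] = 1
    return out
```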

Re: Pictures with numbers

2011-10-19 Thread patrickq
Normally it's Dmitri's role to say it but I'll do it this time: attach an image to your question so that we know what you are talking about! Without it nobody can help. On Oct 19, 11:42 am, Joao Henriques wrote: > Hello everybody, > > I hope that someone can help me out here. > There was nothing

Pictures with numbers

2011-10-19 Thread Joao Henriques
Hello everybody, I hope that someone can help me out here. There was nothing on the net regarding it, so I'll just try it here :) I have a picture that needs to be OCR'ed. The picture contains a schematic and some numbers. I need to retrieve only these numbers from the picture. Tesseract keeps try
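Sometimes the schematic cannot be removed before OCR. In that case, a crude fallback is to post-filter Tesseract's text output and keep only tokens that look like standalone numbers. This is a generic text-filtering idea, not a Tesseract feature. The name `extract_numbers` is hypothetical. Digits that Tesseract hallucinates from line art will still slip through.

```python
import re
from typing import List

# A token counts as a number only if it is all digits, optionally with
# one decimal part; tokens mixing letters and digits (e.g. "l0O") are rejected.
NUMBER_TOKEN = re.compile(r"\d+(?:[.,]\d+)?")


def extract_numbers(ocr_text: str) -> List[str]:
    """Return number-like tokens from OCR output, in reading order."""
    numbers = []
    for token in ocr_text.split():
        # Drop surrounding brackets and punctuation such as "(42)" or "12,".
        token = token.strip("()[]{}:;,.\"'")
        if NUMBER_TOKEN.fullmatch(token):
            numbers.append(token)
    return numbers
```

For example, `extract_numbers("~|_ 12 /\\ 3.5 l0O (42) --")` keeps `"12"`, `"3.5"` and `"42"` and discards the garbage tokens around them.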

Re: Specifying a Threshold for Distance between letters and words.

2011-10-19 Thread merve t
Thanks for the reply; then I am going to try something else. 2011/10/18 patrickq > Unfortunately as far as I know there is no magic setting to tell > Tesseract to "get spaces right". Ray Smith once wrote that the space > estimation in Tesseract needs a total rewrite. Within ScanBizCards we > questi

Re: quality of a word ?

2011-10-19 Thread Yunus Emre Cavusoglu
I do not expect 100% from Tesseract; I want to know what makes the 5% difference in Tesseract. Can you explain? On Tue, Oct 18, 2011 at 7:41 PM, patrickq wrote: > If you are referring to the confidence level values returned by > Tesseract these are expressed as "costs" which means a higher values

Re: Tesseract crashes while converting image on win7 machine

2011-10-19 Thread Kristaps
It looks like there are problems either with reading the TIFF image in Tesseract or in the image conversion process with ImageMagick. The image is generated from a PDF file and is cropped. I changed from TIFF to PNG format and it looks like the problem is solved. On Oct 18, 7:45 pm, zdenko podobny wrote: >