Re: Getting access to the deskew angle used to rotate the original image to create the image to be OCRed

2012-10-09 Thread Kent Fitch
Thanks, I haven't ventured into the code yet, but maybe it is time! On Wednesday, October 10, 2012 11:37:47 AM UTC+11, Kage.Sabaku.No.Gaara wrote: > I suppose you could modify tesseract to "remember" such things and pass them in the arguments. Simple stuff, actually. > On Fri, Sep 28, 2012

Re: use tesseract api in visual c++ 2010

2012-10-09 Thread Gaara Sabaku
Wrong again: it is the latest version, 3.02, and it's OK if you don't understand. It's also OK if you never plan to make massive changes to the library, because you (obviously) don't know what you're doing. On Tue, Sep 18, 2012 at 10:23 PM, TP wrote: > On Tue, Sep 18, 2012 at 4:58 PM, Kage.Sabaku.No.G

Re: Tess v3 not recognising accented Esperanto characters.

2012-10-09 Thread Donaldo
I found that someone in the tesseract-ocr group recommended using a config parameter to switch on a new method: enable_new_segsearch 1. So I created a new epo.config file (there wasn't one before) with that one line in it. I generated a new epo.traineddata file and reran my test *combine_tessda
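A minimal sketch of producing and checking such a config file, assuming the plain "name value" per-line layout described above (one parameter name, whitespace, then its value). The helper names format_config and parse_config are hypothetical, and treating lines starting with "#" as comments is an assumption to verify against your tesseract version:

```python
from typing import Dict


def format_config(params: Dict[str, str]) -> str:
    """Render parameters as one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'name value' lines, skipping blank lines and '#' comment lines."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: the name comes first and the rest is the value.
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params
```

For example, format_config({"enable_new_segsearch": "1"}) yields the single line from the post. Writing that string to epo.config and parsing it back is a quick sanity check before rebuilding the traineddata file.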

Re: Tesseract-OCR training problems

2012-10-09 Thread Gaara Sabaku
This is exactly what I typed last time, and I have created many custom fonts this way: mftraining -F font_properties -U unicharset -O eng.unicharset eng.verdana.box.tr Notice the grievous differences. On Sun, Oct 7, 2012 at 5:07 AM, zdenko podobny wrote: > 1. On your screenshot there is other comma
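Small differences between the command you meant to type and the one you actually typed are easy to miss. One option is to build the argument list once in a script. This is a sketch that reproduces the exact flag order quoted above (-F font properties file, -U input unicharset, -O output unicharset, then the .tr files). The function names are hypothetical:

```python
import subprocess
from typing import List, Sequence


def mftraining_argv(font_properties: str, unicharset: str,
                    output_unicharset: str, tr_files: Sequence[str]) -> List[str]:
    """Build the mftraining command line in the order quoted in the post."""
    return ["mftraining",
            "-F", font_properties,     # font properties file
            "-U", unicharset,          # input unicharset
            "-O", output_unicharset,   # output unicharset
            *tr_files]                 # one or more .tr feature files


def run_mftraining(argv: List[str]) -> None:
    """Run the command without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)
```

Passing a list rather than a shell string means each file name reaches mftraining exactly as written, with no quoting surprises.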

Re: BBox is larger than the actual word

2012-10-09 Thread Gaara Sabaku
That is not normal behavior. I have a suspicion about what your issue is. Which box editor are you using, if any? On Tue, Oct 9, 2012 at 11:26 AM, zdenko podobny wrote: > Please provide input image and tesseract hocr output. > -- > Zdenko > On Mon, Oct 8, 2012 at 4:20 PM, Attila Somogyi wrote:

Re: Android platform differences and Windows?

2012-10-09 Thread Gaara Sabaku
You may contact me privately regarding this matter. I can help you. On Tue, Oct 2, 2012 at 11:28 AM, Sven Pedersen wrote: > It seems you did not search the archives: > https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/license$20plate > You will need to contact the people who

Re: Effect of font_properties

2012-10-09 Thread Gaara Sabaku
Agreed, tesseract was not debugged or developed in the manner you describe. TIFF (via the Leptonica library) is the image format chosen for many reasons, one of which is its multi-page support. The better you understand Leptonica, the better your use of tesseract will be. When tesseract trains, it calls
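Since this thread is about font_properties, a sketch of validating that file may be useful. It assumes the layout given in the 3.0x training documentation: one font per line, formatted as "fontname italic bold fixed serif fraktur", with each flag set to 0 or 1. Check this against your version. FontProps and parse_font_properties are hypothetical names:

```python
from typing import Dict, NamedTuple


class FontProps(NamedTuple):
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties(text: str) -> Dict[str, FontProps]:
    """Parse font_properties lines of the form 'name i b f s fr' with 0/1 flags."""
    fonts: Dict[str, FontProps] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        name, *flags = fields
        if any(f not in ("0", "1") for f in flags):
            raise ValueError(f"line {lineno}: flags must be 0 or 1")
        fonts[name] = FontProps(*(f == "1" for f in flags))
    return fonts
```

Running the file through a check like this before mftraining catches a malformed line up front. A line with a missing flag, for instance, raises a ValueError here instead of surfacing later in the training run.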

Re: Getting access to the deskew angle used to rotate the original image to create the image to be OCRed

2012-10-09 Thread Gaara Sabaku
I suppose you could modify tesseract to "remember" such things and pass them in the arguments. Simple stuff, actually. On Fri, Sep 28, 2012 at 9:53 PM, Kent Fitch wrote: > Hi, > I've just started trying tesseract and am finding the results very impressive. > I'm trying to match the OCRed text
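Once the deskew angle is available, mapping OCR coordinates back to the original image is plain geometry. This sketch makes several assumptions not given in the thread:

- The deskewed image was produced by rotating the original by angle_deg about the centre (cx, cy).
- The canvas size is unchanged.
- The standard rotation formula applies in pixel coordinates (y pointing down).

If the real rotation expands the canvas, an extra offset is needed. All function names are hypothetical:

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def rotate_point(x: float, y: float, angle_deg: float, cx: float, cy: float) -> Point:
    """Rotate (x, y) by angle_deg about (cx, cy) using the standard 2-D rotation."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def to_original(x: float, y: float, deskew_angle_deg: float, cx: float, cy: float) -> Point:
    """Undo a deskew rotation by rotating back through the negated angle."""
    return rotate_point(x, y, -deskew_angle_deg, cx, cy)


def bbox_to_original(bbox: Box, deskew_angle_deg: float, cx: float, cy: float) -> Box:
    """Map all four corners back and return their axis-aligned bounding box."""
    x0, y0, x1, y1 = bbox
    corners = [to_original(x, y, deskew_angle_deg, cx, cy)
               for x, y in ((x0, y0), (x1, y0), (x0, y1), (x1, y1))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

Using the axis-aligned box of the four rotated corners keeps the result in the same bbox form that tesseract reports. The trade-off is that the box grows slightly for non-zero angles.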

Re: BBox is larger than the actual word

2012-10-09 Thread Attila Somogyi
I've attached the files. Here's the HTML content: ... class='ocr_word' id='word_1_1' title="bbox 107 81 262 136"> class='ocrx_word' id='xword_1_1' title="x_wconf -1">apple, ... class='ocrx_word' id='xword_1_2'
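To pull the word boxes out of hOCR output like the fragment above, a sketch using Python's standard html.parser. It assumes that, as in hOCR, a title attribute holds semicolon-separated properties, with "bbox x0 y0 x1 y1" giving pixel corners (top-left, bottom-right). The parser is tag-agnostic and only looks at class, id and title. Using span elements in the usage example is an assumption, because the tags are missing from the archived text. WordBoxes and parse_title are hypothetical names:

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]


def parse_title(title: str) -> Dict[str, List[str]]:
    """Split an hOCR title such as 'bbox 107 81 262 136; x_wconf -1' into properties."""
    props: Dict[str, List[str]] = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class WordBoxes(HTMLParser):
    """Collect the bbox of every element whose class list contains 'ocr_word'."""

    def __init__(self) -> None:
        super().__init__()
        self.boxes: Dict[str, BBox] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        props = parse_title(a.get("title") or "")
        if "ocr_word" in classes and len(props.get("bbox", [])) == 4:
            x0, y0, x1, y1 = (int(v) for v in props["bbox"])
            self.boxes[a.get("id") or ""] = (x0, y0, x1, y1)
```

For the word "apple," above, this gives (107, 81, 262, 136), a box 155 px wide and 55 px tall. Those numbers can be compared against the word's extent in the input image.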

Re: use tesseract api in visual c++ 2010

2012-10-09 Thread TP
On Tue, Oct 9, 2012 at 10:19 AM, zdenko podobny wrote: > This is a page in the svn repository (i.e. it is waiting for the 3.02 release). Version 3.02 has not been released yet, but you can get it from svn[1]. > [1] https://code.google.com/p/tesseract-ocr/wiki/TesseractSvnInstallation > Windows users should see [

Re: BBox is larger than the actual word

2012-10-09 Thread zdenko podobny
Please provide the input image and the tesseract hOCR output. -- Zdenko On Mon, Oct 8, 2012 at 4:20 PM, Attila Somogyi wrote: > Hi! > I'm using 3.01. I use the HTML file to get the box information(

Re: use tesseract api in visual c++ 2010

2012-10-09 Thread zdenko podobny
This is a page in the svn repository (i.e. it is waiting for the 3.02 release). Version 3.02 has not been released yet, but you can get it from svn[1]. [1] https://code.google.com/p/tesseract-ocr/wiki/TesseractSvnInstallation -- Zdenko On 9.10.2012 8:03, "JVIyer" wrote: > http://tesseract-ocr.googlecode.

Re: recognizing custom text

2012-10-09 Thread commerci...@gmx.net
On Friday, October 5, 2012 18:33:30 UTC+2, Francisco Loché Costa wrote: > Looking at the first image you have attached, I think you may need to eliminate the black outline that surrounds the blue in the characters (by eliminate I mean turn it into the same colour as the background). If
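The suggestion above (turn the black outline into the background colour) can be sketched on a plain grid of RGB pixels. Real code would use an imaging library. The threshold of 60 and the function name are hypothetical. "Near-black" is judged by the brightest channel (the V of HSV) rather than by luminance, because saturated blue has low luminance and would otherwise be wiped along with the outline:

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def remove_dark_outline(pixels: List[List[RGB]], background: RGB,
                        threshold: int = 60) -> List[List[RGB]]:
    """Return a copy where every near-black pixel is replaced by the background colour.

    A pixel counts as near-black when its brightest channel is below threshold,
    so a saturated blue such as (0, 0, 255) is kept.
    """
    return [[background if max(p) < threshold else p for p in row]
            for row in pixels]
```

This also blanks any other near-black pixels in the image. It only suits images where the characters themselves are not black, like the blue text described here.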