Tesseract OCR font style (bold, italic)...

2012-07-30 Thread Nada Feteha
I am trying to use Tesseract OCR 3.01 and Visual Studio 2010 to extract bold and italic words from an image. For example, if I input a clear image with text like so: "The quick *brown* fox *jumps* over the *lazy* dog." I would like to get an output just like so: "The quick brown fox jumps over

Re: hOCR Character Encoding Problem

2012-07-30 Thread Sven Pedersen
Hi Cian, It would be best to use the pre-release version 3.02 (from SVN) if possible, since a couple of important fixes have been made to the hOCR support. --Sven On Mon, Jul 30, 2012 at 8:13 AM, Cian Mc Govern wrote: > Hi all, > > I'm using Tesseract with the hOCR output format. When I invoke Te

Re: hOCR Character Encoding Problem

2012-07-30 Thread zdenko podobny
On Mon, Jul 30, 2012 at 3:13 PM, Cian Mc Govern wrote: > Hi all, > > I'm using Tesseract with the hOCR output format. When I invoke Tesseract > on an image, the results are returned in hOCR format with a UTF-8 character > encoding. However, if I then convert the same image to TIFF format from > PN

Re: Help to link libs on VC++

2012-07-30 Thread Ankur Rana
Hi Angélica... this header file is used to compile Tesseract on Linux. I compiled it on VS2010 and it built without problems. On Sun, Jul 29, 2012 at 7:22 PM, Angélica Mascaro wrote: > Hi, Ankur, > Ok, I've set the include paths (just now I noticed the > "tesseract/api" project was a .

hOCR Character Encoding Problem

2012-07-30 Thread Cian Mc Govern
Hi all, I'm using Tesseract with the hOCR output format. When I invoke Tesseract on an image, the results are returned in hOCR format with a UTF-8 character encoding. However, if I then convert the same image to TIFF format from PNG/JPEG etc. the character encoding of the hOCR output is Latin1

Re: Help to link libs on VC++

2012-07-30 Thread TP
On Sat, Jul 28, 2012 at 8:54 PM, Angélica Mascaro wrote: > On project properties -> Linker -> Additional Dependencies I've put all the > libs that tesseract generated (ccmain.lib ccstruct.lib ccutil.lib > classify.lib cube.lib cutil.lib dict.lib image.lib libtesseract_tessopt.lib > libtesseract_tr

Re: Problem With Combining Font Data

2012-07-30 Thread shah dipen
First you need to follow the steps at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 for your language and fonts. Just go through them; they seem hard at first but they are really easy. On Monday, 30 July 2012 12:47:49 UTC+5:30, Txoov Chij Her wrote: > > If I have a new language and n

Re: Problem With Combining Font Data

2012-07-30 Thread Txoov Chij Her
If I have a new language and a new font, how do I convert this font so that the OCR program supports it? Please tell me too. On Monday, July 30, 2012 1:35:49 PM UTC+7, shah dipen wrote: > Hello everyone, > I was able to complete OCR on a single file, but when I > used more than 32 font fil

Problem With Combining Font Data

2012-07-30 Thread shah dipen
Hello everyone, I was able to complete OCR on a single file, but when I used more than 32 font files and combined the generated data, the new eng.traineddata file did not perform even 1% better than the original trained data file. I have performed all steps on Tesseract-3