trouble using the blacklist

2011-03-09 Thread patrickq
Here is the sequence of calls we are using to get the complete information about text in the image: myTess->SetImage(grayScaleImageData, grayScaleWidth, grayScaleHeight, 1, grayScaleWidth); BLOCK_LIST* block_list = myTess->FindLinesCreateBlockList(); PAGE_RES* page_res_pass1 = myTess->RecognitionP

Re: Trouble recognizing characters in images with different character size

2011-03-09 Thread patrickq
This is a known issue with Tesseract. One solution is to process the OCR results, detect the size discrepancy between the two parts of the line, and then re-process each part as a separate image. In essence, doing that prevents Tesseract from drawing bad inferences. I think Tesseract 3.01 bring
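
The workaround described in this reply (find where the character size changes along a line, then re-run OCR on each part separately) might look roughly like the sketch below. It is only an illustration under stated assumptions: `WordBox`, `FindHeightSplit`, `UnionBox` and the 1.5 height ratio are hypothetical names and values, not part of the Tesseract API, and the word boxes are assumed to come from a first OCR pass, ordered left to right.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounding box of one word from a first OCR pass
// (not a Tesseract type; fill it from whatever box output you have).
struct WordBox {
    int left;
    int top;
    int width;
    int height;
};

// Returns the index of the first word whose height differs from its
// left neighbour by at least ratio_threshold, or 0 if no such jump exists.
// The threshold value is an assumption and would need tuning on real data.
std::size_t FindHeightSplit(const std::vector<WordBox>& words,
                            double ratio_threshold = 1.5) {
    for (std::size_t i = 1; i < words.size(); ++i) {
        const int prev = words[i - 1].height;
        const int curr = words[i].height;
        if (prev <= 0 || curr <= 0) continue;  // skip degenerate boxes
        const double ratio = prev > curr
            ? static_cast<double>(prev) / curr
            : static_cast<double>(curr) / prev;
        if (ratio >= ratio_threshold) return i;
    }
    return 0;
}

// Returns the rectangle enclosing words[begin, end), i.e. the region
// to crop and feed back to the OCR engine as a separate image.
WordBox UnionBox(const std::vector<WordBox>& words,
                 std::size_t begin, std::size_t end) {
    assert(begin < end && end <= words.size());
    int left = words[begin].left;
    int top = words[begin].top;
    int right = words[begin].left + words[begin].width;
    int bottom = words[begin].top + words[begin].height;
    for (std::size_t i = begin + 1; i < end; ++i) {
        left = std::min(left, words[i].left);
        top = std::min(top, words[i].top);
        right = std::max(right, words[i].left + words[i].width);
        bottom = std::max(bottom, words[i].top + words[i].height);
    }
    return WordBox{left, top, right - left, bottom - top};
}
```

If `FindHeightSplit` returns a non-zero index `k`, the two regions to re-process would be `UnionBox(words, 0, k)` and `UnionBox(words, k, words.size())`. Comparing only neighbouring word heights is a simplification; words without ascenders or descenders can look shorter than they are, so a more robust version might compare medians over several words.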

Trouble recognizing characters in images with different character size

2011-03-09 Thread Søren Engel
Hello fellow members, I am currently working on upgrading an old OCR module at our development team which was originally written in VB using the integrated OCR components within Microsoft Office 2003. Since this is discontinued (as Microsoft has discarded the COM components from the office distr

Working with FAX images with lines/borders

2011-03-09 Thread dhoffer
I'm using the command line version (if it works I'll use the API) to convert images (I can make any format, jpeg, tiff, etc) that are images of FAXed documents. The text quality varies but I think the bigger problem is that the text/data is inside of a table with lines/borders. When I use tess

How to detect inverted image in a picture

2011-03-09 Thread Ice Head
Hi, I'm using Tesseract 3.01 and failed to read simple images like this one (see link below) https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B1-BKwD5hqtZZWE5OGE0MDctOWY5MS00N2UwLWJlNWYtMDQwMTM2NzY4OWE0&hl=fr&authkey=CMPj0sEL Is there a way to read this kind of picture? -- Y

consult dawg content files

2011-03-09 Thread electronico.nc
Hello everyone, I'm sorry if the question has already been posted and answered, but I haven't found it in the posted messages. I have unpacked the provided /usr/local/share/tessdata/fra.traineddata file. I would like to know the words entered in the fra.freq-dawg, fra.word-dawg, fra.punc-dawg files ... I