Running a build installed via ports can cause all sorts of errors. Try
compiling Tesseract natively. I use revision 549 and, as I said, it works fine.

Tables like yours are a challenge for simple layout processing
algorithms because the text is sparsely located. Even a minimal skew,
which is almost inevitable, can break all the logic. In such cases I
prefer to devise custom segmentation logic specific to the document
type being processed. That way I do not depend on Tesseract's
segmentation: Tesseract is used only as a raw classifier.
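
To make the idea of document-specific segmentation concrete, here is a minimal sketch in Python. It is only an illustration, not the logic I actually use: it assumes you already have a bounding box for every word (from connected components, a box file, or any other source), and the names Word, group_into_rows and the tolerance value are hypothetical. Words are put in the same row when their vertical centres are close, so a small skew does not split a row.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels

    @property
    def y_center(self) -> float:
        return self.y + self.h / 2.0


def group_into_rows(words: List[Word], tolerance: float = 0.5) -> List[List[Word]]:
    """Group words into rows by vertical-centre proximity.

    A word joins the current row if its centre lies within
    tolerance * (mean word height of the row) of the row's mean centre.
    The tolerance of 0.5 is an assumed starting point, not a tuned value.
    """
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda wd: wd.y_center):
        if rows:
            current = rows[-1]
            row_center = sum(wd.y_center for wd in current) / len(current)
            row_height = sum(wd.h for wd in current) / len(current)
            if abs(word.y_center - row_center) <= tolerance * row_height:
                current.append(word)
                continue
        rows.append([word])
    # Inside each row, order the words left to right.
    return [sorted(row, key=lambda wd: wd.x) for row in rows]


def rows_to_text(rows: List[List[Word]]) -> List[str]:
    """Join the words of each row with tabs, one string per row."""
    return ["\t".join(wd.text for wd in row) for row in rows]


if __name__ == "__main__":
    # Slightly skewed sample: each column sits a few pixels lower.
    sample = [
        Word("productB", 400, 146, 90, 20),
        Word("00001", 10, 100, 60, 20),
        Word("test2", 200, 142, 50, 20),
        Word("productA", 400, 105, 90, 20),
        Word("00002", 10, 140, 60, 20),
        Word("test", 200, 102, 40, 20),
    ]
    for line in rows_to_text(group_into_rows(sample)):
        print(line)
```

This only tolerates a skew that is small compared with the row spacing. For a stronger skew you would have to deskew the image first or fit a sloped baseline per row.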

Warm regards,
Dmitry Silaev





On Sun, Mar 13, 2011 at 4:47 PM, manuel...@gmail.com
<manuel...@gmail.com> wrote:
> I'm using the latest version, tesseract @3.00_2+eng,
> which I installed using ports on Mac OS X.
>
> Another question, Dmitry, about this sample:
> why doesn't Tesseract recognize a complete row? The alignment is not
> perfect, but it is impossible to get an image 100% aligned.
> Tesseract is breaking columns onto new lines, like this:
>
> 00001           test    productA
> 00002           test2
> productB
>
> Do you know how to fix it?
>
> Regards
> Manuel Pardo
>
>
> On 13/03/2011, at 08:32, Dmitry Silaev wrote:
>
>> Manuel,
>>
>> The sample you provided definitely has insufficient resolution. You
>> can only expect some part of the heading to be recognized, and that is
>> what happened when I ran recognition on your image. But I
>> didn't get any error or warning messages with my "por.traineddata" at
>> all!
>>
>> However, all this was tested under Windows. I could probably try it
>> under Ubuntu, but I don't know when I will have enough time to reboot, set
>> up a C++ compiler, build Tesseract and do some testing, sorry ))
>>
>> Are you sure you downloaded the latest stable version of Tesseract?
>>
>> Warm regards,
>> Dmitry Silaev
>>
>>
>>
>>
>>
>> On Thu, Mar 10, 2011 at 9:32 PM, manuel...@gmail.com
>> <manuel...@gmail.com> wrote:
>>> I just replaced por.traineddata with your por.traineddata file.
>>> After that I get this error message:
>>>
>>>>> manuel$ tesseract input.tiff output -l por
>>>>> actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert 
>>>>> failed:in file tessdatamanager.cpp, line 55
>>>>> Segmentation fault
>>>
>>> I haven't succeeded. I'm using version 3 on Mac OS X 10.6.
>>>
>>> Attached: Reported.tiff
>>>
>>> Regards
>>> Manuel Pardo
>>>
>>> On 04/03/2011, at 03:19, Dmitry Silaev wrote:
>>>
>>>> Manuel,
>>>>
>>>> Is the error message generated by version 2.xx? Did you try to run
>>>> version 3.xx with my "por.traineddata" file?
>>>> I don't get it - have you succeeded or not?
>>>> Please provide us with the image you are trying to recognize.
>>>>
>>>> Warm regards,
>>>> Dmitry Silaev
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com <manuel...@gmail.com> 
>>>> wrote:
>>>>> Hi Dmitry,
>>>>>
>>>>> I just replaced the file with your por.traineddata,
>>>>> but I'm getting an error:
>>>>>
>>>>> manuel$ tesseract input.tiff output -l por
>>>>> actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert 
>>>>> failed:in file tessdatamanager.cpp, line 55
>>>>> Segmentation fault
>>>>>
>>>>> It seems worthwhile to convert the old files from 2.0x to 3, because
>>>>> there is no Brazilian Portuguese for version 3, just "Portuguese".
>>>>> At least the por.traineddata dictionary is working correctly in version
>>>>> 3.
>>>>> The special characters are being recognized by Tesseract 3.
>>>>>
>>>>> regards,
>>>>> Manuel Pardo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 03/03/2011, at 09:12, Dmitry Silaev wrote:
>>>>>
>>>>>> Manuel,
>>>>>>
>>>>>> It's quite an interesting question although it may seem to be an
>>>>>> ordinary newbie-like one.
>>>>>>
>>>>>> I have always wondered whether 2.xx files can be used with version 3.xx.
>>>>>> The wiki states that "the files in the traineddata file are different
>>>>>> from the list used prior to 3.00, and will most likely change,
>>>>>> possibly dramatically in future revisions."
>>>>>>
>>>>>> I have no time to investigate this in the code, so I decided to act
>>>>>> rather than think. After some tinkering with all those files I
>>>>>> slipped the resulting "por.traineddata" into the Tesseract-based
>>>>>> algorithm I'm currently working on, and - guess what? - it worked! ))
>>>>>>
>>>>>> I must say it was tested only with a couple of *very simple* images,
>>>>>> and it completely lacks any dictionary-related data. My test
>>>>>> images also don't contain the specific Portuguese letters with
>>>>>> diacritics, so in fact this file may perform poorly. Please test it and
>>>>>> report your results. The file is attached.
>>>>>>
>>>>>> Making this training data file was not difficult, but it was not
>>>>>> entirely straightforward either, so the process probably deserves a
>>>>>> separate article, which I'd like to post on my blog later.
>>>>>>
>>>>>> Warm regards,
>>>>>> Dmitry Silaev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp <manuel...@gmail.com> wrote:
>>>>>>> Hello list,
>>>>>>> I can't find a solution for special characters.
>>>>>>>
>>>>>>> I installed Tesseract 3 on my Mac OS X 10.6.
>>>>>>> It is running very well,
>>>>>>>
>>>>>>> but I'm having problems with the charset.
>>>>>>> I need Tesseract to work with Brazilian Portuguese (ISO8859-1).
>>>>>>>
>>>>>>> I installed the Portuguese dictionary, but it does not work with special
>>>>>>> characters like  Ç Ã É é ....  (ISO8859-1)
>>>>>>> Is there any solution?
>>>>>>>
>>>>>>> There is an old dictionary specifically for Brazilian Portuguese in version
>>>>>>> 2.0.4. Is it possible to use it in version 3? How?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>>>>> To unsubscribe from this group, send email to 
>>>>>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>>>>> For more options, visit this group at 
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> <por.traineddata>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>
>

