Running a build installed via ports can cause a variety of errors. Try compiling Tesseract natively. I use revision 549 and, as I said, it works fine.
Tables like yours are a challenge for simple layout-analysis algorithms because the text is sparsely located. Even a minimal skew, which is almost inevitable, can break the row logic. In such cases I prefer to devise custom segmentation logic specific to the document type being processed. That way I do not depend on Tesseract's segmentation; Tesseract is used only as a raw classifier.

Warm regards,
Dmitry Silaev

On Sun, Mar 13, 2011 at 4:47 PM, manuel...@gmail.com <manuel...@gmail.com> wrote:
> I'm using the latest version, tesseract @3.00_2+eng.
> I installed it using ports on Mac OS X.
>
> Another question, Dmitry, about this sample:
> why doesn't Tesseract recognize a complete row? The alignment is not
> perfect, but it is impossible to get an image 100% aligned.
> Tesseract is breaking columns onto new lines, like this:
>
> 00001 test productA
> 00002 test2
> productB
>
> Do you know how to fix it?
>
> Regards,
> Manuel Pardo
>
>
> On 13/03/2011, at 08:32, Dmitry Silaev wrote:
>
>> Manuel,
>>
>> The sample you provided definitely has insufficient resolution. You
>> may only expect some part of the heading to be recognized, and that is
>> what happened when I ran the recognition on your image. But I
>> didn't get any error or warning messages with my "por.traineddata" at
>> all!
>>
>> However, all this was tested under Windows. I could probably try this
>> under Ubuntu, but I don't know when I will have enough time to reboot, set
>> up a C++ compiler, build Tesseract and do some testing, sorry ))
>>
>> Are you sure you downloaded the latest stable version of Tesseract?
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> On Thu, Mar 10, 2011 at 9:32 PM, manuel...@gmail.com
>> <manuel...@gmail.com> wrote:
>>> I just replaced por.traineddata with your file por.traineddata.
>>> After that I'm getting this error message:
>>>
>>>>> manuel$ tesseract input.tiff output -l por
>>>>> actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert
>>>>> failed:in file tessdatamanager.cpp, line 55
>>>>> Segmentation fault
>>>
>>> I haven't succeeded. I'm using version 3 on Mac OS X 10.6.
>>>
>>> Attached: Reported.tiff
>>>
>>> Regards,
>>> Manuel Pardo
>>>
>>> On 04/03/2011, at 03:19, Dmitry Silaev wrote:
>>>
>>>> Manuel,
>>>>
>>>> Is the error message generated by version 2.xx? Did you try to run
>>>> version 3.xx with my "por.traineddata" file?
>>>> I don't get it - have you succeeded or not?
>>>> Please provide us with the image you are trying to recognize.
>>>>
>>>> Warm regards,
>>>> Dmitry Silaev
>>>>
>>>> On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com <manuel...@gmail.com>
>>>> wrote:
>>>>> Hi Dmitry,
>>>>>
>>>>> I just replaced my file with your por.traineddata,
>>>>> but I'm getting an error:
>>>>>
>>>>> manuel$ tesseract input.tiff output -l por
>>>>> actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert
>>>>> failed:in file tessdatamanager.cpp, line 55
>>>>> Segmentation fault
>>>>>
>>>>> It would be interesting to convert old files from 2.0X to 3, because
>>>>> there isn't a Brazilian Portuguese for version 3, just "portuguese".
>>>>> At least the dictionary por.traineddata is working correctly in version
>>>>> 3.
>>>>> The special chars are being recognized by Tesseract 3.
>>>>>
>>>>> Regards,
>>>>> Manuel Pardo
>>>>>
>>>>> On 03/03/2011, at 09:12, Dmitry Silaev wrote:
>>>>>
>>>>>> Manuel,
>>>>>>
>>>>>> It's quite an interesting question, although it may seem to be an
>>>>>> ordinary newbie-like one.
>>>>>>
>>>>>> I was always wondering if 2.xx files can be used with version 3.xx.
>>>>>> The wiki states that "the files in the traineddata file are different
>>>>>> from the list used prior to 3.00, and will most likely change,
>>>>>> possibly dramatically in future revisions."
>>>>>>
>>>>>> I have no time to investigate it in the code, so I decided to act
>>>>>> rather than to think. After some tinkering with all those files I
>>>>>> slipped the resulting "por.traineddata" into the Tesseract algorithm I'm
>>>>>> currently working on, and - guess what? - it worked! ))
>>>>>>
>>>>>> I must say it was tested only with a couple of *very simple* images,
>>>>>> and it completely lacks any dictionary-related data. Also, my test
>>>>>> images don't contain the specific Portuguese letters with
>>>>>> diacritics. So in fact this file may perform poorly. Please test and
>>>>>> report your results. The file is in the attachment.
>>>>>>
>>>>>> Making this training data file was not difficult, but it was not
>>>>>> straightforward either, so the process probably deserves a separate
>>>>>> article, which I'd like to post on my blog later.
>>>>>>
>>>>>> Warm regards,
>>>>>> Dmitry Silaev
>>>>>>
>>>>>> On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp <manuel...@gmail.com> wrote:
>>>>>>> Hello list,
>>>>>>> I can't find a solution for special characters.
>>>>>>>
>>>>>>> I installed Tesseract 3 on my Mac OS X 10.6.
>>>>>>> It is running very well.
>>>>>>>
>>>>>>> But I'm having problems with the charset.
>>>>>>> I need Tesseract working with Brazilian Portuguese (ISO8859-1).
>>>>>>>
>>>>>>> I installed the Portuguese dictionary, but it is not working with special
>>>>>>> chars like Ç Ã É é .... (ISO8859-1)
>>>>>>> Is there any solution?
>>>>>>>
>>>>>>> There is an old dictionary specifically for Brazilian Portuguese in version
>>>>>>> 2.0.4. Is it possible to use it in version 3? How?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>>>
>>>>>> <por.traineddata>
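
A minimal sketch of the custom row segmentation idea from the reply at the top of this thread, assuming word bounding boxes are already available from some earlier step (for example, a connected-component pass or an OCR run that reports word positions). The `Box` type, the `group_into_rows` and `rows_to_text` functions, and the `tolerance` parameter are hypothetical names used for illustration only; none of them are part of Tesseract. Words whose vertical centers lie within a fraction of the current row's average height are merged into one row, which tolerates the kind of small skew that splits "productB" onto its own line in the sample output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """A recognized word and its bounding box (hypothetical structure)."""
    text: str
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def _center_y(box: Box) -> float:
    return box.y + box.h / 2


def group_into_rows(boxes: List[Box], tolerance: float = 0.5) -> List[List[Box]]:
    """Group word boxes into table rows by vertical center.

    A box joins the current row when its vertical center is within
    `tolerance` times the row's average box height of the row's average
    center. Otherwise it starts a new row.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=_center_y):
        if rows:
            row = rows[-1]
            row_center = sum(_center_y(b) for b in row) / len(row)
            row_height = sum(b.h for b in row) / len(row)
            if abs(_center_y(box) - row_center) <= tolerance * row_height:
                row.append(box)
                continue
        rows.append([box])
    # Within each row, order the words left to right.
    return [sorted(row, key=lambda b: b.x) for row in rows]


def rows_to_text(rows: List[List[Box]]) -> str:
    """Join each row's words with spaces, one row per output line."""
    return "\n".join(" ".join(b.text for b in row) for row in rows)


if __name__ == "__main__":
    # Made-up coordinates imitating a slightly skewed two-row table.
    sample = [
        Box("00001", 10, 100, 50, 20),
        Box("test", 100, 101, 40, 20),
        Box("productA", 300, 104, 80, 20),
        Box("00002", 10, 140, 50, 20),
        Box("test2", 100, 141, 50, 20),
        Box("productB", 300, 145, 80, 20),
    ]
    print(rows_to_text(group_into_rows(sample)))
```

The coordinates in the sample are invented for the demonstration. On a real document the tolerance would need tuning to the row spacing, and a strongly skewed page would still need deskewing first.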