Manuel,

Is the error message generated by version 2.xx? Did you try to run version 3.xx with my "por.traineddata" file? I don't get it - have you succeeded or not? Please provide us with the image you are trying to recognize.

Warm regards,
Dmitry Silaev

On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com <manuel...@gmail.com> wrote:
> Hi Dmitry,
>
> I just replaced the existing file with your por.traineddata,
> but I'm getting an error:
>
> manuel$ tesseract input.tiff output -l por
> actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
> Segmentation fault
>
> It seems it would be interesting to convert old files from 2.0X to 3, because
> there isn't a Brazilian Portuguese for version 3, just "Portuguese".
> At least the dictionary por.traineddata is working correctly in version 3.
> The special chars are being recognized by Tesseract 3.
>
> regards,
> Manuel Pardo
>
> On 03/03/2011, at 09:12, Dmitry Silaev wrote:
>
>> Manuel,
>>
>> It's quite an interesting question although it may seem to be an
>> ordinary newbie-like one.
>>
>> I was always wondering if 2.xx files can be used with version 3.xx.
>> The wiki states that "the files in the traineddata file are different
>> from the list used prior to 3.00, and will most likely change,
>> possibly dramatically in future revisions."
>>
>> I have no time to investigate it in the code so I decided to act
>> rather than to think. After some tinkering with all those files I
>> slipped the resulting "por.traineddata" into the Tesseract algo I'm
>> currently working on, and - guess what? - it worked! ))
>>
>> I must say it was tested only with a couple of *very simple* images
>> and also it absolutely lacks any dictionary-related data. And my test
>> images don't contain these specific Portuguese letters with
>> diacritics. So in fact this file may perform poorly. Please test and
>> report your results. The file is in the attachment.
>>
>> It was not difficult at all but also not so straightforward to make
>> this training data file, so probably this process deserves a separate
>> article and later I'd like to post it in my blog.
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp <manuel...@gmail.com> wrote:
>>> Hello list,
>>> I can't find a solution for special chars.
>>>
>>> I installed Tesseract 3 on my Mac OS X 10.6.
>>> It is running very well,
>>>
>>> but I'm having problems with the charset.
>>> I need Tesseract working with Brazilian Portuguese (ISO8859-1).
>>>
>>> I installed the Portuguese dictionary but it is not working with special
>>> chars like Ç Ã É é .... (ISO8859-1)
>>> Is there any solution?
>>>
>>> There is an old dictionary specifically for Brazilian Portuguese in version
>>> 2.0.4. Is it possible to use it in version 3? How?
>>
>> <por.traineddata>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com.
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.