Hi Dmitry,

I just replaced my por.traineddata with your file,
but I'm getting an error:

manuel$ tesseract input.tiff output -l por
actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
Segmentation fault
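
The assertion text says that the entry count read from the file is larger than the TESSDATA_NUM_ENTRIES this build was compiled with. A minimal sketch for checking that count without running tesseract is below. It assumes the traineddata file starts with a little-endian 32-bit entry count, which is my reading of the 3.x loader and not something confirmed in this thread. The names read_entry_count, fits_build and check_file are hypothetical, and build_max_entries has to be taken from your own build's headers.

```python
import struct


def read_entry_count(header: bytes) -> int:
    """Decode the entry count from the first 4 bytes of a traineddata file.

    Assumption: the count is a little-endian signed 32-bit integer.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 bytes of header")
    (count,) = struct.unpack("<i", header[:4])
    return count


def fits_build(count: int, build_max_entries: int) -> bool:
    """Same comparison as the failed assertion, plus a guard against negatives."""
    return 0 <= count <= build_max_entries


def check_file(path: str, build_max_entries: int) -> bool:
    """Print the declared entry count of a file and say whether it fits."""
    with open(path, "rb") as f:
        count = read_entry_count(f.read(4))
    print(f"{path}: {count} entries, build limit {build_max_entries}")
    return fits_build(count, build_max_entries)
```

Calling check_file("por.traineddata", n), with n set to the limit of the installed build, would show whether the file declares more entries than that build accepts. A count that looks absurdly large may instead point to a byte-order or file-format mismatch.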

It seems worthwhile to convert old files from 2.0x to 3, because there
isn't a Brazilian Portuguese language file for version 3, just "Portuguese".
At least the por.traineddata dictionary is working correctly in version 3.
The special characters are being recognized by Tesseract 3.

regards,
Manuel Pardo




On 03/03/2011, at 09:12, Dmitry Silaev wrote:

> Manuel,
> 
> It's quite an interesting question although it may seem to be an
> ordinary newbie-like one.
> 
> I was always wondering if 2.xx files can be used with version 3.xx.
> The wiki states that "the files in the traineddata file are different
> from the list used prior to 3.00, and will most likely change,
> possibly dramatically in future revisions."
> 
> I have no time to investigate it in the code so I decided to act
> rather than to think. After some tinkering with all those files I
> slipped the resulting "por.traineddata" into the Tesseract algorithm I'm
> currently working on, and - guess what? - it worked! ))
> 
> I must say it was tested only with a couple of *very simple* images
> and also it absolutely lacks any dictionary-related data. And my test
> images don't contain these specific Portuguese letters with
> diacritics. So in fact this file may perform poorly. Please test and
> report your results. The file is in the attachment.
> 
> Making this training data file was not difficult, but it was not
> entirely straightforward either, so the process probably deserves a
> separate article, which I'd like to post on my blog later.
> 
> Warm regards,
> Dmitry Silaev
> 
> 
> 
> 
> 
> On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp <manuel...@gmail.com> wrote:
>> Hello list,
>> I can't find a solution for special characters.
>> 
>> I installed Tesseract 3 on Mac OS X 10.6.
>> It is running very well.
>> 
>> But I'm having problems with the character set.
>> I need Tesseract working with Brazilian Portuguese (ISO8859-1).
>> 
>> I installed the Portuguese dictionary, but it is not working with special
>> characters like Ç Ã É é ... (ISO8859-1).
>> Is there any solution?
>> 
>> There is an old dictionary specifically for Brazilian Portuguese in version
>> 2.0.4. Is it possible to use it in version 3? How?
>> 
>> 
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> To unsubscribe from this group, send email to 
>> tesseract-ocr+unsubscr...@googlegroups.com.
>> For more options, visit this group at 
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>> 
>> 
> 
> <por.traineddata>
