If your documents are set only in Times Roman, you can train using just that
font and include the additional accented characters. Follow the training
documentation. Make sure that each character occurs 10-20 times in your
training images.
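
One quick way to check the 10-20 occurrences guideline before rendering any
images is to count characters in the text you plan to typeset. The following
is only a rough sketch: the function name, the REQUIRED_CHARS list (taken from
the characters mentioned in this thread) and the plain UTF-8 input file are
assumptions for illustration, not part of Tesseract or its training tools.

```python
import sys
import unicodedata
from collections import Counter

# Characters named in this thread; extend with the rest of your alphabet.
REQUIRED_CHARS = "êëôö"
# Lower bound of the suggested 10-20 occurrences per character.
MIN_COUNT = 10


def underrepresented(text, required=REQUIRED_CHARS, min_count=MIN_COUNT):
    """Return {char: count} for required characters seen fewer than min_count times."""
    # Normalize to NFC so "e" + combining circumflex is counted as "ê".
    counts = Counter(unicodedata.normalize("NFC", text))
    return {ch: counts[ch] for ch in required if counts[ch] < min_count}


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_chars.py training_text.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for ch, n in underrepresented(f.read()).items():
            print(f"{ch!r}: only {n} occurrence(s), add more samples")
```

If the script prints nothing, every listed character already appears at least
MIN_COUNT times in the text.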

You can use a tool such as jTessBoxEditor to generate the training
images and box files.

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


On Fri, Aug 23, 2013 at 10:54 PM, Davide Pioggia <dpiog...@gmail.com> wrote:

> Hi All,
> I'm doing OCR on documents printed in ordinary Times Roman, so recognizing
> individual characters isn't a problem. The problem is that the
> language is an Italian dialect that uses characters found in neither
> English nor Italian, such as ê, ë, ô, ö. Besides, the words obviously
> aren't those found in an Italian dictionary.
> So the problem is just this: unknown words and a few extra characters.
> I've studied the documentation for Tesseract training, and perhaps I
> haven't understood everything, but my impression is that most of what I
> read is meant for languages that use special characters. I, instead, only
> have to read simple printed Times Roman.
> All I have to do is train Tesseract to recognize a few more characters
> (ê, ë, ô, ö) and add new words to a new dictionary.
> So it doesn't seem that I should have to prepare training images, draw
> boxes, and so on.
> Is there a simpler way to do what I need?
> Thanks!
> D.
>
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
