On Wed, Nov 27, 2013 at 09:51:22PM +0530, V S Rawat wrote:
> wow. That was new information that we should use unicharambigs
> instead of DangAmbigs.

Not that new really, it's been that way for years now :)

> 1. Where should this file be put?

It should be called xxx.unicharambigs (where xxx is the language
code you're using), and then added to the xxx.traineddata file using
combine_tessdata.
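
For illustration, here is a rough Python sketch of those two steps:
writing a v1-format unicharambigs file and building the
combine_tessdata call that overwrites that component in an existing
traineddata file. The language code "hin", the two sample rules, and
the helper names are assumptions made up for the example, not
something from this thread; check the field layout against the
unicharambigs documentation for your Tesseract version before relying
on it.

```python
def ambig_line(source, target, mandatory=True):
    """Format one v1 unicharambigs entry as tab-separated fields.

    source and target are lists of unichars; the last field is
    1 for a mandatory substitution and 0 for an optional one.
    """
    fields = [
        str(len(source)), " ".join(source),
        str(len(target)), " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def render_unicharambigs(rules):
    """Return the full file text: a 'v1' header, then one rule per line."""
    lines = ["v1"] + [ambig_line(*rule) for rule in rules]
    return "\n".join(lines) + "\n"


def combine_command(traineddata, ambigs_file):
    """Build the combine_tessdata call that overwrites one component (-o)."""
    return ["combine_tessdata", "-o", str(traineddata), str(ambigs_file)]


if __name__ == "__main__":
    # Hypothetical rules: always turn two apostrophes into a double
    # quote, and optionally treat "r n" as "m".
    rules = [
        (["'", "'"], ['"'], True),
        (["r", "n"], ["m"], False),
    ]
    print(render_unicharambigs(rules), end="")
    # "hin" is an assumed language code; substitute your own.
    print(" ".join(combine_command("hin.traineddata", "hin.unicharambigs")))
```

To apply it for real, save the rendered text as hin.unicharambigs
(UTF-8) and run the printed command in the directory holding
hin.traineddata, ideally on a backup copy since -o modifies the file
in place.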

> 2. Is the same file to be used for all languages? The previous method was
> convenient, since each language had its own file name.

No, the file is specific to each training, so each language still
gets its own xxx.unicharambigs.

> The file should have a recognized extension so that it opens
> automatically in the relevant standard editor. It is bad practice to have
> a file without an extension.

It's only a bad habit in the Windows world, so it isn't really a bad
habit at all ;)

Note though that this is really just a quick hack, and it probably
won't work well across different fonts or pages. The correct way of
doing this would be to retrain with the extra characters, including all
the diacritics you need, but obviously that would take more work.
That's why Shree Devi Kumar recommended it only as a quick sed
script; it may be fine for a few pages that you need to sort
out, but shouldn't be relied upon for anything more than that.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
