Re: Tesseract Training

Sriranga(78yrsold) Wed, 19 Jan 2011 02:55:56 -0800

Sochenda,
The output lines, viz. 0ccb 8, 0cd5 8, 20c88, appear in vowel1.txt. So
we have to convert the Unicode numbers to Kannada characters (script) with the
help of a post-processor.
-Regards,
-sriranga(78yrs)
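
The conversion mentioned above, from Unicode numbers to Kannada characters, could be sketched roughly as follows. This is only an illustration, not the post-processor actually used in this thread: the function names are hypothetical, and it assumes that each code appears as a standalone four-digit hex token inside the Kannada block (U+0C80 to U+0CFF). Anything else, such as the token 20c88, is left unchanged.

```python
import re

# Assumption: a code is exactly four hex digits, e.g. "0ccb".
FOUR_HEX_DIGITS = re.compile(r"[0-9a-fA-F]{4}")

# Code points of the Unicode Kannada block, U+0C80 through U+0CFF.
KANNADA_BLOCK = range(0x0C80, 0x0D00)


def decode_token(token):
    """Return the Kannada character for a 4-digit hex token, else the token itself."""
    if FOUR_HEX_DIGITS.fullmatch(token):
        code_point = int(token, 16)
        if code_point in KANNADA_BLOCK:
            return chr(code_point)
    return token


def decode_line(line):
    """Decode every whitespace-separated token of a line of recognizer output."""
    return " ".join(decode_token(token) for token in line.split())


if __name__ == "__main__":
    print(decode_line("0ccb 0cd5 0c88 0ce0"))
```

Restricting the conversion to one script block keeps ordinary numbers in the output from being turned into characters by accident.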


On Wed, Jan 19, 2011 at 4:04 PM, Sriranga(78yrsold) <withblessi...@gmail.com> wrote:

> Sochenda,
> please see the inline replies below.
>
> On Wed, Jan 19, 2011 at 12:58 PM, KHEM Sochenda <khemsoche...@gmail.com> wrote:
>
>> Dear Dmitry and Sriranga,
>>
>> Thank you very much for your help. Is my output file empty because I
>> put my personal ID on the glyphs?
>>
>> Dear Dmitry,
>> Please see the attached image. Should the glyph in the red box be assigned to
>> a single Unicode character, or separated as in the image? This glyph is
>> composed of two other glyphs: one can be represented by a Unicode character,
>> and the other is part of a vowel.
>>
>> Dear Sriranga,
>>
>> Do the first several lines in your unicharset file represent
>> characters, or are they just Unicode values that represent no character?
>> *These lines, viz. 0ccb 8, 0cd5 8, 20c88, 30ce0, are Unicode numbers of
>> Kannada, given instead of characters to show you. Usually I use
>> characters (script) instead of Unicode numbers for training. I am
>> using Tesseract 3.01 alpha (r-529).*
>>
>> The Khmer font is also attached. Thanks, but I am unable to type with it.
>> However, it appears in Character Map.
>>
>   On receipt of your alphabet list I shall generate the data files and
> forward them to you.
>
>>
>> Best Regards,
>> Sochenda
>>
>>
>>
>> On Tue, Jan 18, 2011 at 8:27 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>
>>> Dear Sochenda,
>>>
>>> In addition to what Sriranga said, I'd remind you that you need to do a
>>> lot of manual work:
>>>
>>> In pyTesseractTrainer, check that no bounding box (BB) intersects a glyph;
>>> if one does, correct its BB coordinates manually.
>>>
>>> In cases of BB overlap you should space out the participating glyphs in the
>>> training image (see the attached picture for examples).
>>>
>>> You should use manual spacing if participating glyphs are dependent
>>> characters (in your language - vowels) and the number of possible
>>> combinations is practically uncountable. Then you would assign every glyph
>>> its own code. Tess would consider these glyphs as separate characters and
>>> you should post-process the resulting code sequence to obtain a well-formed
>>> dependent Unicode pair (or triplet).
>>>
>>> If there are only a few such combinations, you can merge these BBs into
>>> one to encompass all the required glyphs and assign a single code to the
>>> entire glyph combination. Then during the post-processing you'll need to
>>> replace this single code with a predefined dependent Unicode pair.
>>>
>>> Hope I've managed to express myself clearly.
>>>
>>> Warm regards,
>>> Dmitry Silaev
>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>
>>
>>
>
>
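
Dmitry's advice quoted above about overlapping bounding boxes can be checked mechanically before opening pyTesseractTrainer. The sketch below is an assumption-laden illustration, not part of any Tesseract tool: it assumes box-file lines of the form "symbol left bottom right top [page]" with integer pixel coordinates, ignores the page field, and uses hypothetical function names. It only reports pairs of boxes whose rectangles overlap; whether a box actually cuts through a glyph still has to be judged by eye.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_line(line):
    """Parse one box-file line; a trailing page number, if present, is ignored."""
    parts = line.split()
    left, bottom, right, top = (int(value) for value in parts[1:5])
    return Box(parts[0], left, bottom, right, top)


def overlaps(a, b):
    """True if the two rectangles share any interior area."""
    return (a.left < b.right and b.left < a.right
            and a.bottom < b.top and b.bottom < a.top)


def overlapping_pairs(lines):
    """Return every pair of boxes whose rectangles overlap."""
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    return [(a, b) for a, b in combinations(boxes, 2) if overlaps(a, b)]


if __name__ == "__main__":
    sample = ["a 10 10 30 40 0", "b 25 12 50 38 0", "c 60 10 80 40 0"]
    for first, second in overlapping_pairs(sample):
        print("overlap:", first, second)
```

The pairwise comparison is quadratic in the number of boxes, which is acceptable for a single training page; for multi-page box files the boxes would first have to be grouped by page.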


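Dmitry's second suggestion in the quoted message, assigning a single code to a merged glyph combination and replacing it with a predefined Unicode pair afterwards, might look roughly like this. Everything in the table is an assumption made for illustration: the placeholder codes are taken from the Private Use Area, and the Khmer pairs are examples, not the codes used in anyone's actual training data.

```python
# Hypothetical table: placeholder code used during training -> Unicode sequence
# that the merged glyph combination stands for.
MERGED_TO_SEQUENCE = {
    "\ue000": "\u1780\u17b6",  # KHMER LETTER KA + KHMER VOWEL SIGN AA
    "\ue001": "\u1780\u17b7",  # KHMER LETTER KA + KHMER VOWEL SIGN I
}


def expand_merged_codes(text, table=MERGED_TO_SEQUENCE):
    """Replace every placeholder code in the recognized text with its sequence."""
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    recognized = "\ue000 \ue001"
    print(expand_merged_codes(recognized))
```

A lookup table of this kind only suits the case Dmitry describes as having few combinations; when the number of combinations is very large, his other approach of separate codes per glyph avoids enumerating them all.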
0ccb 0cd5 0c88 0ce0 ೋ ಋೕೕ ಋ ೠ ಋ ಋ
ಾ ೀ ು ೂ ೄ ೇ ಋೂೕ ೕಋ ೠ ಾ   ು ೠೂ ೄ ಋೕಋ ೕೕ ೕಋ
ಾ ೀ ು ೂ ೄ ೀ ೕೂೕ ೕ ಋ ೠ ಾ ೕ ು ಋೂ ೄ ೕೕ ೠೂೕ ೕ ಋ
ೕೂೕ ೕ ಋ ೠ ೕೂೕ ೕ ಋ ೠ
ಾ ೀ ು ೂ ೄ ೇ ೋ ೕಋೠ ಾ ೀ ು ೂ ೄ ೇ ೋ ೕಋ
ಾ ೠೀ ು ೂ ೄ ೀ ೕೂೕ ೕ ಋ ೠ ಾ ೠೀ ು ೠೂ ೄ ೀ ೕೂೕ ೕ ಋ
ೋ ೕ ಋ ೠ ೋ ೕ ಋ ೠ
ಾ ೀ ು ೂ ೇ ಋೂೕ ೕ ಋ ೠ ೠೕ ಋೠ ೠ
ಾ ೀ ು ೂ ೄ ೀ ೠೇುೕ ೕ ಋ ೠ ಾ ಋ ು ೂ ೄ ೕಋ ೕೕ ೕ ಋ
ಋ ೄ ಾ ಋ ೕ ು ೂ ೄ ೠೕ ೂೕ ೕ ಋ
ೕೂೕ ೕ ಋ ೕಋೠ ೕೂೕ ೕ ಋ ೕಋೠ
ಾ ೕಋೠ ು ೂ ೄ ೇೕ ೇೕೕ ೕ ೕಾೕಋ ೠ ಾ ೀ ು ೂ ೄ ೇ ೕೋೕ ೕ ಋ ೋ
ಾ ೕ ು ೂ ೄ ೕ ೂೕ ೕ ಾ ೀ ು ೂ ೄ ೀ ೠೕೂೕ ೕಋ

kh.unicharset
Description: Binary data
