Sochenda,

The output of the lines, viz. 0ccb 8, 0cd5 8, 20c88, appears in vowel1.txt. So we have to convert the Unicode numbers to Kannada characters (script) with the help of a post-processor.

Regards,
sriranga (78yrs)
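A minimal sketch of that conversion step, in Python. It assumes the post-processor receives whitespace-separated four-digit hex code points such as `0ccb`; the actual layout of vowel1.txt is not shown in this thread, and the function name `codepoints_to_text` is made up for illustration.

```python
def codepoints_to_text(tokens):
    """Turn hex code point strings such as '0ccb' into the characters they name."""
    # int(tok, 16) parses the hex number; chr() maps it to the Unicode character.
    return "".join(chr(int(tok, 16)) for tok in tokens)


# The four code points mentioned in this thread (assumed input format).
sample = "0ccb 0cd5 0c88 0ce0".split()
kannada = codepoints_to_text(sample)
print(kannada)
```

Any token that is not a valid hex number raises `ValueError`, so malformed lines are noticed instead of being silently skipped.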
On Wed, Jan 19, 2011 at 4:04 PM, Sriranga(78yrsold) <withblessi...@gmail.com> wrote:

> Sochenda,
> please see my inline replies below.
>
> On Wed, Jan 19, 2011 at 12:58 PM, KHEM Sochenda <khemsoche...@gmail.com> wrote:
>
>> Dear Dmitry and Sriranga,
>>
>> Thank you very much for your help. The reason my output file is empty
>> is that I assigned my personal ID to the glyphs, isn't it?
>>
>> Dear Dmitry,
>> Please see the attached image. Should the glyph in the red box be
>> assigned to one Unicode character, or separated as in the image? This
>> glyph is composed of two other glyphs: one can be represented by a
>> Unicode character, and the other is part of a vowel.
>>
>> Dear Sriranga,
>>
>> Do the first several lines in your unicharset file represent
>> characters, or are they just Unicode characters that represent no
>> character at all?
>> *These lines, viz. 0ccb 8, 0cd5 8, 20c88, 30ce0, are Unicode numbers
>> instead of Kannada characters, to show you. Usually I use characters
>> (script) instead of Unicode numbers for training purposes. I am using
>> tesseract 3.01 alpha (r-529).*
>>
>> Khmer font is also attached. Thanks, but I am unable to type with it.
>> However, it appeared in CharacterMap.
>>
> On receipt of your alphabet list I shall generate the data files and
> forward them to you.
>
>> Best Regards,
>> Sochenda
>>
>> On Tue, Jan 18, 2011 at 8:27 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>
>>> Dear Sochenda,
>>>
>>> In addition to what Sriranga said, I'd remind you that you should do
>>> a lot of manual work:
>>>
>>> In pyTesseractTrainer, check that no bounding boxes intersect glyphs;
>>> if one does, correct its BB coordinates manually.
>>>
>>> In cases of BB overlap you should space out the participating glyphs
>>> in the training image (see the attached picture for examples).
>>>
>>> You should use manual spacing if the participating glyphs are
>>> dependent characters (in your language, vowels) and the number of
>>> possible combinations is practically uncountable. Then you would
>>> assign every glyph its own code. Tess would consider these glyphs as
>>> separate characters, and you should post-process the resulting code
>>> sequence to obtain a well-formed dependent Unicode pair (or triplet).
>>>
>>> If there can be only a few such combinations, you can merge these BBs
>>> into one that encompasses all the required glyphs and assign a single
>>> code to the entire glyph combination. Then during post-processing
>>> you'll need to replace this single code with a predefined dependent
>>> Unicode pair.
>>>
>>> Hope I've managed to express myself clearly.
>>>
>>> Warm regards,
>>> Dmitry Silaev
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
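Dmitry's second option quoted above (one code per merged glyph combination, replaced during post-processing) could be sketched roughly as follows. This is only an illustration under stated assumptions: the placeholder codes are taken from the Unicode Private Use Area, and the table `COMBINATION_TABLE` and function `postprocess` are hypothetical names, not part of Tesseract. The Kannada pairs are examples; a Khmer table would be built the same way.

```python
# Hypothetical mapping: placeholder code used in training -> real Unicode sequence.
# Private Use Area code points (U+E000...) are an assumed choice of placeholder.
COMBINATION_TABLE = {
    "\ue000": "\u0cc6\u0cd5",  # example pair: U+0CC6 followed by U+0CD5
    "\ue001": "\u0cc6\u0cc2",  # example pair: U+0CC6 followed by U+0CC2
}


def postprocess(ocr_text, table=COMBINATION_TABLE):
    """Replace each placeholder code in the OCR output with its Unicode sequence."""
    # Characters that are not in the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in ocr_text)


# U+0C95 stands in for a base letter recognised normally by the engine.
recognised = "\u0c95\ue000"
fixed = postprocess(recognised)
print([hex(ord(ch)) for ch in fixed])
```

A lookup table keeps the replacement rules in one place, which suits the case Dmitry describes where only a few combinations exist.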
0ccb 0cd5 0c88 0ce0 ೋ ಋೕೕ ಋ ೠ ಋ ಋ ಾ ೀ ು ೂ ೄ ೇ ಋೂೕ ೕಋ ೠ ಾ ು ೠೂ ೄ ಋೕಋ ೕೕ ೕಋ ಾ ೀ ು ೂ ೄ ೀ ೕೂೕ ೕ ಋ ೠ ಾ ೕ ು ಋೂ ೄ ೕೕ ೠೂೕ ೕ ಋ ೕೂೕ ೕ ಋ ೠ ೕೂೕ ೕ ಋ ೠ ಾ ೀ ು ೂ ೄ ೇ ೋ ೕಋೠ ಾ ೀ ು ೂ ೄ ೇ ೋ ೕಋ ಾ ೠೀ ು ೂ ೄ ೀ ೕೂೕ ೕ ಋ ೠ ಾ ೠೀ ು ೠೂ ೄ ೀ ೕೂೕ ೕ ಋ ೋ ೕ ಋ ೠ ೋ ೕ ಋ ೠ ಾ ೀ ು ೂ ೇ ಋೂೕ ೕ ಋ ೠ ೠೕ ಋೠ ೠ ಾ ೀ ು ೂ ೄ ೀ ೠೇುೕ ೕ ಋ ೠ ಾ ಋ ು ೂ ೄ ೕಋ ೕೕ ೕ ಋ ಋ ೄ ಾ ಋ ೕ ು ೂ ೄ ೠೕ ೂೕ ೕ ಋ ೕೂೕ ೕ ಋ ೕಋೠ ೕೂೕ ೕ ಋ ೕಋೠ ಾ ೕಋೠ ು ೂ ೄ ೇೕ ೇೕೕ ೕ ೕಾೕಋ ೠ ಾ ೀ ು ೂ ೄ ೇ ೕೋೕ ೕ ಋ ೋ ಾ ೕ ು ೂ ೄ ೕ ೂೕ ೕ ಾ ೀ ು ೂ ೄ ೀ ೠೕೂೕ ೕಋ
Attachment: kh.unicharset (binary data)