Please make sure the alphabet is typed as text and not sent as an image file.

2011/1/19 Sriranga(78yrsold) <withblessi...@gmail.com>
> Sochenda,
> The attached Khmer alphabet txt file was prepared from the character map as
> well as the Unicode chart, since I am unable to type in your language even
> though I have installed the font you supplied.
> Please prepare the text (saved as UTF-8) following the attached sample txt
> file. I shall try to generate the trained data.
>
> On Wed, Jan 19, 2011 at 12:58 PM, KHEM Sochenda <khemsoche...@gmail.com> wrote:
>
>> Dear Dmitry and Sriranga,
>>
>> Thank you very much for your help. Is my output file empty because I
>> assigned my personal ID to the glyphs?
>>
>> Dear Dmitry,
>> Please see the attached image. Should the glyph in the red box be assigned
>> to one Unicode character, or separated as in the image? This glyph is
>> composed of two other glyphs: one can be represented by a Unicode
>> character, and the other is part of a vowel.
>>
>> Dear Sriranga,
>>
>> Do the first several lines in your unicharset file represent actual
>> characters, or are they just arbitrary Unicode characters that represent
>> no character?
>>
>> The Khmer font is also attached.
>>
>> Best Regards,
>> Sochenda
>>
>> On Tue, Jan 18, 2011 at 8:27 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>
>>> Dear Sochenda,
>>>
>>> In addition to what Sriranga said, I'd remind you that you should do a lot
>>> of manual work:
>>>
>>> In pyTesseractTrainer, check that no bounding box (BB) intersects a glyph;
>>> if one does, correct its BB coordinates manually.
>>>
>>> In cases of BB overlap, you should space out the participating glyphs in
>>> the training image (see the attached picture for examples).
>>>
>>> You should use manual spacing if the participating glyphs are dependent
>>> characters (in your language, vowels) and the number of possible
>>> combinations is practically uncountable. Then you would assign every glyph
>>> its own code. Tess would treat these glyphs as separate characters, and
>>> you should post-process the resulting code sequence to obtain a well-formed
>>> dependent Unicode pair (or triplet).
>>>
>>> If there can be only a few such combinations, you can merge these BBs into
>>> one that encompasses all the required glyphs and assign a single code to
>>> the entire glyph combination. Then, during post-processing, you'll need to
>>> replace this single code with a predefined dependent Unicode pair.
>>>
>>> Hope I've managed to express myself clearly.
>>>
>>> Warm regards,
>>> Dmitry Silaev
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
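
For anyone following the thread, the post-processing step Dmitry describes could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not code from pyTesseractTrainer or Tesseract. The placeholder code U+E000, the table names `MERGED` and `SPLIT_VOWELS`, and the function `postprocess` are all hypothetical, and the Khmer code points (KA U+1780, vowel signs E U+17C1, AA U+17B6, OO U+17C4) were picked as an example rather than taken from the attachments. The real tables would be whatever codes were typed into the box file during training.

```python
# Sketch of post-processing OCR output codes into well-formed Unicode.
# All tables below are hypothetical examples, not real training data.

# Case 1: a merged bounding box was trained with ONE placeholder code,
# which is replaced by a predefined Unicode sequence.
MERGED = {
    "\uE000": "\u1780\u17C4",  # hypothetical code -> KA + vowel sign OO
}

# Case 2: each glyph was trained with its own code. A vowel drawn in two
# parts around a base character arrives as: left part, base, right part.
# Map (left part, right part) to the single vowel sign that follows the base.
SPLIT_VOWELS = {
    ("\u17C1", "\u17B6"): "\u17C4",  # E-part + AA-part -> vowel sign OO
}

# Glyphs that appear to the left of the base but are stored after it.
LEFT_PARTS = {left for (left, _right) in SPLIT_VOWELS}


def postprocess(codes: str) -> str:
    """Rewrite a recognized code sequence into logical Unicode order."""
    out = []
    i = 0
    n = len(codes)
    while i < n:
        ch = codes[i]
        if ch in MERGED:
            # Single code standing for a whole glyph combination.
            out.append(MERGED[ch])
            i += 1
        elif ch in LEFT_PARTS and i + 1 < n:
            base = codes[i + 1]
            right = codes[i + 2] if i + 2 < n else ""
            combined = SPLIT_VOWELS.get((ch, right))
            if combined is not None:
                # left + base + right -> base + combined vowel sign
                out.append(base + combined)
                i += 3
            else:
                # Lone left-side vowel: just move it after the base.
                out.append(base + ch)
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    fixed = postprocess("\u17C1\u1780\u17B6")
    print([hex(ord(c)) for c in fixed])
```

The two tables correspond to the two options in Dmitry's message: `SPLIT_VOWELS` covers the "every glyph gets its own code" route, and `MERGED` covers the "one code for the merged BB" route. A real Khmer rule set would need many more entries and probably extra handling for subscript consonants, which this sketch does not attempt.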