I am using Windows XP, and occasionally CentOS.

On Mon, Jan 17, 2011 at 2:16 PM, Sriranga(78yrsold) <withblessi...@gmail.com
> wrote:

> From the PDF I can see that Khmer has a number of dependent vowels. The
> case is similar to Indic languages.
> Which OS are you using?
>
>
> On Mon, Jan 17, 2011 at 12:42 PM, KHEM Sochenda <khemsoche...@gmail.com> wrote:
>
>> This link leads to the Khmer Unicode chart:
>> http://unicode.org/charts/PDF/U1780.pdf
>>
>>
>> On Mon, Jan 17, 2011 at 2:06 PM, Sriranga(78yrsold) <
>> withblessi...@gmail.com> wrote:
>>
>>> I viewed the Khmer Unicode chart (PDF); it does contain dependent vowels.
>>> It is better to use bbtool to generate the box file. Please see the tools
>>> section of the wiki.
>>>
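As a rough illustration of the dependent vowels mentioned above, the sketch below lists them from Python's bundled Unicode character database instead of reading them off the PDF chart. The function name `khmer_dependent_vowels` is made up for this example, and filtering on the words "VOWEL SIGN" in the character name is an assumption about how the Khmer block names its dependent vowels, not something stated in this thread.

```python
import unicodedata

# Khmer block, U+1780..U+17FF (the block covered by the U1780.pdf chart).
KHMER_BLOCK = range(0x1780, 0x1800)


def khmer_dependent_vowels():
    """Return (code point, name) pairs for Khmer dependent vowel signs.

    Assumption: dependent vowels are the characters whose Unicode name
    contains "VOWEL SIGN".
    """
    found = []
    for cp in KHMER_BLOCK:
        # unicodedata.name() returns the default "" for unassigned code points.
        name = unicodedata.name(chr(cp), "")
        if "VOWEL SIGN" in name:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    for cp, name in khmer_dependent_vowels():
        print(f"U+{cp:04X} {name}")
```

Each of these combines with a base consonant when rendered, which is why a single printed glyph can correspond to more than one Unicode character.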
>>>
>>> On Mon, Jan 17, 2011 at 12:24 PM, Sriranga(78yrsold) <
>>> withblessi...@gmail.com> wrote:
>>>
>>>> Are there dependent vowels in Khmer? If you have the Unicode chart,
>>>> it would be better to upload it.
>>>>
>>>>
>>>> On Mon, Jan 17, 2011 at 12:13 PM, KHEM Sochenda <khemsoche...@gmail.com
>>>> > wrote:
>>>>
>>>>> I know how to do it in Tesseract; the image is just to show you how
>>>>> the glyphs should be boxed.
>>>>>
>>>>> I can send you the box file generated by Tesseract anyway.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sochenda
>>>>>
>>>>>
>>>>> On Mon, Jan 17, 2011 at 1:41 PM, Sriranga(78yrsold) <
>>>>> withblessi...@gmail.com> wrote:
>>>>>
>>>>>> As per the wiki instructions, the command line has to be used to
>>>>>> generate the box file, as follows:
>>>>>> tesseract <lang.fontname.number.tif> <lang.fontname.number> batch.nochop makebox
>>>>>>
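For anyone scripting the step quoted above, here is a minimal sketch that builds the same command line from an image file name. It assumes a Tesseract 3.0x executable named `tesseract` on the PATH; the helper name `makebox_command` and the file name used in the note below are hypothetical.

```python
import os


def makebox_command(image_path, executable="tesseract"):
    """Build the argument list for the makebox call quoted above.

    The output base is the image path without its final extension, so
    "<lang.fontname.number>.tif" yields "<lang.fontname.number>".
    """
    base, _ext = os.path.splitext(image_path)
    return [executable, image_path, base, "batch.nochop", "makebox"]
```

Passing the result to `subprocess.run(makebox_command("khm.myfont.exp0.tif"), check=True)` would run the command; the list form avoids shell quoting problems on both Windows XP and CentOS.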
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 17, 2011 at 11:55 AM, KHEM Sochenda <
>>>>>> khemsoche...@gmail.com> wrote:
>>>>>>
>>>>>>> In the image, I drew the boxes manually.
>>>>>>>
>>>>>>> On Mon, Jan 17, 2011 at 12:16 PM, Sriranga(78yrsold) <
>>>>>>> withblessi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Which tool did you use to create the boxes? Please also upload the
>>>>>>>> box file you generated.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 17, 2011 at 9:31 AM, KHEM Sochenda <
>>>>>>>> khemsoche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear Dmitry,
>>>>>>>>>
>>>>>>>>> Thank you again for a very quick response.
>>>>>>>>>
>>>>>>>>> I am going to train Tesseract for the Khmer language, which has many
>>>>>>>>> ligatures of the same kind as "fi" in some Latin fonts.
>>>>>>>>> The attachment shows an example of a one-line Khmer sentence; please
>>>>>>>>> count the boxes from left to right. You can see that some glyphs sit
>>>>>>>>> above others. The first glyph is formed of two Unicode characters,
>>>>>>>>> while the third glyph and the fifth glyph together form a single
>>>>>>>>> Unicode character. This is why I wish to give each glyph its own ID
>>>>>>>>> and then do post-processing afterward.
>>>>>>>>>
>>>>>>>>> Regarding two glyphs that overlap each other, as in the case of the
>>>>>>>>> 7th and 8th glyphs: how will Tesseract segment these glyphs? How
>>>>>>>>> should I give the positions of the boxes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you very much in advance for your response.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Sochenda
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Jan 16, 2011 at 3:48 PM, Dmitry Silaev <
>>>>>>>>> daemons2...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dear Sochenda,
>>>>>>>>>>
>>>>>>>>>> I'm not sure what the ultimate goal of your code assignment is, but
>>>>>>>>>> the formal answer to your question is "Yes". You can assign "k001" or
>>>>>>>>>> "k002" to a bounding box in a .box file. Moreover, you can assign any
>>>>>>>>>> UTF-8 encoded character sequence. In Tess version 3.0x (current) the
>>>>>>>>>> only restriction is a 24-byte limit on the length of the entire
>>>>>>>>>> character sequence. This also allows you to use not only an abstract
>>>>>>>>>> code like "k001" but also a meaningful character sequence from your
>>>>>>>>>> real language (e.g. the well-known "fi" ligature in some Latin fonts),
>>>>>>>>>> which then relieves you of the pre- and post-processing.
>>>>>>>>>>
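The 24-byte limit described above is easy to exceed with Khmer, because each Khmer character takes three bytes in UTF-8. The sketch below checks the labels of a box file against that limit. It assumes the six-field line layout `label left bottom right top page`; the function name and the sample labels in the note are illustrative only.

```python
MAX_LABEL_BYTES = 24  # limit quoted in the message above for Tesseract 3.0x


def check_box_labels(lines, max_bytes=MAX_LABEL_BYTES):
    """Return (line_number, label, byte_length) for every over-long label."""
    too_long = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: label left bottom right top page.
        # Splitting from the right keeps a label intact even if it has spaces.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            raise ValueError(f"line {number}: expected 6 fields, got {line!r}")
        label = parts[0]
        size = len(label.encode("utf-8"))
        if size > max_bytes:
            too_long.append((number, label, size))
    return too_long
```

With this check, an abstract code such as "k001" (4 bytes) passes, a label of eight Khmer characters (24 bytes) still fits, and a ninth character pushes it over the limit.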
>>>>>>>>>> If you still prefer abstract codes, then pre-/post-processing can be
>>>>>>>>>> done without tinkering with Tess's code. Since both training and
>>>>>>>>>> recognition produce output files, you can develop a couple of
>>>>>>>>>> file-processing command-line utilities and call them alongside the
>>>>>>>>>> Tesseract executable from shell scripts (or .bat files on Windows).
>>>>>>>>>>
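A post-processing utility of the kind described above could look roughly like the following sketch. It assumes abstract codes of the form "k" plus three digits, as in the "k001" example; the code table, the Khmer sequences in it, and the function names are hypothetical and would have to come from your own training data.

```python
import re

# Hypothetical table: abstract code -> real Khmer character sequence.
CODE_TO_TEXT = {
    "k001": "\u1780\u17D2\u1780",  # KA + COENG + KA
    "k002": "\u1780\u17B6",        # KA + VOWEL SIGN AA
}

# Assumption: every abstract code is "k" followed by exactly three digits.
CODE_PATTERN = re.compile(r"k\d{3}")


def decode_codes(text, table=CODE_TO_TEXT):
    """Replace each known code with its sequence; leave unknown codes as-is."""
    return CODE_PATTERN.sub(lambda m: table.get(m.group(0), m.group(0)), text)


def decode_file(src_path, dst_path, table=CODE_TO_TEXT):
    """Decode a recognition output file (UTF-8 text) into a new file."""
    with open(src_path, encoding="utf-8") as src:
        decoded = decode_codes(src.read(), table)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(decoded)
```

A shell script or .bat file would then call Tesseract first and this script second. One caveat of this design is that real text matching the code pattern would also be rewritten, so the codes should be chosen so that they cannot occur in genuine output.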
>>>>>>>>>> For further details you should definitely study the
>>>>>>>>>> "TrainingTesseract3" and "ReadMe" (section "Installation Notes -
>>>>>>>>>> Tesseract 3.00") documents thoroughly
>>>>>>>>>> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 and
>>>>>>>>>> http://code.google.com/p/tesseract-ocr/wiki/ReadMe). These documents
>>>>>>>>>> are not easy to search, but they contain all the info you might need.
>>>>>>>>>>
>>>>>>>>>> Warm regards,
>>>>>>>>>> Dmitry Silaev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 16, 2011 at 10:42 AM, KHEM Sochenda <
>>>>>>>>>> khemsoche...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dear Dmitry,
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much for a comprehensive explanation.
>>>>>>>>>>> To go straight to the point: is it OK to assign a code like 'k001'
>>>>>>>>>>> or 'k002' to each glyph obtained from Tesseract's segmentation?
>>>>>>>>>>>
>>>>>>>>>>> For the post-processing, which would mean touching Tesseract's code,
>>>>>>>>>>> could you please point out which files I should modify? Please also
>>>>>>>>>>> advise me whether the latest version of Tesseract will do.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much in advance for your time and response back.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>>
>>>>>>>>>>> Sochenda
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jan 15, 2011 at 3:05 AM, Dmitry Silaev <
>>>>>>>>>>> daemons2...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Chenda,
>>>>>>>>>>>>
>>>>>>>>>>>> In fact, Tesseract doesn't care whether you train it on a real
>>>>>>>>>>>> language's letter, or which language that letter belongs to. Put
>>>>>>>>>>>> simply, Tess only saves the mapping from feature sets obtained in
>>>>>>>>>>>> training to Unicode ids. This implies that during training you can
>>>>>>>>>>>> assign virtually any character code to virtually any glyph (to be
>>>>>>>>>>>> exact, to a connected component or to a set of connected components).
>>>>>>>>>>>>
>>>>>>>>>>>> If your language's script comprises a reasonable number of joint
>>>>>>>>>>>> character combinations, then while training you can assign every such
>>>>>>>>>>>> combination a predefined Unicode id (some restrictions apply). Later,
>>>>>>>>>>>> when running recognition, you should do some post-processing to decode
>>>>>>>>>>>> your predefined ids into the real language's character sequences.
>>>>>>>>>>>>
>>>>>>>>>>>> For good results, all this requires you to develop a training file
>>>>>>>>>>>> pre-processor (mapping: language char combinations -> provisional ids)
>>>>>>>>>>>> and a recognition result post-processor (mapping: provisional ids ->
>>>>>>>>>>>> language char sequences). I'm not sure, but this may also require
>>>>>>>>>>>> correcting the character property bit masks in the unicharset file (I
>>>>>>>>>>>> don't know exactly how Tess uses this information, as I don't need it
>>>>>>>>>>>> in my project).
>>>>>>>>>>>>
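The two mappings described above can be sketched as a pair of lookup tables. This example takes one possible reading of "predefined Unicode id" and assigns each combination a single code point from the Private Use Area (starting at U+E000); that choice, the function names, and the sample combinations are assumptions made here, and the thread does not confirm how Tesseract 3.0x or its unicharset property masks treat such code points.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area (assumed id space)


def build_tables(combinations):
    """Assign one provisional id (a PUA character) to each combination.

    Returns (to_id, from_id): combination -> id and id -> combination.
    Sorting makes the assignment reproducible between runs.
    """
    to_id = {}
    for offset, combo in enumerate(sorted(set(combinations))):
        to_id[combo] = chr(PUA_START + offset)
    from_id = {pid: combo for combo, pid in to_id.items()}
    return to_id, from_id


def encode_label(label, to_id):
    """Pre-processor step: box label -> provisional id (unchanged if unknown)."""
    return to_id.get(label, label)


def decode_text(text, from_id):
    """Post-processor step: provisional ids in recognized text -> real sequences."""
    return text.translate(str.maketrans(from_id))
```

Because every provisional id is a single character, decoding is a plain character-by-character translation with no ambiguity about where one id ends and the next begins. The Private Use Area in the Basic Multilingual Plane holds 6,400 code points, which bounds the number of combinations this scheme can cover.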
>>>>>>>>>>>> Warm regards,
>>>>>>>>>>>> Dmitry Silaev
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 14, 2011 at 10:25 AM, KHEM Sochenda <
>>>>>>>>>>>> khemsoche...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Tesseract Team,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the step of training a new language, we have to assign a Unicode
>>>>>>>>>>>>> value to each box.
>>>>>>>>>>>>> What should I do if a shape is composed of several Unicode
>>>>>>>>>>>>> characters?
>>>>>>>>>>>>> Is there any way to assign only an ID to each box in Tesseract?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much in advance for your response.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Chenda
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  --
>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>> To post to this group, send email to
>>>>>>>>>>>>> tesseract-ocr@googlegroups.com.
>>>>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>>>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>>>>>>>>>>> For more options, visit this group at
>>>>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>>>>>>>>>>