I do understand how clipping is done. What I need to know is how to direct Tesseract to do shirorekha clipping for a new language.
For example, if I pass "-l hin" to tesseract, it does the clipping, but it does not when I pass "-l eng" for the same images, which contain Hindi text. I renamed all the hin.* data files to ben.*, and now it clips when I pass "-l ben" as well. This means the instruction to do shirorekha clipping is stored somewhere in the language's tessdata files. I just need to know where.

Thanks.
-Raman

On Wednesday, February 13, 2013 9:31:24 PM UTC+5:30, sventech wrote:
> Are you aware of this paper published this month?
>
> http://www.academia.edu/1944564/Shirorekha_Chopping_Integrated_Tesseract_OCR_Engine_for_Enhanced_Hindi_Language_Recognition
>
> I'll message you directly as well...
> --Sven
>
> On Mon, Feb 11, 2013 at 5:31 AM, rkvsraman wrote:
>> What code changes do I make for Tesseract to understand that Shirorekha
>> splitting is required for Bengali or Punjabi?
>>
>> Thanks
>>
>> -Raman
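
For anyone who wants to reproduce the hin.* to ben.* experiment described above, here is a minimal sketch. It copies the files instead of renaming them, so the original Hindi data stays in place. The function name `clone_language_data` and the directory you pass in are my own placeholders, not anything that ships with Tesseract, and the sketch does not show where the clipping instruction lives.

```python
import shutil
from pathlib import Path


def clone_language_data(tessdata_dir, src_lang, dst_lang):
    """Copy every <src_lang>.* file in tessdata_dir to <dst_lang>.*.

    Hypothetical helper for the experiment in this post; returns the
    names of the files that were created.
    """
    tessdata = Path(tessdata_dir)
    created = []
    for src in sorted(tessdata.glob(src_lang + ".*")):
        # Keep everything after the language code, e.g. ".traineddata".
        suffix = src.name[len(src_lang):]
        dst = tessdata / (dst_lang + suffix)
        shutil.copy2(src, dst)
        created.append(dst.name)
    return created
```

After calling `clone_language_data(<your tessdata directory>, "hin", "ben")`, running tesseract with "-l ben" on the same images should behave the way I described, since the copied files are byte-for-byte the Hindi ones.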

