Awesome, thanks. Found it in hin.config.

-Raman
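For anyone who finds this thread later: the standard way to unpack a combined file is `combine_tessdata -u hin.traineddata hin.`, which writes hin.config and the other components next to it. If that tool isn't handy, here is a rough Python sketch that pulls the language config straight out of the .traineddata file and lists its settings, optionally filtered by a keyword. It ASSUMES the Tesseract 3.0x on-disk layout: a little-endian int32 entry count, then that many int64 absolute offsets (-1 for a missing component), with the language config in slot 0. The keyword `devanagari` is only a guess, based on Tesseract 3.02 having parameters such as `pageseg_devanagari_split_strategy` and `ocr_devanagari_split_strategy`; check what your own hin.config actually contains. The script and function names are illustrative.

```python
import struct
import sys

# ASSUMED layout of a Tesseract 3.0x combined .traineddata file
# (verify against your Tesseract version before relying on it):
#   int32 N           number of entries in the offset table (little-endian)
#   int64 offset[N]   absolute file offset of each component, -1 if absent
#   component bodies stored back to back after the table
LANG_CONFIG_INDEX = 0  # assumed slot of the "<lang>.config" component


def read_component(path, index=LANG_CONFIG_INDEX):
    """Return the raw bytes of one component, or None if it is absent."""
    with open(path, "rb") as f:
        data = f.read()
    (num_entries,) = struct.unpack_from("<i", data, 0)
    if not 0 < num_entries < 1000:
        raise ValueError("unexpected entry count %d; not the assumed layout" % num_entries)
    offsets = struct.unpack_from("<%dq" % num_entries, data, 4)
    start = offsets[index]
    if start == -1:
        return None
    # The component runs until the next component that is present, or to EOF.
    end = len(data)
    for later in offsets[index + 1:]:
        if later != -1:
            end = later
            break
    return data[start:end]


def parse_params(text):
    """Parse the 'name value' lines of a Tesseract config file into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no settings
        parts = line.split(None, 1)  # name, then the rest of the line as value
        params[parts[0]] = parts[1] if len(parts) > 1 else ""
    return params


def matching(params, keyword):
    """Keep only the settings whose name contains keyword ("" keeps all)."""
    return {name: value for name, value in params.items() if keyword in name}


def main(traineddata, keyword=""):
    raw = read_component(traineddata)
    if raw is None:
        print("no language config found in", traineddata)
        return
    params = parse_params(raw.decode("utf-8", errors="replace"))
    for name, value in sorted(matching(params, keyword).items()):
        print(name, value)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python show_lang_config.py LANG.traineddata [KEYWORD]")
    else:
        main(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "")
```

Run it as `python show_lang_config.py hin.traineddata devanagari`. Once you know which lines matter, one thing to try for Bengali or Punjabi is copying them into a separate config file and naming that file after the output base on the command line (e.g. `tesseract page.tif out -l ben my.config`, where `my.config` is a hypothetical file name); verify that against your Tesseract version.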
On Thursday, February 14, 2013 4:53:25 PM UTC+5:30, Nick White wrote:
>
> I didn't read the article Sven sent, but I'd guess the clipping mode
> is set using a setting in the <lang>.config file. Unpack the
> hin.traineddata file and take a look at hin.config for clues.
>
> On Thu, Feb 14, 2013 at 12:05:52AM -0800, rkvsraman wrote:
> > I do understand how clipping is done. What I need to know is how to
> > direct Tesseract to do shirorekha clipping for a new language.
> >
> > For example, if I provide the "-l hin" argument to tesseract, it does
> > clipping, while it does not when I provide "-l eng" for the same images,
> > which contain Hindi text.
> >
> > I renamed all hin.* data files to ben.*, and now it does it even when I
> > provide the "-l ben" argument.
> >
> > This means that the instruction to do shirorekha clipping is stored in
> > the language's tessdata.
> >
> > I just need to know where.
> >
> > Thanks.
> >
> > -Raman
> >
> > On Wednesday, February 13, 2013 9:31:24 PM UTC+5:30, sventech wrote:
> >
> > Are you aware of this paper published this month?
> > http://www.academia.edu/1944564/Shirorekha_Chopping_Integrated_Tesseract_OCR_Engine_for_Enhanced_Hindi_Language_Recognition
> >
> > I'll message you directly as well...
> > --Sven
> >
> >
> > On Mon, Feb 11, 2013 at 5:31 AM, rkvsraman <[email protected]> wrote:
> >
> > What code changes do I make for Tesseract to understand that Shirorekha
> > splitting is required for Bengali or Punjabi?
> >
> > Thanks
> >
> > -Raman
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
> >
> > ---
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > For more options, visit https://groups.google.com/groups/opt_out.

