Awesome. Thanks. Found it in hin.config. 
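
For anyone following the thread later, here is a rough sketch of how the unpacked config could be inspected. It assumes hin.traineddata has already been unpacked (for example with the combine_tessdata tool, "combine_tessdata -u hin.traineddata hin.") and that the config holds one "name value" pair per line with "#" for comments. The function name parse_tess_config and the sample setting names are hypothetical, not copied from the real hin.config.

```python
def parse_tess_config(text):
    """Parse 'name value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: name first, the rest is the value.
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


# Hypothetical sample content; the real hin.config will have different names.
sample = """\
# illustrative only
example_split_strategy 1
example_flag T
"""

for name, value in sorted(parse_tess_config(sample).items()):
    print(name, "=", value)
```

To read the real file, pass open("hin.config").read() to the function. Running it on hin.config and on the config of a language that does no clipping, then comparing the two, should narrow down which setting matters.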

-Raman

On Thursday, February 14, 2013 4:53:25 PM UTC+5:30, Nick White wrote:
>
> I didn't read the article Sven sent, but I'd guess the clipping mode 
> is set using a setting in the <lang>.config file. Unpack the 
> hin.traineddata file and take a look at hin.config for clues. 
>
> On Thu, Feb 14, 2013 at 12:05:52AM -0800, rkvsraman wrote: 
> > I do understand how clipping is done. What I need to know is how to direct
> > Tesseract to do shirorekha clipping for a new language.
> > 
> > For example, if I provide the "-l hin" argument to tesseract, it does clipping,
> > while it does not when I provide "-l eng" for the same images, which contain
> > Hindi text.
> > 
> > I renamed all hin.* data files to ben.* and now it does it even when I provide
> > the "-l ben" argument.
> > 
> > This means that the instruction to do shirorekha clipping is available in the
> > language tessdata.
> > 
> > I just need to know where. 
> > 
> > Thanks. 
> > 
> > -Raman 
> > 
> > On Wednesday, February 13, 2013 9:31:24 PM UTC+5:30, sventech wrote: 
> > 
> >     Are you aware of this paper published this month?
> >     http://www.academia.edu/1944564/Shirorekha_Chopping_Integrated_Tesseract_OCR_Engine_for_Enhanced_Hindi_Language_Recognition
> > 
> >     I'll message you directly as well... 
> >     --Sven 
> > 
> > 
> >     On Mon, Feb 11, 2013 at 5:31 AM, rkvsraman <[email protected]> wrote:
> > 
> >         What code changes do I make for Tesseract to understand that Shirorekha
> >         splitting is required for Bengali or Punjabi?
> > 
> >         Thanks 
> > 
> >         -Raman 
> > 
> >         -- 
> >         -- 
> >         You received this message because you are subscribed to the Google
> >         Groups "tesseract-ocr" group.
> >         To post to this group, send email to [email protected] 
> >         To unsubscribe from this group, send email to 
> >         [email protected] 
> >         For more options, visit this group at 
> >         http://groups.google.com/group/tesseract-ocr?hl=en 
> >           
> >         --- 
> >         You received this message because you are subscribed to the Google
> >         Groups "tesseract-ocr" group.
> >         To unsubscribe from this group and stop receiving emails from it, send
> >         an email to [email protected].
> >         For more options, visit https://groups.google.com/groups/opt_out. 
> > 
> >     -- 
> >     ``All that is gold does not glitter, 
> >       not all those who wander are lost; 
> >     the old that is strong does not wither, 
> >       deep roots are not reached by the frost. 
> >     From the ashes a fire shall be woken, 
> >       a light from the shadows shall spring; 
> >     renewed shall be blade that was broken, 
> >       the crownless again shall be king.” 
> > 
>


