Re: How exactly tesseract works

Sven Pedersen Tue, 08 Nov 2011 18:00:28 -0800

You should start with the command-line version to see how recognition
works; then you can isolate any problems in your own code. Image
enhancement usually means adjusting the contrast and the clarity
or smoothness of the text. You should show us an example image so we can
see what issues you will have to deal with.
--Sven
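Sven's description of image enhancement can be made concrete with a small sketch. The C++ below is only an illustration and is not part of Tesseract. It assumes an 8-bit grayscale image already loaded into a flat std::vector<std::uint8_t>, and the function names stretch_contrast and binarize are hypothetical. It applies a linear contrast stretch so the darkest pixel becomes 0 and the brightest 255, then thresholds to pure black and white as one simple way of making text edges crisper.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Linear contrast stretch for an 8-bit grayscale image stored as a flat
// pixel buffer: the darkest pixel maps to 0 and the brightest to 255.
std::vector<std::uint8_t> stretch_contrast(const std::vector<std::uint8_t>& pixels) {
    if (pixels.empty()) return pixels;
    const auto range = std::minmax_element(pixels.begin(), pixels.end());
    const int lo = *range.first;
    const int hi = *range.second;
    if (hi == lo) return pixels;  // flat image: nothing to stretch

    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        // Rescale [lo, hi] onto [0, 255], rounding to the nearest integer.
        const int scaled = ((p - lo) * 255 + (hi - lo) / 2) / (hi - lo);
        out.push_back(static_cast<std::uint8_t>(scaled));
    }
    return out;
}

// Global threshold: pixels darker than `threshold` become black (0),
// everything else becomes white (255).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels,
                                   std::uint8_t threshold) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(p < threshold ? std::uint8_t{0} : std::uint8_t{255});
    }
    return out;
}
```

For example, stretch_contrast({100, 150, 200}) yields {0, 128, 255}, and binarize of that result with a threshold of 128 yields {0, 255, 255}. A fixed threshold is the simplest choice; in practice the threshold is usually picked from the image itself (for instance with Otsu's method).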



On Mon, Nov 7, 2011 at 9:16 PM, Navin Math <hinavinm...@gmail.com> wrote:

> If Tesseract is not able to recognize some text in an image, what do we
> have to do with that image? Do we need to enhance it, and what exactly
> does enhancing mean? If I have a specific font in my image, do I need
> to train Tesseract for it?
>
> Currently I am doing this:
> I create a TessBaseAPI object and pass it either the image file path or
> the bitmap; later I call the getUTF8Text() function to get the extracted
> text from the image.
>
> Is this right, or do I need to do anything else?
>
> Thanks
>
> On Sat, Oct 29, 2011 at 9:48 PM, Quan Nguyen <nguyen...@gmail.com> wrote:
>
>> Tesseract binary executable and language data files are all you need.
>>
>> On Oct 28, 3:04 pm, Navin Math <hinavinm...@gmail.com> wrote:
>> > Thanks.
>> > Currently I am using images containing English text. I have placed only
>> > the eng.traineddata file at the specified location. Do I need to place
>> > any other files in the same location for the tesseract tool?
>> >
>> > On Fri, Oct 28, 2011 at 8:06 PM, Sven Pedersen <sven.peder...@gmail.com> wrote:
>> >
>> > > Hi Navin,
>> > > Documents scanned at 90 dpi usually do poorly, but what really matters
>> > > is the font size. Documents with typical 10-14 point fonts should be
>> > > scanned at 200-300 dpi for best results. For training questions, you'll
>> > > need to tell us more about whether your language, and the domain within
>> > > that language, are already covered by the data available for Tesseract.
>> > > Read the FAQ for details about training. If that doesn't solve your
>> > > problem, show us example images of what you're having trouble with.
>> > > --Sven
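Sven's rule of thumb follows from the fact that one typographic point is 1/72 inch, so a font's nominal size in pixels is points × dpi / 72. The C++ sketch below (function names are hypothetical) computes that size and the resampling factor between two resolutions. It only does the arithmetic; it makes no claim about the exact pixel size Tesseract needs.

```cpp
// One typographic (PostScript) point is exactly 1/72 inch.
constexpr double kPointsPerInch = 72.0;

// Nominal font (em) size in pixels for a given point size and scan resolution.
double font_size_px(double points, double dpi) {
    return points * dpi / kPointsPerInch;
}

// Factor by which an image scanned at `from_dpi` would have to be
// resampled to match the pixel density of `to_dpi`.
double scale_factor(double from_dpi, double to_dpi) {
    return to_dpi / from_dpi;
}
```

For a 10-point font, font_size_px(10, 90) is 12.5 px, while font_size_px(10, 300) is about 41.7 px, roughly scale_factor(90, 300) ≈ 3.33 times as many pixels per character. Sven's advice concerns the scan itself: enlarging an existing 90 dpi image by that factor interpolates pixels but cannot recover detail the scanner never captured.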
>> >
>> > > On Fri, Oct 28, 2011 at 7:42 AM, navin <hinavinm...@gmail.com> wrote:
>> >
>> > >> Hi
>> > >> I have two images with the same DPI (e.g. 90 dpi). I used the tesseract
>> > >> tool to extract the strings from both images:
>> > >> First image: almost 90% of the strings are recognized properly.
>> > >> Second image: 0%; no strings are recognized properly.
>> >
>> > >> I want to understand why it is failing here.
>> > >> What do I have to do to improve the accuracy?
>> > >> Do I need to increase the DPI of the image, and by how much?
>> > >> Are there any other steps I need to work on? For example, do I need
>> > >> to change any training data files? (Currently I am using the files I
>> > >> downloaded from the tesseract download page.)
>> > >> Are there any links that explain how to increase the accuracy?
>> >
>> > >> Thanks
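To study why one image reaches about 90% and the other 0%, and to check whether a change such as rescanning actually helps, it is useful to score the OCR output against a hand-typed transcription. The sketch below is not a Tesseract feature: it is a generic, character-level alternative to counting recognized strings, based on Levenshtein edit distance, and the function names edit_distance and char_accuracy are hypothetical. It compares bytes, so as written it is only meaningful for plain ASCII text such as the English in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions that turn `a` into `b` (two-row DP table).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitute = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitute});
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[b.size()];
}

// Character accuracy of OCR output against a reference transcription:
// 1 - edits / reference length, clamped so it never drops below 0.
double char_accuracy(const std::string& reference, const std::string& ocr_output) {
    if (reference.empty()) return ocr_output.empty() ? 1.0 : 0.0;
    const double errors = static_cast<double>(edit_distance(reference, ocr_output));
    const double accuracy = 1.0 - errors / static_cast<double>(reference.size());
    return std::max(accuracy, 0.0);
}
```

Scoring both images before and after a change (for instance, a rescan at 300 dpi) turns "it works better" into a number that can be compared across experiments.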
>> >
>> > >> --
>> > >> You received this message because you are subscribed to the Google
>> > >> Groups "tesseract-ocr" group.
>> > >> To post to this group, send email to tesseract-ocr@googlegroups.com
>> > >> To unsubscribe from this group, send email to
>> > >> tesseract-ocr+unsubscr...@googlegroups.com
>> > >> For more options, visit this group at
>> > >> http://groups.google.com/group/tesseract-ocr?hl=en
>> >
>> > > --
>> > > “All that is gold does not glitter,
>> > >   not all those who wander are lost;
>> > > the old that is strong does not wither,
>> > >   deep roots are not reached by the frost.
>> > > From the ashes a fire shall be woken,
>> > >   a light from the shadows shall spring;
>> > > renewed shall be blade that was broken,
>> > >   the crownless again shall be king.”
>> >
>>
>


