Which eng.traineddata did you use?

There are three options: eng.traineddata from tessdata, tessdata_best or
tessdata_fast.
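To see which model produces which result, one option is to run the same image through all three and compare the output. This is a minimal sketch. It assumes each repository's eng.traineddata has been downloaded into a local directory named after the repository (./tessdata, ./tessdata_best, ./tessdata_fast). Those paths are assumptions, and so are the helper names out_base and compare_models. --tessdata-dir tells tesseract which directory to load eng.traineddata from.

```shell
# Model sets to compare; each directory is assumed to hold eng.traineddata.
# Left unquoted in the loop on purpose so it splits into three words.
MODEL_SETS="tessdata tessdata_best tessdata_fast"

# Output base for one image/model pair: invoice.tif + tessdata_best
# gives invoice.tessdata_best (tesseract then appends .txt itself).
out_base() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# OCR one image once per model set, skipping sets that are not present.
compare_models() {
    img=$1
    for model in $MODEL_SETS; do
        if [ ! -f "./$model/eng.traineddata" ]; then
            echo "skipping $model: ./$model/eng.traineddata not found" >&2
            continue
        fi
        tesseract "$img" "$(out_base "$img" "$model")" \
            --tessdata-dir "./$model" -l eng --psm 3
    done
}
```

Usage: compare_models invoice.tif, then diff invoice.tessdata.txt, invoice.tessdata_best.txt and invoice.tessdata_fast.txt.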

On Fri, 26 Apr 2019, 09:19 Giriraj Bhojak, <[email protected]> wrote:

> Hello Shree,
>
> I realize this post is more than two years old now, but I would
> appreciate any help.
> I tried your suggestion on the same attached sample using tesseract v4,
> but I am unable to get the result you posted.
> I have tried all page segmentation modes, but none of them produced the
> result you posted.
> Could you please let me know what I might be doing wrong?
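Rather than trying the modes by hand, you can script the sweep. This is a sketch that assumes tesseract 4.0's --psm numbering. In that numbering, mode 0 only runs orientation and script detection, and mode 2 is listed as not implemented. Neither produces OCR text, so both are skipped. Modes 1 and 12 use OSD and may fail if osd.traineddata is not installed. psm_modes and psm_sweep are hypothetical helper names.

```shell
# PSM values that produce OCR text in tesseract 4.0
# (0 = orientation/script detection only, 2 = not implemented).
psm_modes() {
    echo "1 3 4 5 6 7 8 9 10 11 12 13"
}

# Run one image through every mode, writing <image>.psmN.txt per mode.
# A failing mode (e.g. 1 or 12 without osd.traineddata) is reported
# and the sweep continues with the next mode.
psm_sweep() {
    img=$1
    for psm in $(psm_modes); do
        tesseract "$img" "${img%.*}.psm$psm" -l eng --psm "$psm" \
            || echo "psm $psm failed for $img" >&2
    done
}
```

Usage: psm_sweep sample.tif produces sample.psm1.txt through sample.psm13.txt for side-by-side comparison.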
>
> Here are the version details for tesseract on my machine:
>
> tesseract 4.0.0
>  leptonica-1.77.0
>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0
>  Found AVX2
>  Found AVX
>  Found SSE
>
> Here is the output I get for most of the psm modes:
>
>
> 8633 0410 NO RP 1107122016 NNNNNYNN 07 000001 0001 Page 20f3
>
> Did you know? Did you know?
>
> Your Comcast Business Internet Never miss a payment with text alerts.
> service gives you access to millions Receive text message reminders when your
> of WiFi hotspots with the fastest WiFi bill is ready to pay or past due. Sign up at
> and even more coverage. Find out business.comcast.com/myaccount.
>
> more at business.comcast.conm/wifi.
>
> Your bill is ready
>
>
>
> Need help? We’re here for you.
>
>
>
> > Visit business.comcast.com/help Please notify us immediately with any
> Call 1-800-391-3000 questions regarding charges billed to your
> aa account. Comcast will issue a credit or
> Billing support refund for any verified billing error which is
> Open 6 am-9 pm MTN, Mon through Fri brought to our attention within sixty (60) days
> and 7 am-8 pm Sat of the bill.
>
> Technical support
> Open 24 hours, 7 days a week
>
> TT
>
> Automatic payment If you’re moving, give us as much
> Sign up at business.comcast.com/myaccount advanced notice as possible so we
>
> Se Online can help make a smooth transition.
> Visit business.comcast.com/myaccount
>
> a By phone
> Call 1-800-391-3000
>
> Call 1-800-391-3000
>
> IME
>
>
>
>
>
> Regards,
> Giriraj.
>
> On Friday, April 21, 2017 at 4:55:03 AM UTC-4, shree wrote:
>>
>> If you want to OCR an invoice like the sample you posted, just use the
>> eng.traineddata and OCR the page. You do not need to do any training.
>>
>> Here is the output I get:
>>
>>
>>
>> 8633 0410 NO RP 11 07122015 NNNNNYNN 01 000001 0001 Page 2 Of 3
>>
>>
>> Did you know?
>>
>>
>> Your Comcast Business Internet
>>
>> service gives you access to millions
>>
>> of WiFi hotspots with the fastest WiFi
>>
>> and even more coverage. Find out
>>
>> more at businesscomcast.com/wifi.
>>
>>
>>
>> Need help? We’re here for you.
>>
>>
>> 9 Visit business.comcast.com/help
>>
>> Call 1-800—391 -3000
>>
>> A
>>
>>
>> Billing support
>>
>> Open 6 am-9 pm MTN, Mon through Fri
>>
>> and 7 am—8 pm Sat
>>
>>
>> Technical support
>>
>> Open 24 hours, 7 days a week
>>
>>
>>
>> Did you know?
>>
>>
>> Never miss a payment with text alerts.
>>
>> Receive text message reminders when your
>>
>> bill is ready to pay or past due. Sign up at
>>
>> business.comcast.com/myaccount.
>>
>>
>>
>> Your bill is ready
>>
>>
>>
>>
>> Please notify us immediately with any
>>
>> questions regarding charges billed to your
>>
>> account. Comcast will issue a credit or
>>
>> refund for any verified billing error which is
>>
>> brought to our attention within sixty (60) days
>>
>> of the bill.
>>
>>
>> llllllllllllllllllllllllllllllllll
>>
>>
>> Additional payment options Moving? Let us help.
>>
>>
>> Automatic payment
>>
>> Sign up at business.comcast.com/myaccount
>>
>>
>> a Oniine
>>
>>
>> Visit business.comcast.com/myaccount
>>
>>
>> a By phone
>>
>> Call 1-800-391 -3000
>>
>>
>> if you're moving, give us as much
>>
>> advanced notice as possible so we
>>
>> can help make a smooth transition.
>>
>>
>> Call 1 -800-391 -3000
>>
>>
>> |||||||llllllllllllllllllllllllll
>>
>>
>>
>>
>> ShreeDevi
>>
>> On Fri, Apr 21, 2017 at 11:34 AM, Alain Ghawi <[email protected]> wrote:
>>
>>> Hello all,
>>>
>>> I am surprised that so many people tell me tesseract is the best
>>> open-source OCR tool, yet there is no video explaining step by step the
>>> problems you can run into, nor a good explanation and documentation for
>>> OCR.
>>>
>>> Even so, everyone loves a challenge! So here is the challenge I faced. I
>>> have many PDF files that are invoices, and I want to train tesseract to
>>> OCR them as scanned images.
>>> First, I converted these PDF files into TIFF files using:
>>> magick -density 300 -depth 4 2151.pdf -background white -fill white -alpha Off 2151%d.tif
>>> This is ImageMagick. Nothing important here, other than that we get a
>>> 300 dpi image with the alpha channel turned off.
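Here is a sketch of the same conversion for a whole batch of PDFs. It assumes ImageMagick 7 (the magick command) with Ghostscript available, since ImageMagick relies on Ghostscript for PDF input. It swaps the original -alpha Off for -background white -alpha remove -alpha off, which flattens any transparency onto white before dropping the channel. It also leaves out -depth. pdfs_to_tifs and tif_pattern are hypothetical names.

```shell
# Output pattern for one PDF: 2151.pdf -> 2151-%d.tif, where ImageMagick
# replaces %d with the page (scene) number: 2151-0.tif, 2151-1.tif, ...
tif_pattern() {
    printf '%s-%%d.tif\n' "${1%.pdf}"
}

# Rasterise each PDF at 300 dpi. -density must come before the input file
# so it applies when the PDF is rendered; -alpha remove flattens any
# transparency onto the -background colour, then -alpha off drops the channel.
pdfs_to_tifs() {
    for pdf in "$@"; do
        magick -density 300 "$pdf" -background white -alpha remove \
            -alpha off "$(tif_pattern "$pdf")" || return 1
    done
}
```

Usage: pdfs_to_tifs *.pdf converts every invoice in the current directory.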
>>>
>>> Next, rename the .tif files to [lang].[fontname].exp[N].tif; in my
>>> example, the first one becomes com.test_font.exp0.tif.
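The renaming can be scripted as well. This sketch takes the page files as arguments (for example 21510.tif and 21511.tif from the command above) and assumes com and test_font are the language and font names from this example. rename_for_training is a hypothetical helper. Pages are numbered in glob order, so 215110.tif would sort before 21512.tif. That should not matter for training, as long as each image keeps its own box file.

```shell
# Rename page images to the [lang].[fontname].exp[N].tif scheme used by the
# legacy training tools. N counts up from 0 in the order the files are given.
rename_for_training() {
    lang=$1 font=$2
    shift 2
    n=0
    for tif in "$@"; do
        target="$lang.$font.exp$n.tif"
        # Never clobber an existing training image.
        if [ -e "$target" ]; then
            echo "refusing to overwrite $target" >&2
            return 1
        fi
        mv -- "$tif" "$target" || return 1
        n=$((n + 1))
    done
}
```

Usage: rename_for_training com test_font 2151*.tif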
>>>
>>> Great! After this step you need to create the box files, right? So I
>>> simply ran:
>>> tesseract com.test_font.exp0.tif com.test_font.exp0 batch.nochop makebox
>>> tesseract com.test_font.exp1.tif com.test_font.exp1 batch.nochop makebox
>>>
>>> Then I corrected the boxes with CowBoxEditor, which did the job, because
>>> I couldn't find the well-known jTessBoxEditor online (weird, right?).
>>>
>>> After that, I created my .tr files:
>>> tesseract com.test_font.exp0.tif com.test_font.exp0 nobatch box.train
>>> tesseract com.test_font.exp1.tif com.test_font.exp1 nobatch box.train
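Both tesseract steps can be looped over every page image. This sketch assumes the file names from the renaming step. Deriving the output base from each input name (${tif%.tif}) keeps every .box and .tr file paired with its own image, which avoids mixing up exp0 and exp1. make_boxes and make_tr are hypothetical helper names. Correcting the boxes in a box editor still happens by hand between the two steps.

```shell
# Step 1: generate a starting box file next to every page image.
make_boxes() {
    for tif in "$@"; do
        tesseract "$tif" "${tif%.tif}" batch.nochop makebox || return 1
    done
}

# Step 2 (after correcting the boxes): one .tr feature file per image/box
# pair. Stops early if an image has no matching .box file.
make_tr() {
    for tif in "$@"; do
        if [ ! -f "${tif%.tif}.box" ]; then
            echo "missing ${tif%.tif}.box for $tif" >&2
            return 1
        fi
        tesseract "$tif" "${tif%.tif}" nobatch box.train || return 1
    done
}
```

Usage: make_boxes com.test_font.exp*.tif, fix the boxes, then make_tr com.test_font.exp*.tif.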
>>>
>>> And here come the surprises!
>>> Once you have your .tr files, you call unicharset_extractor.
>>> First question: why are the glyph metrics all 0,255,0,255,0,0,0,0,0,0?
>>> According to the documentation, these values are wrong:
>>> https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc
>>> Second question: should I pass the box files one at a time or all
>>> together? Option 1: unicharset_extractor com.test_font.exp0.box or
>>> Option 2: unicharset_extractor com.test_font.exp0.box com.test_font.exp1.box
>>> Third question: why should I use set_unicharset_properties? It doesn't
>>> fix the metrics; it only specifies whether a character is Latin or Common!
>>> Link: https://github.com/tesseract-ocr/tesseract/issues/318
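On the second and third questions: as far as I know, unicharset_extractor accepts several box files in one call. The legacy training documentation passes all of them together, so that a single unicharset covers every character. set_unicharset_properties then reads per-script unicharsets from --script_dir to fill in character properties. That directory could be a local checkout of the tesseract-ocr/langdata repository (an assumption here). This is only a sketch; boxes_for, build_unicharset and the output name unicharset.full are hypothetical.

```shell
# List the box files for one language prefix, e.g. com.test_font.exp0.box.
boxes_for() {
    for box in "$1".*.exp*.box; do
        [ -f "$box" ] && echo "$box"
    done
    return 0
}

# Build a single unicharset from ALL box files, then fill in character
# properties from the per-script unicharsets in $script_dir.
build_unicharset() {
    lang=$1 script_dir=$2
    boxes=$(boxes_for "$lang")
    if [ -z "$boxes" ]; then
        echo "no $lang.*.exp*.box files found" >&2
        return 1
    fi
    # Word splitting is intended: one argument per box file
    # (the names produced by the renaming step contain no spaces).
    unicharset_extractor $boxes || return 1   # writes ./unicharset
    set_unicharset_properties -U unicharset -O unicharset.full \
        --script_dir="$script_dir"
}
```

Usage: build_unicharset com ./langdata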
>>>
>>> After all these unanswered questions, I ran mftraining and cntraining
>>> (no problems). Finally, I renamed my inttemp, normproto, pffmtable and
>>> shapetable files and combined them with:
>>> combine_tessdata com.
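On the final question: in the legacy workflow, mftraining and cntraining each run once over all the .tr files. Together they produce a single inttemp, pffmtable, shapetable and normproto. combine_tessdata then looks for those exact names after the prefix, so as far as I can tell, numbered files such as com.inttemp1 would not be picked up. This sketch assumes a font_properties file whose fields are fontname, italic, bold, fixed, serif and fraktur. write_font_properties and train_legacy are hypothetical helpers.

```shell
# font_properties line: <fontname> <italic> <bold> <fixed> <serif> <fraktur>
write_font_properties() {
    printf '%s 0 0 0 0 0\n' "$1" > font_properties
}

# Cluster ALL .tr files in one run each, then combine into <lang>.traineddata.
train_legacy() {
    lang=$1 unicharset=$2
    shift 2    # remaining arguments: every <lang>.<font>.expN.tr file
    if [ ! -f font_properties ]; then
        echo "font_properties not found" >&2
        return 1
    fi
    # Writes inttemp, pffmtable, shapetable and <lang>.unicharset.
    mftraining -F font_properties -U "$unicharset" -O "$lang.unicharset" "$@" \
        || return 1
    cntraining "$@" || return 1    # writes normproto
    # combine_tessdata looks for <prefix> followed by these exact names.
    for part in inttemp pffmtable shapetable normproto; do
        mv -- "$part" "$lang.$part" || return 1
    done
    combine_tessdata "$lang."
}
```

Usage: write_font_properties test_font, then train_legacy com unicharset com.test_font.exp0.tr com.test_font.exp1.tr. Pass whichever unicharset file you ended up with as the second argument.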
>>>
>>> Final question: if I named them com.inttemp1, com.inttemp2, would that
>>> work? The same goes for shapetable, normproto and pffmtable.
>>>
>>> I think nearly every new tesseract user asks these questions at some
>>> point. If any tesseract expert could answer them, it would be a great
>>> help to the whole community.
>>> Please find attached the 2 tif files and the generated box files.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/beb558f3-d52c-4eca-a668-501a9804ffb0%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/beb558f3-d52c-4eca-a668-501a9804ffb0%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
