Which eng.traineddata did you use? There are three options: the ones from tessdata, tessdata_best and tessdata_fast.
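
In case it helps narrow this down, here is a rough Python sketch for running the same image against each of the three model sets and comparing the text. It is only an illustration under assumptions: the `models/...` directories and the `invoice.tif` file name are hypothetical placeholders, and the helper names `build_command` and `ocr_with_each_model` are made up for this example. The only Tesseract options used are `--tessdata-dir`, `-l` and `--psm`.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical local copies of the three model repositories; adjust to your machine.
MODEL_DIRS = {
    "tessdata": Path("models/tessdata"),
    "tessdata_best": Path("models/tessdata_best"),
    "tessdata_fast": Path("models/tessdata_fast"),
}


def build_command(image, tessdata_dir, psm=3):
    """Return the tesseract argv that OCRs `image` with eng.traineddata from `tessdata_dir`."""
    return [
        "tesseract", str(image), "stdout",
        "--tessdata-dir", str(tessdata_dir),
        "-l", "eng",
        "--psm", str(psm),
    ]


def ocr_with_each_model(image, model_dirs=MODEL_DIRS, psm=3):
    """Run tesseract once per model set that is present and return {name: text}."""
    results = {}
    if shutil.which("tesseract") is None:
        return results  # tesseract is not installed, so there is nothing to compare
    for name, directory in model_dirs.items():
        if not (Path(directory) / "eng.traineddata").is_file():
            continue  # skip model sets that have not been downloaded
        done = subprocess.run(build_command(image, directory, psm),
                              capture_output=True, text=True, check=False)
        results[name] = done.stdout
    return results


if __name__ == "__main__":
    # "invoice.tif" is a placeholder for the attached sample image.
    for name, text in ocr_with_each_model("invoice.tif").items():
        print(f"===== {name} =====")
        print(text)
```

If the three outputs differ noticeably, that would suggest the difference between the two results in this thread comes from the model file rather than from the page segmentation mode.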

On Fri, 26 Apr 2019, 09:19 Giriraj Bhojak, <[email protected]> wrote:

> Hello Shree,
>
> I realize this post is more than two years old now, but would appreciate
> any help.
> I tried your suggestion on the same attached sample using tesseract v4 and
> I am unable to get the result as you have posted.
> I have tried all page segmentation modes, but none of them produced the
> result you have posted.
> Could you please let me know what I might be doing wrong?
>
> Here is the version detail for the tesseract on my machine:
>
> tesseract 4.0.0
> leptonica-1.77.0
> libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0
> Found AVX2
> Found AVX
> Found SSE
>
> Here is the output I get for most of the psm modes:
>
> 8633 0410 NO RP 1107122016 NNNNNYNN 07 000001 0001 Page 20f3
>
> Did you know? Did you know?
>
> Your Comcast Business Internet Never miss a payment with text alerts.
> service gives you access to millions Receive text message reminders when your
> of WiFi hotspots with the fastest WiFi bill is ready to pay or past due. Sign up at
> and even more coverage. Find out business.comcast.com/myaccount.
>
> more at business.comcast.conm/wifi.
>
> Your bill is ready
>
> Need help? We’re here for you.
>
> Visit business.comcast.com/help Please notify us immediately with any
> Call 1-800-391-3000 questions regarding charges billed to your
> aa account. Comcast will issue a credit or
> Billing support refund for any verified billing error which is
> Open 6 am-9 pm MTN, Mon through Fri brought to our attention within sixty (60) days
> and 7 am-8 pm Sat of the bill.
>
> Technical support
> Open 24 hours, 7 days a week
>
> TT
>
> Automatic payment If you’re moving, give us as much
> Sign up at business.comcast.com/myaccount advanced notice as possible so we
>
> Se Online can help make a smooth transition.
> Visit business.comcast.com/myaccount
>
> a By phone
> Call 1-800-391-3000
>
> Call 1-800-391-3000
>
> IME
>
> Regards,
> Giriraj.
>
> On Friday, April 21, 2017 at 4:55:03 AM UTC-4, shree wrote:
>>
>> If you want to OCR an invoice like the sample you posted, just use the
>> eng.traineddata and OCR the page. You do not need to do any training.
>>
>> Here is the output I get
>>
>> 8633 0410 NO RP 11 07122015 NNNNNYNN 01 000001 0001 Page 2 Of 3
>>
>> Did you know?
>>
>> Your Comcast Business Internet
>> service gives you access to millions
>> of WiFi hotspots with the fastest WiFi
>> and even more coverage. Find out
>> more at businesscomcast.com/wifi.
>>
>> Need help? We’re here for you.
>>
>> 9 Visit business.comcast.com/help
>> Call 1-800—391 -3000
>> A
>>
>> Billing support
>> Open 6 am-9 pm MTN, Mon through Fri
>> and 7 am—8 pm Sat
>>
>> Technical support
>> Open 24 hours, 7 days a week
>>
>> Did you know?
>>
>> Never miss a payment with text alerts.
>> Receive text message reminders when your
>> bill is ready to pay or past due. Sign up at
>> business.comcast.com/myaccount.
>>
>> Your bill is ready
>>
>> Please notify us immediately with any
>> questions regarding charges billed to your
>> account. Comcast will issue a credit or
>> refund for any verified billing error which is
>> brought to our attention within sixty (60) days
>> of the bill.
>>
>> llllllllllllllllllllllllllllllllll
>>
>> Additional payment options Moving? Let us help.
>>
>> Automatic payment
>> Sign up at business.comcast.com/myaccount
>>
>> a Oniine
>>
>> Visit business.comcast.com/myaccount
>>
>> a By phone
>> Call 1-800-391 -3000
>>
>> if you're moving, give us as much
>> advanced notice as possible so we
>> can help make a smooth transition.
>>
>> Call 1 -800-391 -3000
>>
>> |||||||llllllllllllllllllllllllll
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Apr 21, 2017 at 11:34 AM, Alain Ghawi <[email protected]> wrote:
>>
>>> Hello all,
>>>
>>> I am surprised by how many people tell me that Tesseract is the best
>>> open-source OCR tool, yet there is no video explaining step by step the
>>> problems you can encounter, nor a good explanation and documentation
>>> for OCR.
>>>
>>> Even so, everyone loves a challenge! Here is the one I faced: I have
>>> many PDF files that are invoices, and I want to train Tesseract to be
>>> able to OCR them as scanned images.
>>>
>>> First of all, I converted these PDF files into TIFF files with
>>> ImageMagick:
>>>
>>> magick -density 300 -depth 4 2151.pdf -background white -fill white -alpha Off 2151%d.tif
>>>
>>> Nothing important here, other than that we get a 300 dpi image with
>>> the alpha channel off.
>>>
>>> You must then rename the .tif files to [lang].[name_font].exp0.tif
>>> (com.test_font.exp0.tif in my example).
>>>
>>> Great! After this step you must create your box file, right? So I simply
>>> called:
>>>
>>> tesseract com.test_font.exp0.tif com.test_font.exp0 batch.nochop makebox
>>> tesseract com.test_font.exp0.tif com.test_font.exp1 batch.nochop makebox
>>>
>>> Then I fixed my box files with CowBoxEditor, which did the job, as I
>>> could not find the famous jTessBoxEditor online (weird, right?).
>>>
>>> After that, I created my .tr files:
>>>
>>> tesseract com.test_font.exp0.tif com.test_font.exp0 nobatch box.train
>>> tesseract com.test_font.exp1.tif com.test_font.exp1 nobatch box.train
>>>
>>> And here come the surprises!
>>> Once you have your .tr files, you call unicharset_extractor.
>>>
>>> First question: Why are the glyph metrics all 0,255,0,255,0,0,0,0,0,0?
>>> Which is wrong according to the documentation:
>>> https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/doc/unicharset.5.asc
>>>
>>> Second question: Should I run it on one box file and then the other, or
>>> on both combined?
>>> Option 1: unicharset_extractor com.test_font.exp0.box
>>> Option 2: unicharset_extractor com.test_font.exp0.box com.test_font.exp1.box
>>>
>>> Third question: Why should I use set_unicharset_properties? It doesn't
>>> fix the metrics; it only specifies whether the script is Latin or Common!
>>> Link: https://github.com/tesseract-ocr/tesseract/issues/318
>>>
>>> After all these unanswered questions, I used mftraining and cntraining
>>> (no problems). Finally, I renamed my inttemp, normproto, pffmtable and
>>> shapetable files and combined them using combine_tessdata com.
>>>
>>> Final question: If I name the files com.inttemp1 and com.inttemp2, does
>>> it work? Same for shapetable, normproto and pffmtable.
>>>
>>> I think these questions are asked more than once by all new users of
>>> Tesseract. If any Tesseract expert can answer them, it will be a great
>>> help for the whole community.
>>> Kindly find attached the 2 tif files and the boxes generated.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/beb558f3-d52c-4eca-a668-501a9804ffb0%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/beb558f3-d52c-4eca-a668-501a9804ffb0%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>> For more options, visit https://groups.google.com/d/optout.

