You can also use:

    import java.awt.Rectangle;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public String ocrText(File file, String lang, ImageGeometry geometry) {
        String resultText = null;
        // Use the language passed by the caller instead of a hard-coded "eng".
        Tesseract instance = getTesseractInstance("TesseractEnvPath", lang);
        // Define a region of interest equal to or smaller than the image.
        // Argument order: x, y, width, height.
        Rectangle rect = new Rectangle(geometry.getXscale(), geometry.getYscale(),
                geometry.getWidth(), geometry.getHeight());
        try {
            // OCR only the pixels inside rect.
            resultText = instance.doOCR(ImageIO.read(file), rect);
            log.debug("resultText: {}", resultText);
        } catch (TesseractException | IOException e) {
            log.error("OCR failed for {}", file, e);
        }
        return resultText;
    }

On Fri, Jun 29, 2018 at 12:41 PM Dattatraya Tembare <datta.temb...@gmail.com> wrote:

> "C" is missing from the output because Tesseract doesn't have enough
> margin around the text to read it.
> The image needs a proper margin.
>
>
> On Friday, June 29, 2018 at 12:39:06 PM UTC-4, Dattatraya Tembare wrote:
>>
>> Hello Hari,
>> I faced the same problem.
>>
>> When there are two different types of fonts, Tesseract doesn't
>> recognize the text properly. It recognizes the first text and ignores
>> the next text if its font size is bigger than the first one's.
>> I resolved it by cropping the image into two pieces. I'm using
>> ImageMagick (Java API) to clean and crop the images.
>>
>> Also, I see you made the command unnecessarily complicated (I have the
>> tesseract path set up):
>>
>> C:\EA>tesseract Capture.PNG Capture -l eng
>> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>>
>> C:\EA>tesseract Capture1.PNG Capture1 -l eng
>> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>>
>> Tesseract will return the proper text if the text is centered. The way
>> I achieved this: crop, trim, then add a border.
>>
>> Datta
>>
>> On Thu, Jun 28, 2018 at 3:33 PM Hari P <hari.ja...@gmail.com> wrote:
>>
>>> I am using tesseract v4.0 beta 1 and trying to OCR a remittance file.
>>> There is one section which has CHECK NO, but tesseract doesn't seem to
>>> recognize it at all.
>>>
>>> I have tried setting dictionary words and setting the penalties for
>>> non-dictionary words to 1, yet there is no change.
>>>
>>> tesseract capture.png captureoutput1 --user-words "C:\Program Files
>>> (x86)\Tesseract-OCR\tessdata\eng.user-words" -c load_system_dawg=0 -c
>>> load_freq_dawg=0 -c language_model_penalty_non_dict_word=1 -c
>>> language_model_penalty_non_freq_dict_word=1
>>>
>>> These are the words I have in eng.user-words.
>>>
>>> CHECK NO.
>>> CHECK
>>> NO
>>> check
>>> no
>>>
>>> Any idea how to fix this?
>>>
>>> Thanks,
>>> Hari
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>> --
>> Best Regards,
>> Dattatraya Tembare
>> +1 914 721 6311
>>

--
Best Regards,
Dattatraya Tembare
+1 914 721 6311
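
For illustration, the "crop, then add a border" step described in the thread could be sketched in plain Java as below. This is only a sketch under assumptions: it uses java.awt instead of ImageMagick (which is what the thread actually mentions), and the class name OcrPreprocess, the method name cropWithMargin, and the choice of a white margin are hypothetical, not taken from the thread.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J or ImageMagick.
public class OcrPreprocess {

    /** Crops the region of interest and surrounds it with a white margin. */
    public static BufferedImage cropWithMargin(BufferedImage source, Rectangle region, int margin) {
        if (margin < 0) {
            throw new IllegalArgumentException("margin must be >= 0");
        }
        // Clamp the requested region to the image bounds so getSubimage cannot throw.
        Rectangle bounds = new Rectangle(0, 0, source.getWidth(), source.getHeight());
        Rectangle clipped = region.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Region lies outside the image: " + region);
        }
        BufferedImage cropped =
                source.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);

        // Draw the crop onto a larger white canvas, offset by the margin on every side.
        BufferedImage padded = new BufferedImage(
                clipped.width + 2 * margin, clipped.height + 2 * margin,
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = padded.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, padded.getWidth(), padded.getHeight());
            g.drawImage(cropped, margin, margin, null);
        } finally {
            g.dispose();
        }
        return padded;
    }
}
```

The padded image could then be handed to Tess4J, for example via instance.doOCR(padded), once per cropped piece. A suitable margin depends on the image and is best found by trial.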