You can also pass a region of interest to doOCR, so that Tesseract reads only that part of the image:

import java.awt.Rectangle;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public String ocrText(File file, String lang, ImageGeometry geometry) {
    String resultText = null;
    // getTesseractInstance is a helper that is not shown here.
    Tesseract instance = getTesseractInstance("TesseractEnvPath", lang);
    // Define a region of interest equal to or smaller than the image:
    // x and y of the top-left corner, then width and height, in pixels.
    Rectangle rect = new Rectangle(geometry.getXscale(), geometry.getYscale(),
            geometry.getWidth(), geometry.getHeight());

    try {
        resultText = instance.doOCR(ImageIO.read(file), rect);
        log.debug("resultText: {}", resultText);
    } catch (TesseractException | IOException e) {
        log.error("OCR failed for {}", file, e);
    }

    return resultText;
}
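
The region passed to doOCR has to lie inside the image. A minimal sketch of one way to guarantee that, assuming you know the image width and height up front; the class and method names here (RoiClamp, clampToImage) are made up for illustration and are not part of Tess4J:

```java
import java.awt.Rectangle;

final class RoiClamp {

    /**
     * Returns the part of roi that lies inside an image of the given size,
     * or null when the two do not overlap at all.
     */
    static Rectangle clampToImage(Rectangle roi, int imageWidth, int imageHeight) {
        Rectangle bounds = new Rectangle(0, 0, imageWidth, imageHeight);
        // intersection() yields an empty rectangle when there is no overlap.
        Rectangle clamped = roi.intersection(bounds);
        return clamped.isEmpty() ? null : clamped;
    }

    public static void main(String[] args) {
        // A region that runs past the right edge of a 1000x200 image.
        Rectangle roi = new Rectangle(900, 50, 300, 80);
        System.out.println(clampToImage(roi, 1000, 200));
    }
}
```

The clamped rectangle can then be handed to doOCR in place of the raw one; a null result means there is nothing to read.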

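The margin advice in the quoted messages below (crop, trim, add a border) was done with ImageMagick. As a rough alternative sketch, the border step alone can be done with plain java.awt. This assumes dark text on a white background; the names OcrPadding and addWhiteBorder and the margin size are illustrative choices, not values from this thread:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OcrPadding {

    /** Returns a copy of src surrounded by a white border of margin pixels on every side. */
    static BufferedImage addWhiteBorder(BufferedImage src, int margin) {
        BufferedImage padded = new BufferedImage(
                src.getWidth() + 2 * margin,
                src.getHeight() + 2 * margin,
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = padded.createGraphics();
        try {
            // Fill the whole canvas with white, then draw the original in the middle.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, padded.getWidth(), padded.getHeight());
            g.drawImage(src, margin, margin, null);
        } finally {
            g.dispose();
        }
        return padded;
    }
}
```

The padded image can be passed to doOCR like any other BufferedImage. How many pixels of margin help is something to find by experiment on your own images.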
On Fri, Jun 29, 2018 at 12:41 PM Dattatraya Tembare <datta.temb...@gmail.com>
wrote:

> "C" is missing from the text because Tesseract doesn't have enough margin
> around it to read the text.
> The image needs a proper margin.
>
>
> On Friday, June 29, 2018 at 12:39:06 PM UTC-4, Dattatraya Tembare wrote:
>>
>> Hello Hari,
>> I faced the same problem.
>>
>> When there are two different fonts, Tesseract doesn't recognize the text
>> properly. It recognizes the first text and ignores the next text if its
>> font size is bigger than the first one's.
>> I resolved it by cropping the image into two pieces. I'm using
>> ImageMagick (Java API) to clean and crop the images.
>>
>> Also, I see you made the command unnecessarily complicated (I have the
>> tesseract path set up):
>>
>> C:\EA>tesseract Capture.PNG Capture -l eng
>> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>>
>> C:\EA>tesseract Capture1.PNG Capture1 -l eng
>> Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
>>
>> Tesseract will return the proper text if the text is centered. I achieved
>> that by cropping, trimming, and adding a border.
>>
>> Datta
>>
>> On Thu, Jun 28, 2018 at 3:33 PM Hari P <hari.ja...@gmail.com> wrote:
>>
>>> I am using tesseract v4.0 beta 1 and trying to OCR a remittance file.
>>> There is one section which has CHECK NO, but tesseract doesn't seem to
>>> recognize it at all.
>>>
>>> I have tried setting dictionary words and penalties to 1 for
>>> non-dictionary words, yet no change.
>>>
>>> tesseract capture.png captureoutput1 --user-words "C:\Program Files
>>> (x86)\Tesseract-OCR\tessdata\eng.user-words" -c load_system_dawg=0 -c
>>> load_freq_dawg=0 -c language_model_penalty_non_dict_word=1 -c
>>> language_model_penalty_non_freq_dict_word=1
>>>
>>> These are the words I have in eng.user-words.
>>>
>>> CHECK NO.
>>> CHECK
>>> NO
>>> check
>>> no
>>>
>>> Any idea how to fix this?
>>>
>>> Thanks,
>>> Hari
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/01ef5e64-3332-4b0f-a0aa-8ab9488083f1%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>> --
>> Best Regards,
>> Dattatraya Tembare
>> +1 914 721 6311
>>


-- 
Best Regards,
Dattatraya Tembare
+1 914 721 6311
