Sorry, can't help further. Like I said before: this looks like a case of
having to run this in a debugger and see what happens.

What DOES jump out are those very odd (HUGE) bbox coordinate numbers: you
would expect these to be x/y pixel coordinates in the original image, and
/nobody/ has images over a billion pixels wide! All 4 numbers are suspect,
which leads me to suspect the binary interface between Go and C++ is
possibly broken. No certainty, but this smells pretty bad.

For reference and to aid your debugging efforts, go and see what the
tesseract CLI outputs for x/y coordinates in its hocr or tsv output modes.
The bbox numbers should fall in the same price range, so to speak. ;-)
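
As a rough sketch of that comparison: the CLI can write TSV with
`tesseract <image> stdout tsv`, and the word rows can then be pulled out with
a few lines of Go. The column order assumed here is level, page_num,
block_num, par_num, line_num, word_num, left, top, width, height, conf, text;
do check it against the header row your own tesseract build prints. The
wordBox type and parseTSV helper are hypothetical names, and the sample rows
are made up for illustration, NOT real output for the image in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// wordBox is one word-level row of tesseract's TSV output.
type wordBox struct {
	Left, Top, Width, Height int
	Text                     string
}

// parseTSV extracts the word rows (level 5) from TSV text, assuming the
// column layout described above. All other rows are skipped.
func parseTSV(tsv string) ([]wordBox, error) {
	var boxes []wordBox
	for _, line := range strings.Split(tsv, "\n") {
		f := strings.Split(strings.TrimRight(line, "\r"), "\t")
		if len(f) < 12 || f[0] != "5" {
			continue // header, empty line, or page/block/paragraph/line row
		}
		var n [4]int
		for i := range n {
			v, err := strconv.Atoi(f[6+i])
			if err != nil {
				return nil, fmt.Errorf("bad coordinate %q: %w", f[6+i], err)
			}
			n[i] = v
		}
		boxes = append(boxes, wordBox{
			Left: n[0], Top: n[1], Width: n[2], Height: n[3], Text: f[11],
		})
	}
	return boxes, nil
}

func main() {
	// Made-up sample rows, NOT real output for the image in this thread.
	sample := "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t" +
		"left\ttop\twidth\theight\tconf\ttext\n" +
		"5\t1\t1\t1\t1\t1\t40\t120\t60\t30\t95.0\tNO\n" +
		"5\t1\t1\t1\t1\t2\t110\t120\t180\t30\t94.0\tSMOKING\n"

	boxes, err := parseTSV(sample)
	if err != nil {
		panic(err)
	}
	for _, b := range boxes {
		fmt.Printf("%-8s (%d,%d)-(%d,%d)\n",
			b.Text, b.Left, b.Top, b.Left+b.Width, b.Top+b.Height)
	}
}
```

Whatever boxes the Go binding returns for the same image should land in the
same range as the ones parsed from the CLI output.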


Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   [email protected]
mobile: +31-6-11 120 978
--------------------------------------------------

On Mon, 3 Nov 2025, 14:47 Harshit Goel, <[email protected]> wrote:

> Hi Ger,
>
> Thanks a lot for the detailed guidance — it was really helpful.
>
> I ran deeper diagnostics and confirmed a few things:
>
>    - Running *Tesseract CLI* directly works perfectly and extracts: *NO
>      SMOKING*
>    - However, when using *gosseract* from Go, I still get *empty text
>      output* and a single empty bounding box like:
>      Text: [ ],  Box: (1476397136,32579)-(1476956064,32579)
>      The image being processed is a valid 8-bit/16-bit PNG (confirmed via
>      file command).
>    - Setting *TESSDATA_PREFIX* or
>      *SetTessdataPrefix("/usr/share/tessdata")* works correctly: no
>      language load errors.
>    - Even after forcing engine mode with *tessedit_ocr_engine_mode = 1*
>      (LSTM only) and using *PSM_SPARSE_TEXT*, gosseract still returns
>      empty text.
>    - This makes me think gosseract is initializing Tesseract differently
>      (maybe not loading the same configs or missing something in the setup
>      phase), because the CLI and Go layer are using the same image and
>      tessdata.
>
> Do you have any suggestions for checking whether gosseract is properly
> initializing TessBaseAPI with the same defaults as CLI?
>
> Thanks again for your help — your earlier hint about checking bounding
> boxes and configuration alignment was spot on.
>
> Best regards,
> Harshit
>
> On Sun, Nov 2, 2025 at 4:09 AM Ger Hobbelt <[email protected]> wrote:
>
>> I expect you're in for a debug session.
>>
>> I do not use Go, so here's just a few general tidbits:
>>
>> - you tested with the tesseract CLI. Excellent! So that proves things can
>> go well at the core; one major problem area less to worry about.
>> - next is the gosseract library/layer itself: how does it talk to
>> tesseract, what does it pass (and what doesn't it), etc.: from a very swift
>> glance at the code, there's nothing blatantly obviously wrong in their
>> tessbridge.cpp, AFAICT. Haven't looked any further than that.
>> - my own usage of tesseract as a library has shown me that getting the
>> parameters right can be a bit of a hassle sometimes; one of the potential
>> failure modes is not noting that tesseract does not receive the same config
>> baseline setup as when it ran via CLI: this is where debugging is mandatory.
>>
>> My first guess would be to make very sure your tesseract config files are
>> loaded the same way. While that can be a bit harsh to do when you're not
>> comfortable with running this stuff in a debugger, here's a preparation
>> step I would definitely look at if I were you:
>> 1. tesseract via your Go code doesn't produce *anything*, while
>> 2. tesseract CLI does deliver text ("No smoking")
>> which MAY be due to tesseract not finding any text word bounding-boxes
>> when run via the Go-code route.
>>
>> I see they (gosseract) present a GetBoundingBoxes API, so I would first
>> try to run that one to see if I get any boxes at all, and if any, where
>> they are in the image, i.e.: do I get (a) no boxes, (b) only gibberish
>> boxes, or (c) at least the ones covering "NO" and "SMOKING", or what?
>> Then try the same for the CLI (IIRC vanilla tesseract has an option to
>> cough up bboxes only; haven't used that in a while and I'm running a
>> customized tesseract here, so check code and documentation, don't take me
>> at my word!)
>>
>> To see what I was looking at:
>> https://github.com/otiai10/gosseract/blob/main/tessbridge.cpp#L108
>>
>> If the bounding boxes don't show up in your Go run, then it smells like a
>> config/setup bit not making it into the tesseract engine, so it's debugging
>> the gosseract tessbridge.cpp interlayer to see what happens, really. Are CLI
>> and Go code really, really pointing at the same config search paths, for
>> example?
>> If the bounding boxes show up and match the set in the CLI, we have a
>> serious conundrum.
>>
>> Either way, that's the road I'd travel if walking in your shoes.
>> (If you can debug-step the tesseract CLI the same way, you can more
>> easily compare both, perhaps, as the CLI is using the same APIs gosseract
>> is using, with some differences, but my current bet is those are not
>> relevant.)
>>
>> Also monitor the gosseract/tesseract run for error and warning messages
>> from tesseract. If it is silent, maybe force it once to barf a
>> hairball, just so you know the error/warning/info outputs are working.
>> Whatever you do, my bet is you have some debugging on the road ahead.
>>
>> Note: I don't do Go, so haven't used gosseract. This would be my general
>> tactic though, anyway.
>>
>>
>>
>>
>> On Fri, Oct 31, 2025 at 6:45 PM Harshit Goel <[email protected]>
>> wrote:
>>
>>> Hi team
>>>
>>> I’m facing an issue where Tesseract OCR works correctly from the CLI,
>>> but returns an empty string when called programmatically using Go (via
>>> gosseract).
>>>
>>> For this particular image:
>>> https://pmi-api.ubconnex.ca/files/icons/2025-03/11c6051eec503f52c43f0de382980d31.png,
>>> the OCR always returns an empty string when running programmatically. Yet
>>> when I run the exact same image manually using Tesseract from terminal by
>>> command: *tesseract /tmp/ocr-3678469497.png stdout*
>>>
>>> It correctly detects and returns *NO SMOKING*
>>>
>>> *Environment*
>>>
>>>    - OS: Linux (Server)
>>>    - Tesseract version: tesseract 5.x (CLI works fine)
>>>    - Go binding: github.com/otiai10/gosseract/v2
>>>    - Go version: go1.23.x
>>>
>>> I've tried with the following approaches but still no effect:
>>>
>>>    - Different PSM modes (SPARSE_TEXT, SINGLE_BLOCK, etc.)
>>>    - Preprocessing (grayscale, contrast enhancement, flattening
>>>      transparency).
>>>    - Verified that the image file is saved correctly and readable by
>>>      Tesseract.
>>>    - Tried increasing image size and contrast.
>>>
>>> Is there any known discrepancy between the CLI binary and the gosseract
>>> API in how page segmentation modes or image preprocessing are handled
>>> internally?
>>>
>>> Any insight on why Tesseract detects text in CLI but gosseract binding
>>> returns empty output would be very helpful.
>>>
>>> Best Regards,
>>>
>>> Harshit Goel
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

