I expect you're in for a debug session.
I do not use Go, so here are just a few general tidbits:
- you tested with the tesseract CLI. Excellent! So that proves things can
go well at the core; one major problem area less to worry about.
- next is the gosseract library/layer itself: how does it talk to
tesseract, what does it pass (and what doesn't it), etc. From a very quick
glance at the code, there's nothing obviously wrong in their
tessbridge.cpp, AFAICT. Haven't looked any further than that.
- my own usage of tesseract as a library has shown me that getting the
parameters right can be a bit of a hassle sometimes; one of the potential
failure modes is not noticing that tesseract does not receive the same
baseline config setup as when it runs via the CLI: this is where debugging
is mandatory.
My first guess would be to make very sure your tesseract config files are
loaded the same way. While that can be hard to do when you're not
comfortable with running this stuff in a debugger, here's a preparation
step I would definitely look at if I were you. The situation so far:
1. tesseract via your Go code doesn't produce *anything*, while
2. tesseract CLI does deliver text ("NO SMOKING"),
which MAY be due to tesseract not finding any text word bounding-boxes when
run via the Go-code route.
I see they (gosseract) present a GetBoundingBoxes API, so I would first try
to run that one to see if I get any boxes at all and, if any, where they
are in the image, i.e. do I get (a) no boxes, (b) only gibberish boxes,
or (c) at least the ones covering "NO" and "SMOKING"?
Then try the same for the CLI (IIRC vanilla tesseract has an option to
cough up bboxes only; haven't used that in a while and I'm running a
customized tesseract here, so check code and documentation, don't take me
at my word!)
To see what I was looking at:
https://github.com/otiai10/gosseract/blob/main/tessbridge.cpp#L108
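A rough sketch of that comparison in Go, under stated assumptions. It assumes the CLI can emit word boxes as TSV (e.g. `tesseract image.png stdout tsv`, which I believe stock tesseract supports; check your build) with the column order noted in the comment. `wordBox`, `parseTSV` and `classify` are hypothetical helper names, and the sample rows are made-up numbers, not output from the actual image. On the Go side the same `classify` could be fed from whatever GetBoundingBoxes returns at word level.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// wordBox is a hypothetical stand-in for one recognized word and its box.
type wordBox struct {
	Text                     string
	Left, Top, Width, Height int
}

// parseTSV keeps only word-level rows (level 5) of tesseract TSV output.
// Assumed columns: level, page_num, block_num, par_num, line_num, word_num,
// left, top, width, height, conf, text.
func parseTSV(tsv string) []wordBox {
	var boxes []wordBox
	sc := bufio.NewScanner(strings.NewReader(tsv))
	for sc.Scan() {
		f := strings.Split(sc.Text(), "\t")
		if len(f) < 12 || f[0] != "5" {
			continue // header row, non-word levels, or malformed line
		}
		nums := make([]int, 4)
		ok := true
		for i := range nums {
			n, err := strconv.Atoi(f[6+i])
			if err != nil {
				ok = false
				break
			}
			nums[i] = n
		}
		text := strings.TrimSpace(f[11])
		if !ok || text == "" {
			continue
		}
		boxes = append(boxes, wordBox{Text: text, Left: nums[0], Top: nums[1], Width: nums[2], Height: nums[3]})
	}
	return boxes
}

// classify maps a set of boxes onto the three cases:
// (a) no boxes, (b) boxes but not the expected words, (c) expected words covered.
func classify(boxes []wordBox, expected []string) string {
	if len(boxes) == 0 {
		return "a: no boxes"
	}
	found := make(map[string]bool)
	for _, b := range boxes {
		found[strings.ToUpper(b.Text)] = true
	}
	for _, w := range expected {
		if !found[strings.ToUpper(w)] {
			return "b: boxes, but expected words missing"
		}
	}
	return "c: expected words covered"
}

func main() {
	// Made-up sample rows in the assumed TSV layout.
	sample := "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext\n" +
		"5\t1\t1\t1\t1\t1\t40\t300\t90\t40\t96.1\tNO\n" +
		"5\t1\t1\t1\t1\t2\t140\t300\t210\t40\t95.0\tSMOKING\n"
	boxes := parseTSV(sample)
	for _, b := range boxes {
		fmt.Printf("%q at (%d,%d) size %dx%d\n", b.Text, b.Left, b.Top, b.Width, b.Height)
	}
	fmt.Println(classify(boxes, []string{"NO", "SMOKING"}))
}
```

Running the same classification on the CLI boxes and on the Go-route boxes gives a quick yes/no on whether the two runs even see the same words.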
If the bounding boxes don't show up in your Go run, then it smells like a
config/setup bit not making it into the tesseract engine, so it's a matter
of debugging the gosseract tessbridge.cpp interlayer to see what happens,
really. Are CLI and Go code really, really pointing at the same config
search paths, for example?
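One cheap way to check the search-path question, sketched in Go under assumptions: it only looks at the TESSDATA_PREFIX environment variable plus a few guessed install directories (the list below is illustrative, not tesseract's real lookup order), and reports where `eng.traineddata` is actually found. `findTrainedData` is a hypothetical helper; the `exists` parameter is injectable so the logic can be exercised without touching the disk.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findTrainedData returns every <dir>/<lang>.traineddata for which exists
// reports true. Empty directory entries are skipped.
func findTrainedData(lang string, dirs []string, exists func(string) bool) []string {
	var hits []string
	for _, d := range dirs {
		if d == "" {
			continue
		}
		p := filepath.Join(d, lang+".traineddata")
		if exists(p) {
			hits = append(hits, p)
		}
	}
	return hits
}

// fileExists reports whether path is an existing regular (non-directory) file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	prefix := os.Getenv("TESSDATA_PREFIX")
	fmt.Printf("TESSDATA_PREFIX=%q\n", prefix)

	// Guessed candidate directories; adjust to your distribution.
	var dirs []string
	if prefix != "" {
		dirs = append(dirs, prefix, filepath.Join(prefix, "tessdata"))
	}
	dirs = append(dirs,
		"/usr/share/tesseract-ocr/5/tessdata",
		"/usr/share/tessdata",
		"/usr/local/share/tessdata",
	)

	hits := findTrainedData("eng", dirs, fileExists)
	if len(hits) == 0 {
		fmt.Println("no eng.traineddata found in the candidate directories")
	}
	for _, h := range hits {
		fmt.Println("found:", h)
	}
}
```

The point is to run this inside the same process environment as the Go service (same user, same service unit or container) and once from the shell where the CLI works; if the two outputs differ, that difference is the first thing to chase.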
If the bounding boxes show up and match the set in the CLI, we have a
serious conundrum.
Either way, that's the road I'd travel if walking in your shoes.
(If you can debug-step the tesseract CLI the same way, you can perhaps more
easily compare both, as the CLI is using the same APIs gosseract is using,
with some differences, but my current bet is those are not relevant.)
Also monitor the gosseract/tesseract run for error and warning messages
from tesseract. If it is silent, maybe force it once to barf a
hairball, just so you know the error/warning/info outputs are working.
Whatever you do, my bet is you have some debugging on the road ahead.
Note: I don't do Go, so haven't used gosseract. This would be my general
tactic though, anyway.
Met vriendelijke groeten / Best regards,
Ger Hobbelt
--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: [email protected]
mobile: +31-6-11 120 978
--------------------------------------------------
On Fri, Oct 31, 2025 at 6:45 PM Harshit Goel <[email protected]>
wrote:
> Hi team
>
> I’m facing an issue where Tesseract OCR works correctly from the CLI, but
> returns an empty string when called programmatically using Go (via
> gosseract).
>
> For this particular image:
> https://pmi-api.ubconnex.ca/files/icons/2025-03/11c6051eec503f52c43f0de382980d31.png,
> the OCR always returns an empty string when running programmatically. Yet
> when I run the exact same image manually using Tesseract from terminal by
> command: *tesseract /tmp/ocr-3678469497.png stdout*
>
> It correctly detects and returns *NO SMOKING*
>
> *Environment*
>
> - OS: Linux (Server)
> - Tesseract version: tesseract 5.x (CLI works fine)
> - Go binding: github.com/otiai10/gosseract/v2
> - Go version: go1.23.x
>
> I've tried with the following approaches but still no effect:
>
> - Different PSM modes (SPARSE_TEXT, SINGLE_BLOCK, etc.)
> - Preprocessing (grayscale, contrast enhancement, flattening
> transparency).
> - Verified that the image file is saved correctly and readable by
> Tesseract.
> - Tried increasing image size and contrast.
>
> Is there any known discrepancy between the CLI binary and the gosseract
> API in how page segmentation modes or image preprocessing are handled
> internally?
>
> Any insight on why Tesseract detects text in CLI but gosseract binding
> returns empty output would be very helpful.
>
> Best Regards,
>
> Harshit Goel
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>