Good evening Tom,
I assume you are a French speaker :)
Thanks a lot for your reply!
I'll continue in English so that others can follow along more easily :)
Indeed, Okular is a PDF reader.
I agree with most of what you wrote. However, I think things already go wrong 
in gscan2pdf's OCR step. As you wrote: "I suspect that the text is being split into 
multiple text blocks and that each of those text blocks is getting a new 
line added for "free" at the end."
I opened a support request here:
https://sourceforge.net/p/gscan2pdf/support-requests/70/
Hope I'll get help :)
In conclusion, for the moment the most important thing is that I can search 
the text of many scanned documents thanks to Tesseract.
Thanks again for your help,
Pascal

On Thursday, September 18, 2025 at 18:33:35 UTC+2, [email protected] wrote:

Hi Pascal,

I'm glad that you were able to determine that Tesseract is working 
correctly.

On Wednesday, September 17, 2025 at 6:57:45 AM UTC-4 [email protected] 
wrote:

Do you have any idea what's going on?


You are working with two different applications that folks in this 
forum are
unlikely to have much knowledge of: 1) gscan2pdf, which uses/embeds 
Tesseract,
and 2) Okular, which I'm guessing is a PDF viewer.

There are a number of areas where things could go awry, including the way 
the PDF
is constructed and the way the text is selected and formatted on the 
clipboard.

I suspect that the text is being split into multiple text blocks and that 
each of those
text blocks is getting a new line added for "free" at the end. Where in the 
processing
chain this is happening isn't clear.

If your goal is simply to get the best rendition of the text, it sounds 
like you've 
discovered what is needed. If you want to get that specific combination of 
programs to work better, you're probably going to need to address it with
whoever supports them.

Good luck!

Tom
 
