Hello,
I am not sure whether OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/) allows redaction.
If you would like to implement the text layer yourself with a custom font, have a look at PyMuPDF:
- https://github.com/pymupdf/PyMuPDF/discussions/775 (Adding text layer to a scanned PDF)
- https://github.com/pymupdf/PyMuPDF/discussions/2464 (invisible text layer)

Zdenko

On Thu, 7 Mar 2024 at 20:53, Mark Pellegrino <[email protected]> wrote:

> I found more info here:
>
> https://github.com/tesseract-ocr/tesseract/issues/1769#issuecomment-509490277
>
> Glyphless appears to be an 'invisible font' and all that Tesseract
> supports. It seems like the solution is to use Tesseract to generate hOCR,
> then use another tool to combine the source image with the hOCR?
>
> Does anyone have a simple workflow for editing/correcting Tesseract OCR
> documents that they can share?
>
> Thanks again,
>
> On Thursday 7 March 2024 at 14:17:28 UTC-5 Mark Pellegrino wrote:
>
>> Hello,
>> I'm trying to check PDFs made with Tesseract 5.2 for correctness using an
>> OCR editor but am unable to open them in either Abbyy or Acrobat.
>>
>> If I try to open a Tesseract PDF with Abbyy FineReader/OCR Editor, the
>> software just hangs and crashes. I can open Tesseract PDFs with Acrobat
>> Pro, but when I enable the 'Make OCR text visible' option in Preflight,
>> all of the text layer turns into unreadable black boxes. The font used
>> shows as 'GlyphLessFont' and appears to be embedded in the file.
>>
>> It doesn't matter what training data I use, or what the source image was,
>> I always get these results. Any other non-Tesseract made PDF works just
>> fine. I'm guessing that the issue is a missing font? I don't have much of
>> an understanding about how embedded PDF fonts work and I haven't found
>> anything about this in the Tesseract docs. Can someone please point me in
>> the right direction? Thanks.
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/b43c0ea6-fd81-49af-b74f-e93b0a682574n%40googlegroups.com.
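
P.S. As a rough starting point for the hOCR route mentioned in the quoted message, here is a minimal sketch, not a tested workflow. It reads the word boxes from a Tesseract hOCR file and writes them over the source image as an invisible text layer with PyMuPDF. The names `parse_hocr_words` and `build_searchable_pdf` are made up for this example. I am also assuming one PDF point per image pixel, that the bottom edge of each word box is close enough to the baseline, and the built-in "helv" font, which only covers Latin text (for a custom font you would pass `fontfile=` as well).

```python
import re
from html.parser import HTMLParser

# hOCR stores the word box in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every span with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # box of the word span currently open, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._chunks = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            text = "".join(self._chunks).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    """Return a list of (word, bbox) tuples in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def build_searchable_pdf(image_path, hocr_path, out_path):
    """Hypothetical helper: page image plus invisible words, saved as a PDF."""
    import fitz  # PyMuPDF; imported here so the parser works without it

    with open(hocr_path, encoding="utf-8") as fh:
        words = parse_hocr_words(fh.read())

    pix = fitz.Pixmap(image_path)
    doc = fitz.open()  # new, empty PDF
    # Assumption: one point per pixel, so hOCR coordinates can be used as-is.
    page = doc.new_page(width=pix.width, height=pix.height)
    page.insert_image(page.rect, filename=image_path)
    for text, (x0, y0, x1, y1) in words:
        page.insert_text(
            (x0, y1),                  # bottom-left of the box as baseline start
            text,
            fontsize=max(y1 - y0, 1),  # rough size taken from the box height
            fontname="helv",
            render_mode=3,             # 3 = invisible text
        )
    doc.save(out_path)
```

For proofreading you could change `render_mode` to 0 so the recognized words are drawn visibly on top of the scan. The hOCR file itself would come from something like `tesseract page.png page hocr`.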

