I'm not one of the developers, and I don't do anything for Windows anymore
(this issue is one of the reasons). It sounds like, reading through your link,
that there isn't anything actually malicious, just one of the certificates
has expired (or, is near expiring and your system clock is off in some way
-
Ger,
Your problem set/end goal is similar to mine (textbooks/manuals not
magazines and datasheets and I only have tiff or jpg images, no partial
pdfs, but full text search and copy/paste are things I want, and
textbooks/manuals do have the same OCR difficulties as magazines).
Can't offer much
With novels and non-fiction prose (memoirs, basic history or whatever) I'm
getting good runs; they also happen to use fonts that were already trained,
or are close to ones that were. Manuals and textbooks - most of the ones I'm trying
to work with include pictures and diagrams and other elements to further
Hello Ger, and thank you for responding.
Regarding training and/or tuning - I definitely don't have the available
computing power for a full train, and assuming I'm understanding the
requirements (specifically the 1000 images minimum thing) I'm not sure I
have enough data for a tune (it's
Image quality matters. Upside down or sideways images really need to be
rotated first - that is easy to do without loading up an image editor, just
need to get into the jpg's metadata.
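
A rough sketch of the metadata idea, in Python: the EXIF Orientation tag tells a viewer how far to turn the picture, so changing that one value "rotates" the page without re-saving the pixels. The function name here is made up for illustration, only the four non-mirrored tag values are covered, and it assumes whatever software reads the image next actually honors the tag. Writing the value back into the file still needs a metadata tool; this only works out which value to write.

```python
# EXIF Orientation tag values (non-mirrored cases only), mapped to the
# clockwise rotation in degrees that a viewer applies to show the image upright.
CLOCKWISE_DEGREES = {1: 0, 6: 90, 3: 180, 8: 270}
ORIENTATION_FOR_DEGREES = {deg: tag for tag, deg in CLOCKWISE_DEGREES.items()}


def orientation_after_rotation(current_orientation, extra_clockwise_degrees):
    """Return the Orientation value that adds a further clockwise turn.

    Hypothetical helper: current_orientation is the tag value already in the
    file, extra_clockwise_degrees is how much further the page must be turned.
    """
    if current_orientation not in CLOCKWISE_DEGREES:
        raise ValueError("only non-mirrored orientations 1, 3, 6, 8 are handled")
    if extra_clockwise_degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    total = (CLOCKWISE_DEGREES[current_orientation] + extra_clockwise_degrees) % 360
    return ORIENTATION_FOR_DEGREES[total]


# A sideways scan with no rotation recorded (tag value 1) that needs a
# quarter turn clockwise:
print(orientation_after_rotation(1, 90))
```

An upside-down page would be the same call with 180 instead of 90.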
It sounds like you are processing textbooks, to turn them into something a
screenreader can manage? Headers and such
"Regarding proofreading with Scribe OCR, it is definitely possible to zoom
in. The controls are virtually identical to popular document viewer
programs like Acrobat. You can zoom in on the current location of the mouse
using Control + Mouse Wheel, scroll using the mouse wheel, and pan in all
Forgive me, I have lots of questions and will be trying to separate out one
question per conversation (so that those searching later may more easily
find the answers).
I'm working with scanned images of a textbook-like layout - occasional
drop-caps, text in 2 or occasionally 3 columns that
e: if I were you I'd want to see both processes'
> performance and decide what to do after that.
>
> Postprocessing is akin to "fixing it in the mix": you only do that when
> all other options have been depleted.
>
>
> On Sun, 24 Mar 2024, 19:29 Misti Hamon, wrote:
The text imagename, outputbase, lang etc. are all placeholders (and anything
in square brackets is optional; you don't need to include it if the
defaults will give you the output you need). To run tesseract, you'll need
to replace the placeholder text with the specifics of your file.
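
To make the placeholder idea concrete, here is a small Python sketch that fills them in and builds the command line. The helper name and the file names are invented for the example; the only tesseract pieces it relies on are the positional imagename and outputbase, the optional -l and --psm options, and trailing config file names such as pdf. Anything left as None is simply omitted, which is what "optional" means in the usage text.

```python
import shlex


def build_tesseract_command(imagename, outputbase, lang=None, psm=None, configfiles=()):
    """Fill in: tesseract imagename outputbase [-l lang] [--psm N] [configfile...]"""
    command = ["tesseract", imagename, outputbase]
    if lang is not None:
        # Optional: leave out to use tesseract's default language.
        command += ["-l", lang]
    if psm is not None:
        # Optional: page segmentation mode.
        command += ["--psm", str(psm)]
    # Optional config file names (for example "pdf") go last.
    command += list(configfiles)
    return command


# Hypothetical file names, standing in for the specifics of your own file.
example = build_tesseract_command("page_001.tif", "page_001", lang="eng", configfiles=["pdf"])
print(shlex.join(example))
```

The printed line is what you would type at the prompt; passing the list to subprocess.run would run it directly, assuming tesseract is installed and on the PATH.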
On Sat,
Scanned books?
No help on training or choosing datasets, but, if these images are
photoscanned book pages, did you run the images through book-specific
processing software (scantailor, spreads, or bookscan wizard are the 3 I
know of, plus Internet Archive's scan tool scripts) to split your source
I'm going to preface this with, I haven't actually done an OCR run yet (by
the time any replies come in, I probably will have, the source image
editing is almost done).
I'm working with some photoscanned images of knitting-related work (so,
there are some non-word characters and acronyms used,
11 matches