Oh cool, I haven't actually used multi-page TIFFs before. It's nice
that Tesseract handles them well, straight from ghostscript.
Yes, at the moment I suppose you'll just have to make a little
script or something to wrap the ghostscript and tesseract steps
appropriately.
I have used pdfimages for
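For anyone wanting a starting point for the wrapper script mentioned above, here is a minimal sketch in Python. It is not anything shipped with Tesseract. The function names, the 300 dpi default and the choice of the tiffg4 device (bilevel output; tiffgray or tiff24nc may suit grey or colour scans better) are assumptions for illustration, and it expects both gs and tesseract to be on the PATH.

```python
import subprocess
from pathlib import Path


def gs_command(pdf, tiff, dpi=300, device="tiffg4"):
    """Build the Ghostscript call that renders every PDF page into one TIFF.

    The device and resolution defaults are assumptions; adjust for your scans.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        f"-sDEVICE={device}",
        f"-r{dpi}",
        f"-sOutputFile={tiff}",
        str(pdf),
    ]


def tesseract_command(tiff, out_base, lang="eng"):
    """Build the Tesseract call; Tesseract appends .txt to out_base itself."""
    return ["tesseract", str(tiff), str(out_base), "-l", lang]


def ocr_pdf(pdf, lang="eng", dpi=300):
    """Run both steps for one PDF and return the path of the text output."""
    pdf = Path(pdf)
    tiff = pdf.with_suffix(".tif")      # my-test.pdf -> my-test.tif
    out_base = pdf.with_suffix("")      # my-test.pdf -> my-test
    subprocess.run(gs_command(pdf, tiff, dpi), check=True)
    subprocess.run(tesseract_command(tiff, out_base, lang), check=True)
    return Path(str(out_base) + ".txt")
```

Calling ocr_pdf("my-test.pdf") should leave my-test.tif and my-test.txt next to the input. The command builders are kept separate from the subprocess calls so the exact command lines can be printed and checked before anything is run.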
Thanks Nick
I already have it set up for ghostscript as it gives better results than
imagemagick.
As the PDFs are mostly multi-page and ghostscript can generate multi-page
TIFF from these, I can feed these directly into Tesseract.
So I don't think pdfimages is an option as it spits out multip
On Mon, Apr 29, 2013 at 4:10 AM, Steven McArdle wrote:
> What do you mean by "it doesn't support straight PDF"?
>
>
Leptonica only supports PDF for relatively simple *output*. See "I/O
libraries Leptonica is dependent on" [1] and "Image I/O" [2]. If you don't
believe that, see src\environ.h [3] f
On Mon, Apr 29, 2013 at 04:10:43AM -0700, Steven McArdle wrote:
> What do you mean by "it doesn't support straight PDF"?
I mean it only accepts image files. So you need to extract the
images from the PDF before getting Tesseract to process them.
Now I think of it, the 'pdfimages' tool is better
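The extract-then-OCR approach described here could be sketched roughly as follows, again in Python. This is an illustration under assumptions, not a tested recipe: it relies on pdfimages being called as "pdfimages input.pdf prefix" and writing files named prefix-NNN.ppm or prefix-NNN.pbm (its default formats), and on the local Tesseract build being able to read those. The function names are made up for this example.

```python
import subprocess
from pathlib import Path

# Default pdfimages output formats; other suffixes are ignored (assumption).
IMAGE_SUFFIXES = {".ppm", ".pbm"}


def pdfimages_command(pdf, prefix):
    """Build the pdfimages call that dumps each embedded image to prefix-NNN.*"""
    return ["pdfimages", str(pdf), str(prefix)]


def extracted_images(prefix):
    """Return the extracted image files for a prefix, in page order.

    The zero-padded numbering means a plain sort keeps the pages in order.
    """
    prefix = Path(prefix)
    candidates = prefix.parent.glob(prefix.name + "-*")
    return sorted(p for p in candidates if p.suffix in IMAGE_SUFFIXES)


def ocr_extracted(pdf, prefix, lang="eng"):
    """Extract the images, OCR each one, and return the text file paths."""
    subprocess.run(pdfimages_command(pdf, prefix), check=True)
    outputs = []
    for image in extracted_images(prefix):
        out_base = image.with_suffix("")   # page-000.ppm -> page-000
        subprocess.run(
            ["tesseract", str(image), str(out_base), "-l", lang], check=True
        )
        outputs.append(Path(str(out_base) + ".txt"))
    return outputs
```

Unlike the single multi-page TIFF route, this gives one text file per extracted image, so the results have to be concatenated afterwards if one document per PDF is wanted.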
What do you mean by "it doesn't support straight PDF"?
The PDF I have is a pure image PDF, i.e. from a scanner with no OCR, just
the image layer.
I can convert this to TIFF with good results using Ghostscript, but I was
hoping that Tesseract could handle image-only PDFs.
Steve
On Monday, Ap
> Also, I thought tesseract built with leptonica could handle any of the formats
> leptonica can handle, and that includes PDF.
Nope, it doesn't support straight PDF. Best is to rip the images
out of the PDF first. If you have imagemagick, something like this
will do that:
convert my-test.pdf ou
Hi All
I have built Tesseract 3.02.02 with Leptonica 1.69 but I have some problems.
Running tesseract --version reports
tesseract 3.02.02
Notice it does not mention leptonica?
Secondly, if I try to use a PDF as input I get the following error
$ tesseract my-test.pdf my-test
Tesseract Open So