Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Nick White
Oh cool, I haven't actually used multi-page TIFFs before, it's nice that Tesseract handles them well, straight from ghostscript. Yes, at the moment I suppose you'll just have to make a little script or something to wrap the ghostscript and tesseract steps appropriately. I have used pdfimages for
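A minimal sketch of the kind of wrapper script suggested above, assuming Ghostscript (gs) and tesseract are on the PATH. The function names pdf_ocr and run, the DRY_RUN switch, the tiffg4 (bilevel, Group 4) output device and the 300 dpi resolution are illustrative assumptions, not settings taken from the thread; adjust them to suit the scans.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 it prints a command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: render a PDF to one multi-page TIFF with Ghostscript,
# then hand that TIFF to Tesseract.
pdf_ocr() {
    pdf=$1
    base=${pdf%.pdf}      # strip a trailing .pdf, if present
    tiff="$base.tif"

    # tiffg4 and -r300 are assumed settings (bilevel output at 300 dpi).
    run gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
        "-sOutputFile=$tiff" "$pdf" &&
    # Tesseract writes its text output to "$base.txt".
    run tesseract "$tiff" "$base"
}
```

After sourcing the file, calling pdf_ocr my-test.pdf would run both steps in order; setting DRY_RUN=1 first only prints the two commands, which is a cheap way to check them before processing a batch.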

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Steven McArdle
Thanks Nick. I already have it set up for Ghostscript, as it gives better results than ImageMagick. As the PDFs are mostly multi-page and Ghostscript can generate multi-page TIFFs from these, I can feed them directly into Tesseract. So I don't think pdfimages is an option as it spits out multip

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread TP
On Mon, Apr 29, 2013 at 4:10 AM, Steven McArdle wrote:
> What do you mean by "it doesn't support straight PDF"?
Leptonica only supports PDF for relatively simple *output*. See "I/O libraries Leptonica is dependent on" [1] and "Image I/O" [2]. If you don't believe that, see src\environ.h [3] f

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Nick White
On Mon, Apr 29, 2013 at 04:10:43AM -0700, Steven McArdle wrote:
> What do you mean by "it doesn't support straight PDF"?
I mean it only accepts image files. So you need to extract the images from the PDF before getting Tesseract to process them. Now I think of it, the 'pdfimages' tool is better

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Steven McArdle
What do you mean by "it doesn't support straight PDF"? The PDF I have is a pure image PDF, i.e. from a scanner with NO OCR, just the image layer. I can convert this to TIFF with good results using Ghostscript, but I was hoping that Tesseract could handle image-only PDFs.
Steve
On Monday, Ap

Re: Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Nick White
> ALSO, I thought tesseract built with leptonica could handle any of the formats
> leptonica can handle, and that includes PDF.
Nope, it doesn't support straight PDF. The best approach is to rip the images out of the PDF first. If you have ImageMagick, something like this will do that:
convert my-test.pdf ou

Building tesseract 3.02.02 with leptonica 1.69

2013-04-29 Thread Steven McArdle
Hi All,
I have built Tesseract 3.02.02 with Leptonica 1.69 but I have some problems.
Running tesseract --version reports:
tesseract 3.02.02
Notice it does not mention leptonica?
Secondly, if I try to use a PDF as input I get the following error:
$ tesseract my-test.pdf my-test
Tesseract Open So