Hi Rob,

Yes, fax2tiff could be one way. However, I'm extracting the CCITTFaxDecode stream data from PDFs and then running OCR on it, so I'd like to do the whole conversion in memory instead of writing intermediate files.
Thanks

On Friday, September 7, 2012 12:52:12 PM UTC+8, rkomar wrote:
>
> On Thu, 6 Sep 2012, newtotesseract wrote:
>
> > Hi Nick,
> > I tried passing in the CCITTFaxDecode data to tesseract,
> > but it was not detected as TIFF.
> >
> > It seems like CCITT fax is not same as TIFF.
> >
> > Google search showed me that few other people also faced
> > same issue (e.g.
> > http://stackoverflow.com/questions/2641770/extracting-image-from-pdf-with-ccittfaxdecode-filter).
> >
> > If you know, how we can convert the CCITT-Fax to tiff or
> > jpeg, it would be really helpful.
> >
> > Many thanks for your help and time.
> >
> > Thanks,
> > - ganesh
>
> TIFF files can contain many kinds of image data compressed
> with all sorts of types of compression. CCITT _is_ one
> of the supported compression types. If you can install
> ImageMagick, then you can use the 'convert' program in that
> package to create your TIFF file. For example:
>
>     convert in.fax -compress Group4 out.tif
>
> converts the file to TIFF using CCITT Group4 compression
> in the output.
>
> Or, if you have libtiff installed, then you can use
> the fax2tiff program to do the conversion.
>
> Don't convert to jpeg; it isn't meant for bi-level images.
>
> Cheers,
> Rob Komar
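
For the in-memory route discussed in this thread, one commonly used approach is to prepend a minimal TIFF header to the raw CCITT stream, since TIFF supports CCITT compression directly. The following is only a sketch under assumptions, not a tested solution for any particular PDF: the function name `ccitt_to_tiff` is hypothetical, the stream is assumed to be a single strip of Group 4 data (a negative `/K` in the PDF filter parameters), and `width` and `height` are assumed to come from the image's `/Width` and `/Height` entries. The `photometric` default of 0 (WhiteIsZero) is also an assumption; if the result comes out inverted, try 1.

```python
import struct


def ccitt_to_tiff(data, width, height, group=4, photometric=0):
    """Prepend a minimal little-endian TIFF header to raw CCITT fax data.

    Hypothetical helper: assumes `data` is one strip covering the whole image.
    """
    # Each IFD entry is (tag, field type, count, value); type 3 = SHORT, 4 = LONG.
    # Entries must be sorted by tag number.
    entries = [
        (256, 4, 1, width),        # ImageWidth
        (257, 4, 1, height),       # ImageLength
        (258, 3, 1, 1),            # BitsPerSample: bi-level image
        (259, 3, 1, group),        # Compression: 3 = Group 3, 4 = Group 4
        (262, 3, 1, photometric),  # PhotometricInterpretation: 0 = WhiteIsZero
        (273, 4, 1, 0),            # StripOffsets, filled in below
        (278, 4, 1, height),       # RowsPerStrip: the whole image is one strip
        (279, 4, 1, len(data)),    # StripByteCounts
    ]
    # 8-byte file header + 2-byte entry count + 12 bytes per entry
    # + 4-byte offset of the next IFD (0 means there is none).
    header_size = 8 + 2 + 12 * len(entries) + 4
    entries[5] = (273, 4, 1, header_size)

    out = struct.pack('<2sHL', b'II', 42, 8)   # byte order, magic, first IFD offset
    out += struct.pack('<H', len(entries))
    for tag, field_type, count, value in entries:
        out += struct.pack('<HHLL', tag, field_type, count, value)
    out += struct.pack('<L', 0)                # no further IFDs
    return out + data
```

The returned bytes form a complete TIFF image held in memory, so they could be passed to any TIFF reader that accepts a buffer rather than a file name, without touching disk.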

