As far as I can see, your source data is effectively a 1-bit (binary), losslessly compressed image. So a lossless conversion to any image format (it makes no difference which) will do no harm.
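To illustrate, here is a minimal Java sketch of such a lossless save. It is not code from this thread: the class name LosslessBinarySave and its methods are made up for the example, and PNG is chosen only because the standard javax.imageio ships a PNG writer (a built-in TIFF writer only arrived with Java 9). It assumes the decoded BufferedImage really contains only black and white pixels.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, for illustration only.
public class LosslessBinarySave {

    /** Returns a 1-bit-per-pixel copy of src, or src itself if it is already 1-bit. */
    static BufferedImage toBinary(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_BYTE_BINARY
                && src.getColorModel().getPixelSize() == 1) {
            return src;
        }
        BufferedImage bin = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        // Pure black and pure white pixels are kept as they are; how any
        // intermediate greys get mapped is left to Java2D.
        Graphics2D g = bin.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        return bin;
    }

    /** Writes the image as a 1-bit PNG, which is a lossless format. */
    static void savePng(BufferedImage img, File out) throws IOException {
        if (!ImageIO.write(toBinary(img), "png", out)) {
            throw new IOException("No PNG writer available");
        }
    }

    /** Pixel-by-pixel comparison, used to check that nothing was lost. */
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a decoded fax page: black background, one white line.
        BufferedImage page = new BufferedImage(64, 32, BufferedImage.TYPE_BYTE_BINARY);
        for (int x = 0; x < 64; x++) {
            page.setRGB(x, 10, 0xFFFFFFFF);
        }
        File out = File.createTempFile("page", ".png");
        savePng(page, out);
        BufferedImage back = ImageIO.read(out);
        System.out.println("identical after round trip: " + samePixels(page, back));
    }
}
```

The round trip in main is the kind of check worth running once on a real page: if the reloaded image matches pixel for pixel, the chosen format did no harm.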
Warm regards,
Dmitry Silaev

On Tue, Mar 15, 2011 at 8:31 AM, David Hoffer <dhoff...@gmail.com> wrote:
> Dmitry,
>
> Originally the documents are PDFs with these images CCITTFax encoded; I
> decoded them using iText. At this point I have a BufferedImage which
> I can save in any format supported by Java. I assume TIFF would be
> one of the best.
>
> Best regards,
> -Dave
>
> On Tue, Mar 15, 2011 at 7:52 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> Dave,
>>
>> What is the format and resolution in which you initially get your
>> images? For such poor quality every conversion makes an image even
>> worse...
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer <dhoff...@gmail.com> wrote:
>>> Dmitry,
>>>
>>> Would using a lossless format like TIFF be preferred?
>>>
>>> (I'm going to give this a try but some of these steps might be a bit
>>> more than I can handle... I'm not an image processing guru.)
>>>
>>> -Dave
>>>
>>> On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev <daemons2...@gmail.com>
>>> wrote:
>>>> Ehmm, actually I thought a bit more and now I say no to deskewing. It
>>>> can be detrimental to such poor quality images - they are almost
>>>> binary ("almost" probably because of the JPEG compression algo) and
>>>> low-res. As far as I can see, you can only have binary images.
>>>>
>>>> Therefore we need to assume the skew of an input image is always
>>>> within some narrow range and modify all our following steps to work in
>>>> a skewed coordinate system.
>>>>
>>>> Dmitry
>>>>
>>>> On Mar 14, 4:19 pm, David Hoffer <dhoff...@gmail.com> wrote:
>>>>> Dmitry,
>>>>>
>>>>> That would be great, thanks for the offer. I'll attach two samples.
>>>>>
>>>>> These two are good examples of the range of quality. What I need to
>>>>> do is extract cell data for processing. I can generate these in any
>>>>> image format (TIFF, JPEG) if one should be preferred.
>>>>>
>>>>> Best regards,
>>>>> -Dave
>>>>>
>>>>> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com>
>>>>> wrote:
>>>>> > I suspect this paper is a sledgehammer to crack a nut. It's quite
>>>>> > universal and elaborate. It usually takes a great deal of time to
>>>>> > implement and debug it. Your images might require much simpler
>>>>> > methods.
>>>>>
>>>>> > I always say the same thing: send your sample images and the community
>>>>> > will try to help.
>>>>>
>>>>> > Warm regards,
>>>>> > Dmitry Silaev
>>>>>
>>>>> > On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com>
>>>>> > wrote:
>>>>> >> Hi Vicky,
>>>>>
>>>>> >> Can you tell me more about this paper? It looks like this is not a
>>>>> >> free document so I can't just read it to see if it would solve the
>>>>> >> problem I have.
>>>>>
>>>>> >> My problem is that I have grey-scale image data (tif/jpg/etc) that
>>>>> >> contains text within a table format, i.e. cells on the page. The
>>>>> >> documents were originally faxed then converted to PDF so the image
>>>>> >> quality varies from poor to good. I don't want the table formatting;
>>>>> >> I'm looking for a way to remove the formatting and get to just the
>>>>> >> image text. I want to convert that to text using OCR, Tesseract or
>>>>> >> otherwise.
>>>>>
>>>>> >> My programming environment is Java but I can shell out to other programs
>>>>> >> if I need to.
>>>>>
>>>>> >> Would the approach in the paper solve this problem space? How
>>>>> >> practical is the software solution for a one man effort?
>>>>>
>>>>> >> Thanks,
>>>>> >> -Dave
>>>>>
>>>>> >> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja
>>>>> >> <vicky.vi...@gmail.com> wrote:
>>>>> >>> Hello,
>>>>>
>>>>> >>> I used this paper (for pre-processing):
>>>>> >>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001.
>>>>> >>> IEEE Tran. Patt. Analysis and Machine Int.
>>>>> >>> Nov 2001 Volume 23 Issue 11, Pages 1240-1256
>>>>>
>>>>> >>> Best Regards,
>>>>> >>> Vicky
>>>>>
>>>>> >>> -----Original Message-----
>>>>> >>> From: tesseract-ocr@googlegroups.com
>>>>> >>> [mailto:tesseract-ocr@googlegroups.com]
>>>>> >>> On Behalf Of Daphne
>>>>> >>> Sent: Friday, March 11, 2011 01:15
>>>>> >>> To: tesseract-ocr
>>>>> >>> Subject: how to get the character in an image file which is in table
>>>>> >>> format.
>>>>>
>>>>> >>> Hello,
>>>>>
>>>>> >>> I have a scanned image file which contains a table. When I OCR it using
>>>>> >>> tessnet it doesn't give the desired output.
>>>>> >>> It is not reading the characters in the table. Instead it gives some
>>>>> >>> numbers.
>>>>>
>>>>> >>> How can I read the characters in a table-format image?
>>>>>
>>>>> >>> --
>>>>> >>> You received this message because you are subscribed to the Google
>>>>> >>> Groups "tesseract-ocr" group.
>>>>> >>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>>> >>> To unsubscribe from this group, send email to
>>>>> >>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>>> >>> For more options, visit this group at
>>>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>>
>>>>> Attachments: hud1.jpeg (748K), hud2.jpeg (2046K)