Dmitry,

Would using a lossless format like TIFF be preferable?
(I'm going to give this a try, but some of these steps might be a bit more than I can handle... I'm not an image-processing guru.)

-Dave

On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
> Ehmm, actually I thought a bit more and now I say no to deskewing. It
> can be detrimental to such poor-quality images: they are almost
> binary ("almost" probably because of the JPEG compression algorithm) and
> low-res. As far as I can see, you can only have binary images.
>
> Therefore we need to assume that the skew of an input image is always
> within some narrow range, and modify all our subsequent steps to work in
> a skewed coordinate system.
>
> Dmitry
>
> On Mar 14, 4:19 pm, David Hoffer <dhoff...@gmail.com> wrote:
>> Dmitry,
>>
>> That would be great, thanks for the offer. I'll attach two samples.
>>
>> These two are good examples of the range of quality. What I need to
>> do is extract cell data for processing. I can generate these in any
>> image format (TIFF, JPEG) if one should be preferred.
>>
>> Best regards,
>> -Dave
>>
>> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com>
>> wrote:
>> > I suspect this paper is a sledgehammer to crack a nut. It's quite
>> > universal and elaborate, and it can take a great deal of time to
>> > implement and debug. Your images might require much simpler
>> > methods.
>>
>> > I always say the same thing: send your sample images and the community
>> > will try to help.
>>
>> > Warm regards,
>> > Dmitry Silaev
>>
>> > On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>> >> Hi Vicky,
>> >>
>> >> Can you tell me more about this paper? It looks like it is not a
>> >> free document, so I can't just read it to see whether it would solve
>> >> the problem I have.
>> >>
>> >> My problem is that I have grey-scale image data (TIFF/JPEG/etc.) that
>> >> contains text within a table format, i.e. cells on the page.
>> >> The documents were originally faxed and then converted to PDF, so the
>> >> image quality varies from poor to good. I don't want the table
>> >> formatting; I'm looking for a way to remove it and get to just the
>> >> image text, which I want to convert to text using OCR (Tesseract or
>> >> otherwise).
>> >>
>> >> My programming environment is Java, but I can shell out to other
>> >> programs if I need to.
>> >>
>> >> Would the approach in the paper solve this problem? How practical
>> >> is the software solution for a one-man effort?
>> >>
>> >> Thanks,
>> >> -Dave
>> >>
>> >> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com>
>> >> wrote:
>> >>> Hello,
>> >>
>> >>> I used this paper (for pre-processing):
>> >>> Lee and Ryu, "Parameter-Free Geometric Document Layout Analysis",
>> >>> IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23,
>> >>> no. 11, pp. 1240-1256, Nov. 2001.
>> >>
>> >>> Best regards,
>> >>> Vicky
>> >>
>> >>> -----Original Message-----
>> >>> From: tesseract-ocr@googlegroups.com
>> >>> [mailto:tesseract-ocr@googlegroups.com]
>> >>> On Behalf Of Daphne
>> >>> Sent: Friday, March 11, 2011 01:15
>> >>> To: tesseract-ocr
>> >>> Subject: how to get the character in an image file which is in table
>> >>> format.
>> >>
>> >>> Hello,
>> >>
>> >>> I have a scanned image file which contains a table. When I OCR it
>> >>> using tessnet, it doesn't give the desired output.
>> >>> It is not reading the characters in the table; instead it gives some
>> >>> numbers.
>> >>
>> >>> How can I read the characters in a table-format image?
>> >>
>> >>> --
>> >>> You received this message because you are subscribed to the Google Groups
>> >>> "tesseract-ocr" group.
>> >>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> >>> To unsubscribe from this group, send email to
>> >>> tesseract-ocr+unsubscr...@googlegroups.com.
>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
>> [Attachments: hud1.jpeg (748K), hud2.jpeg (2046K)]
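Dave asks whether a lossless format would help, and Dmitry notes that the samples are "almost binary", probably because of JPEG compression. A minimal Java sketch of one simple first step: threshold the page to pure black and white and save it losslessly. It uses only the JDK's `ImageIO` and `BufferedImage`; the class name `Binarize` and the fixed threshold of 128 are illustrative assumptions, and a single global threshold may need tuning for the poorer faxes. It writes PNG because `ImageIO` has shipped a PNG writer since it was introduced, whereas a bundled TIFF plugin only arrived in Java 9.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Binarize {
    /**
     * Converts an image to 1-bit black and white using a fixed global
     * threshold on an approximate luminance value (hypothetical default 128).
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Luminance approximation with ITU-R BT.601 weights.
                int gray = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, gray < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("unsupported image format: " + args[0]);
        }
        // PNG is lossless, so the clean black/white edges survive saving.
        ImageIO.write(binarize(src, 128), "png", new File(args[1]));
    }
}
```

Usage, for example: `java Binarize hud1.jpeg hud1.png`.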
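For Dave's goal of stripping the table formatting and keeping only the text, a much simpler technique than full layout analysis is to erase long straight runs of ink: table rules are long, while glyph strokes are short. This is a generic sketch, not the method of the Lee and Ryu paper. It assumes a grey-scale or already binarized page (so `toInk` can look at a single channel), and `minRun` is a hypothetical tuning parameter that should sit well above the character height. Two limitations: under even a small skew a rule breaks into shorter horizontal segments, so `minRun` has to come down accordingly, and character strokes that touch a rule lose the pixels lying on it.

```java
import java.awt.image.BufferedImage;

public class LineRemover {
    /**
     * Marks a pixel as ink when its blue channel is below 128. Assumes a
     * grey-scale or binary page, where red, green and blue are equal.
     */
    public static boolean[][] toInk(BufferedImage img) {
        boolean[][] ink = new boolean[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                ink[y][x] = (img.getRGB(x, y) & 0xFF) < 128;
            }
        }
        return ink;
    }

    /**
     * Returns a copy of ink with every horizontal or vertical run of at least
     * minRun ink pixels cleared. Runs are detected on the input, so erasing
     * horizontal rules does not change which vertical rules are found.
     */
    public static boolean[][] removeRuledLines(boolean[][] ink, int minRun) {
        int h = ink.length;
        int w = h == 0 ? 0 : ink[0].length;
        boolean[][] out = new boolean[h][];
        for (int y = 0; y < h; y++) {
            out[y] = ink[y].clone();
        }
        // Horizontal runs, row by row.
        for (int y = 0; y < h; y++) {
            int x = 0;
            while (x < w) {
                if (!ink[y][x]) { x++; continue; }
                int start = x;
                while (x < w && ink[y][x]) x++;
                if (x - start >= minRun) {
                    for (int i = start; i < x; i++) out[y][i] = false;
                }
            }
        }
        // Vertical runs, column by column.
        for (int x = 0; x < w; x++) {
            int y = 0;
            while (y < h) {
                if (!ink[y][x]) { y++; continue; }
                int start = y;
                while (y < h && ink[y][x]) y++;
                if (y - start >= minRun) {
                    for (int j = start; j < y; j++) out[j][x] = false;
                }
            }
        }
        return out;
    }
}
```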
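Because Dave needs the data cell by cell, an alternative to erasing the grid is to locate it and OCR each cell on its own. This sketch uses projection profiles (ink counts per row and per column): indices whose count reaches a threshold are treated as rules, and the gaps between consecutive rules are cell interiors. `CellSplitter`, the thresholds `minRowCount`/`minColCount` and the decision to skip the margins outside the outermost rules are all assumptions, and the `ink` grid (`true` = black) must have the same dimensions as the page. Under the narrow skew Dmitry expects, a tilted rule spreads its ink over several rows or columns, so the thresholds must sit below the full page width or height, and rows of dense text could then be mistaken for rules.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class CellSplitter {
    /** Ink pixels per row (horizontal projection profile). */
    public static int[] rowProfile(boolean[][] ink) {
        int[] profile = new int[ink.length];
        for (int y = 0; y < ink.length; y++) {
            for (boolean b : ink[y]) {
                if (b) profile[y]++;
            }
        }
        return profile;
    }

    /** Ink pixels per column (vertical projection profile). */
    public static int[] colProfile(boolean[][] ink) {
        int width = ink.length == 0 ? 0 : ink[0].length;
        int[] profile = new int[width];
        for (boolean[] row : ink) {
            for (int x = 0; x < width; x++) {
                if (row[x]) profile[x]++;
            }
        }
        return profile;
    }

    /**
     * Indices with profile >= minCount count as rule pixels. Returns the
     * [start, end) gaps lying between two consecutive rules; margins before
     * the first rule and after the last one are skipped.
     */
    public static List<int[]> spansBetweenRules(int[] profile, int minCount) {
        List<int[]> spans = new ArrayList<>();
        int lastRuleEnd = -1; // index just past the previous rule; -1 = none yet
        int i = 0;
        while (i < profile.length) {
            if (profile[i] < minCount) {
                i++;
                continue;
            }
            int ruleStart = i;
            while (i < profile.length && profile[i] >= minCount) i++; // thick rules
            if (lastRuleEnd >= 0) {
                spans.add(new int[] {lastRuleEnd, ruleStart});
            }
            lastRuleEnd = i;
        }
        return spans;
    }

    /** Crops every cell (row span x column span) out of the page image. */
    public static List<BufferedImage> cells(BufferedImage page, boolean[][] ink,
                                            int minRowCount, int minColCount) {
        List<int[]> rowSpans = spansBetweenRules(rowProfile(ink), minRowCount);
        List<int[]> colSpans = spansBetweenRules(colProfile(ink), minColCount);
        List<BufferedImage> result = new ArrayList<>();
        for (int[] r : rowSpans) {
            for (int[] c : colSpans) {
                // getSubimage shares pixel data with the page image.
                result.add(page.getSubimage(c[0], r[0], c[1] - c[0], r[1] - r[0]));
            }
        }
        return result;
    }
}
```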
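Since Dave can shell out from Java, each cleaned page or cropped cell (saved to a file, for example with `ImageIO.write`) can be passed to the `tesseract` command-line program. A minimal sketch, assuming `tesseract` is on the `PATH` and is called in its basic `tesseract <image> <outputbase>` form, which writes the recognized text to `<outputbase>.txt`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TesseractCli {
    /** Command line for the basic form: tesseract <image> <outputbase>. */
    public static List<String> command(String imagePath, String outputBase) {
        return Arrays.asList("tesseract", imagePath, outputBase);
    }

    /**
     * Runs tesseract (assumed to be on the PATH) and returns the text it
     * writes to outputBase + ".txt".
     */
    public static String ocr(String imagePath, String outputBase)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(imagePath, outputBase))
                .inheritIO() // show tesseract's own messages on this console
                .start();
        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("tesseract exited with status " + status);
        }
        byte[] text = Files.readAllBytes(Paths.get(outputBase + ".txt"));
        return new String(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // e.g. java TesseractCli cell-1-2.png cell-1-2
        System.out.println(ocr(args[0], args[1]));
    }
}
```

Running one process per cell is slow for large pages, but it keeps each cell's text separate, which is what "extract cell data" needs.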