Dave,

Yep, quality is relatively poor, so don't expect high accuracy from Tess.
Do you need every table cell's contents? Or is getting the numbers enough, so that in a next step you can restore the [predefined] item names?

Warm regards,
Dmitry Silaev

On Mon, Mar 14, 2011 at 4:19 PM, David Hoffer <dhoff...@gmail.com> wrote:
> Dmitry,
>
> That would be great, thanks for the offer. I'll attach two samples.
>
> These two are good examples of the range of quality. What I need to
> do is extract cell data for processing. I can generate these in any
> image format (TIFF, JPEG) if one should be preferred.
>
> Best regards,
> -Dave
>
> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> I suspect this paper is a sledgehammer to crack a nut. It's quite
>> universal and elaborate, so it may take a great deal of time to
>> implement and debug. Your images might require much simpler
>> methods.
>>
>> I always say the same thing: send your sample images and the community
>> will try to help.
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>>> Hi Vicky,
>>>
>>> Can you tell me more about this paper? It looks like this is not a
>>> free document, so I can't just read it to see if it would solve the
>>> problem I have.
>>>
>>> My problem is that I have grey-scale image data (tif/jpg/etc.) that
>>> contains text within a table format, i.e. cells on the page. The
>>> documents were originally faxed, then converted to PDF, so the image
>>> quality varies from poor to good. I don't want the table formatting;
>>> I'm looking for a way to remove the formatting and get to just the
>>> image text. I want to convert that to text using OCR, Tesseract or
>>> otherwise.
>>>
>>> My programming environment is Java, but I can shell out to other programs
>>> if I need to.
>>>
>>> Would the approach in the paper solve this problem space? How
>>> practical is the software solution for a one-man effort?
>>>
>>> Thanks,
>>> -Dave
>>>
>>> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com>
>>> wrote:
>>>> Hello,
>>>>
>>>> I used this paper (for pre-processing):
>>>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001. IEEE
>>>> Tran. Patt. Analysis and Machine Int. Nov 2001 Volume 23 Issue 11 Pages
>>>> 1240 - 1256
>>>>
>>>> Best Regards,
>>>> Vicky
>>>>
>>>> -----Original Message-----
>>>> From: tesseract-ocr@googlegroups.com
>>>> [mailto:tesseract-ocr@googlegroups.com]
>>>> On Behalf Of Daphne
>>>> Sent: Friday, March 11, 2011 01:15
>>>> To: tesseract-ocr
>>>> Subject: how to get the character in an image file which is in table
>>>> format.
>>>>
>>>> Hello,
>>>>
>>>> I have a scanned image file which contains a table. When I OCR it using
>>>> tessnet, it doesn't give the desired output.
>>>> It is not reading the characters in the table. Instead it gives some
>>>> numbers.
>>>>
>>>> How can I read the characters in a table-format image?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "tesseract-ocr" group.
>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>> To unsubscribe from this group, send email to
>>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en.