Dave,

What format and resolution are your images in when you first receive
them? At such poor quality, every conversion makes an image even
worse...

Warm regards,
Dmitry Silaev





On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer <dhoff...@gmail.com> wrote:
> Dmitry,
>
> Would using a lossless format like TIFF be preferred?
>
> (I'm going to give this a try but some of these steps might be a bit
> more than I can handle...I'm not an image processing guru.)
>
> -Dave
>
> On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> Ehmm, actually I thought a bit more and now I say no to deskewing. It
>> can be detrimental to such poor-quality images: they are almost
>> binary ("almost" probably because of the JPEG compression algorithm) and
>> low-res. As far as I can see, you can only have binary images.
>>
>> Therefore we need to assume that the skew of an input image always stays
>> within some narrow range, and modify all the following steps to work in
>> a skewed coordinate system.
>>
>> Dmitry
>>
>> On Mar 14, 4:19 pm, David Hoffer <dhoff...@gmail.com> wrote:
>>> Dmitry,
>>>
>>> That would be great, thanks for the offer. I'll attach two samples.
>>>
>>> These two are good examples of the range of quality.  What I need to
>>> do is extract cell data for processing.  I can generate these in any
>>> image format (TIFF, JPEG) if one is preferred.
>>>
>>> Best regards,
>>> -Dave
>>>
>>> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>> > I suspect this paper is a sledgehammer to crack a nut. It's quite
>>> > general and elaborate, and it may take a great deal of time to
>>> > implement and debug. Your images might require much simpler
>>> > methods.
>>>
>>> > I always say the same thing: send your sample images and the community
>>> > will try to help.
>>>
>>> > Warm regards,
>>> > Dmitry Silaev
>>>
>>> > On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>>> >> Hi Vicky,
>>>
>>> >> Can you tell me more about this paper?  It looks like this is not a
>>> >> free document so I can't just read it to see if it would solve the
>>> >> problem I have.
>>>
>>> >> My problem is that I have grey-scale image data (tif/jpg/etc) that
>>> >> contains text within a table format, i.e. cells on the page.  The
>>> >> documents were originally faxed, then converted to PDF, so the image
>>> >> quality varies from poor to good.  I don't want the table formatting.
>>> >> I'm looking for a way to remove the formatting and get to just the
>>> >> image text, which I want to convert to text using OCR, Tesseract or
>>> >> otherwise.
>>>
>>> >> My programming environment is Java, but I can shell out to other
>>> >> programs if I need to.
>>>
>>> >> Would the approach in the paper solve this problem?  How
>>> >> practical is the software solution for a one-man effort?
>>>
>>> >> Thanks,
>>> >> -Dave
>>>
>>> >> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com> wrote:
>>> >>> Hello,
>>>
>>> >>> I used this paper (for pre-processing):
>>> >>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001.
>>> >>> IEEE Trans. Pattern Analysis and Machine Intelligence, Nov 2001,
>>> >>> Volume 23, Issue 11, Pages 1240-1256
>>>
>>> >>> Best Regards,
>>> >>> Vicky
>>>
>>> >>> -----Original Message-----
>>> >>> From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com]
>>> >>> On Behalf Of Daphne
>>> >>> Sent: Friday, March 11, 2011 01:15
>>> >>> To: tesseract-ocr
>>> >>> Subject: how to get the character in an image file which is in table format.
>>>
>>> >>> Hello,
>>>
>>> >>> I have a scanned image file which contains a table. When I OCR it using
>>> >>> tessnet, it doesn't give the desired output.
>>> >>> It does not read the characters in the table. Instead it gives some
>>> >>> numbers.
>>>
>>> >>> How can I read the characters in a table-format image?
>>>
>>> >>> --
>>> >>> You received this message because you are subscribed to the Google 
>>> >>> Groups
>>> >>> "tesseract-ocr" group.
>>> >>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> >>> To unsubscribe from this group, send email to
>>> >>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> >>> For more options, visit this group at
>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>
>>>
>>>
>>>
>>>  hud1.jpeg (748K)
>>>
>>>  hud2.jpeg (2046K)
>>
>
