Dmitry,

Would using a lossless format like TIFF be preferred?

(I'm going to give this a try, but some of these steps might be a bit
more than I can handle... I'm not an image-processing guru.)

-Dave

On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
> Ehmm, actually I thought a bit more and now I say no to deskewing. It
> can be detrimental to such poor-quality images: they are almost
> binary ("almost" probably because of the JPEG compression algorithm) and
> low-res. As far as I can see, you can only have binary images.
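Because the pages are already close to black-and-white, re-thresholding them into a clean binary image is a natural first step. The Java sketch below assumes Otsu's global threshold on 8-bit gray values. Otsu is a standard technique swapped in here and was not proposed in this thread. The class `Binarize` and its methods are hypothetical names.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: global (Otsu) binarization of a grayscale page. */
class Binarize {

    /** Converts an RGB image to 0..255 gray values (integer Rec. 601 luma approximation). */
    static int[][] toGray(BufferedImage img) {
        int h = img.getHeight(), w = img.getWidth();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y][x] = (299 * r + 587 * g + 114 * b) / 1000;
            }
        }
        return gray;
    }

    /** Otsu's method: returns the threshold t that maximizes between-class variance. */
    static int otsuThreshold(int[][] gray) {
        int[] hist = new int[256];
        long total = 0;
        for (int[] row : gray) {
            for (int v : row) {
                hist[v]++;
                total++;
            }
        }
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long sumDark = 0;   // weighted sum of gray values <= t
        long wDark = 0;     // number of pixels with gray value <= t
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wDark += hist[t];
            if (wDark == 0) continue;        // no dark pixels yet
            long wLight = total - wDark;
            if (wLight == 0) break;          // no light pixels left
            sumDark += (long) t * hist[t];
            double meanDark = (double) sumDark / wDark;
            double meanLight = (double) (sumAll - sumDark) / wLight;
            // proportional to the between-class variance
            double between = (double) wDark * wLight * (meanDark - meanLight) * (meanDark - meanLight);
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Marks pixels with gray value <= t as ink (true). */
    static boolean[][] binarize(int[][] gray, int t) {
        boolean[][] ink = new boolean[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            ink[y] = new boolean[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                ink[y][x] = gray[y][x] <= t;
            }
        }
        return ink;
    }
}
```

For example, with `g = Binarize.toGray(ImageIO.read(file))`, the call `Binarize.binarize(g, Binarize.otsuThreshold(g))` would give a boolean ink mask in which `true` marks dark pixels.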
>
> Therefore we need to assume that the skew of an input image is always
> within some narrow range, and modify all subsequent steps to work in
> a skewed coordinate system.
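One reading of "work in a skewed coordinate system" is to measure projection profiles along slightly tilted lines instead of rotating the image. This Java sketch assumes a classic projection-profile skew estimate: it tries angles in a narrow range and keeps the angle whose profile is sharpest, measured as the largest sum of squared bin counts. `SkewedProfile`, its methods, and the search range are hypothetical. This may not be what Dmitry had in mind.

```java
/** Hypothetical helper: projection profiles measured along slightly tilted lines. */
class SkewedProfile {

    /**
     * Counts ink pixels per sheared row y' = round(y - x * tan(angle)).
     * A positive angle means lines descend to the right (image y grows downward).
     * Bin i of the result corresponds to sheared row i - shift.
     */
    static int[] profile(boolean[][] ink, double angleRad) {
        int h = ink.length, w = ink[0].length;
        double slope = Math.tan(angleRad);
        int shift = (int) Math.ceil(Math.abs(slope) * w); // keeps bin indices non-negative
        int[] prof = new int[h + 2 * shift + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (ink[y][x]) {
                    prof[(int) Math.round(y - x * slope) + shift]++;
                }
            }
        }
        return prof;
    }

    /** Tries angles in [-maxDeg, +maxDeg] in steps of stepDeg; returns the sharpest one. */
    static double estimateSkewDeg(boolean[][] ink, double maxDeg, double stepDeg) {
        int steps = (int) Math.round(maxDeg / stepDeg);
        double bestDeg = 0;
        long bestScore = -1;
        for (int i = -steps; i <= steps; i++) {
            double deg = i * stepDeg;
            long score = 0;
            for (int c : profile(ink, Math.toRadians(deg))) {
                score += (long) c * c; // sum of squares rewards tall, narrow peaks
            }
            if (score > bestScore) {
                bestScore = score;
                bestDeg = deg;
            }
        }
        return bestDeg;
    }
}
```

Once an angle `est` is chosen, peaks in `profile(ink, Math.toRadians(est))` should line up with text lines and horizontal table rules in the skewed coordinates. No image resampling is needed.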
>
> Dmitry
>
> On Mar 14, 4:19 pm, David Hoffer <dhoff...@gmail.com> wrote:
>> Dmitry,
>>
>> That would be great, thanks for the offer. I'll attach two samples.
>>
>> These two are good examples of the range of quality.  What I need to
>> do is extract cell data for processing.  I can generate these in any
>> image format (TIFF, JPEG) if one should be preferred.
>>
>> Best regards,
>> -Dave
>>
>> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> > I suspect this paper is a sledgehammer to crack a nut. It's quite
>> > universal and elaborate, and it may take a great deal of time to
>> > implement and debug. Your images might require much simpler
>> > methods.
>>
>> > I always say the same thing: send your sample images and the community
>> > will try to help.
>>
>> > Warm regards,
>> > Dmitry Silaev
>>
>> > On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>> >> Hi Vicky,
>>
>> >> Can you tell me more about this paper?  It looks like this is not a
>> >> free document so I can't just read it to see if it would solve the
>> >> problem I have.
>>
>> >> My problem is that I have grey-scale image data (TIFF/JPEG/etc.) that
>> >> contains text within a table format, i.e. cells on the page.  The
>> >> documents were originally faxed, then converted to PDF, so the image
>> >> quality varies from poor to good.  I don't want the table formatting;
>> >> I'm looking for a way to remove the formatting and get to just the
>> >> image text, which I want to convert to text using OCR (Tesseract or
>> >> otherwise).
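A common, simple way to strip table formatting from a binarized page is to erase long horizontal and vertical runs of ink, because characters rarely produce such runs. The Java sketch below assumes the rules are close to axis-aligned. It also assumes that `minRun`, a hypothetical parameter, is set well above the character height. It can nick character strokes that touch a rule.

```java
/** Hypothetical helper: removes long straight runs of ink (table rules) from a binary mask. */
class RuleRemover {

    /** Returns a copy of ink with every horizontal or vertical run of length >= minRun cleared. */
    static boolean[][] removeRules(boolean[][] ink, int minRun) {
        int h = ink.length, w = ink[0].length;
        boolean[][] rule = new boolean[h][w];

        // Horizontal runs: scan each row for consecutive ink pixels.
        for (int y = 0; y < h; y++) {
            int x = 0;
            while (x < w) {
                if (!ink[y][x]) { x++; continue; }
                int start = x;
                while (x < w && ink[y][x]) x++;
                if (x - start >= minRun) {
                    for (int i = start; i < x; i++) rule[y][i] = true;
                }
            }
        }

        // Vertical runs: scan each column the same way.
        for (int x = 0; x < w; x++) {
            int y = 0;
            while (y < h) {
                if (!ink[y][x]) { y++; continue; }
                int start = y;
                while (y < h && ink[y][x]) y++;
                if (y - start >= minRun) {
                    for (int i = start; i < y; i++) rule[i][x] = true;
                }
            }
        }

        // Keep only ink that was not part of a long run.
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = ink[y][x] && !rule[y][x];
            }
        }
        return out;
    }
}
```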
>>
>> >> My programming environment is Java, but I can shell out to other programs
>> >> if I need to.
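From Java, shelling out to the Tesseract command-line tool could look like the sketch below. It assumes a `tesseract` executable on the PATH. It uses the basic `tesseract imagename outputbase -l lang` form, which writes the recognized text to `outputbase.txt`. `TesseractRunner` and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the tesseract command-line tool. */
class TesseractRunner {

    /** Builds "exe image outputBase -l lang"; tesseract itself appends ".txt" to outputBase. */
    static List<String> buildCommand(String exe, File image, File outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add(image.getPath());
        cmd.add(outputBase.getPath());
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    /** Runs tesseract, waits for it to finish, and returns the contents of outputBase + ".txt". */
    static String ocr(String exe, File image, File outputBase, String lang)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(exe, image, outputBase, lang))
                .inheritIO()   // show tesseract's console messages in this process's console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        File txt = new File(outputBase.getPath() + ".txt");
        return new String(Files.readAllBytes(txt.toPath()), StandardCharsets.UTF_8);
    }
}
```

For example, `TesseractRunner.ocr("tesseract", new File("page.tif"), new File("page"), "eng")` would run the command and return the contents of `page.txt`.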
>>
>> >> Would the approach in the paper solve this problem?  How
>> >> practical is the software solution for a one-man effort?
>>
>> >> Thanks,
>> >> -Dave
>>
>> >> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com> wrote:
>> >>> Hello,
>>
>> >>> I used this paper (for pre-processing):
>> >>> Lee and Ryu, "Parameter-Free Geometric Document Layout Analysis,"
>> >>> IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11,
>> >>> pp. 1240-1256, Nov. 2001.
>>
>> >>> Best Regards,
>> >>> Vicky
>>
>> >>> -----Original Message-----
>> >>> From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of Daphne
>> >>> Sent: Friday, March 11, 2011 01:15
>> >>> To: tesseract-ocr
>> >>> Subject: how to get the character in an image file which is in table format.
>>
>> >>> Hello,
>>
>> >>> I have a scanned image file which contains a table. When I OCR it using
>> >>> tessnet, it doesn't give the desired output.
>> >>> It is not reading the characters in the table; instead it gives some
>> >>> numbers.
>>
>> >>> How can I read the characters in a table-format image?
>>
>> >>> --
>> >>> You received this message because you are subscribed to the Google Groups
>> >>> "tesseract-ocr" group.
>> >>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> >>> To unsubscribe from this group, send email to
>> >>> tesseract-ocr+unsubscr...@googlegroups.com.
>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>>
>>
>> [Attachment: hud1.jpeg, 748K]
>>
>> [Attachment: hud2.jpeg, 2046K]
>
