As far as I can see, your source data can be regarded as a 1-bit (binary),
losslessly compressed image. So a lossless conversion to any image
format (it makes no difference which) will do no harm.
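
If you want to convince yourself, here is a minimal sketch (not tested on your data) that writes a BufferedImage as PNG in memory, reads it back, and compares every pixel. I picked PNG only because Java's ImageIO ships a PNG writer by default; TIFF should behave the same way wherever a lossless TIFF writer is installed. The class and method names are made up for illustration, and the tiny 1-bit sample image in main is just a stand-in for your decoded page.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, for illustration only.
class LosslessSave {

    /** Encodes img as PNG in memory, decodes it again, and compares every pixel. */
    static boolean survivesPngRoundTrip(BufferedImage img) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(img, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            BufferedImage back = ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
            if (back == null
                    || back.getWidth() != img.getWidth()
                    || back.getHeight() != img.getHeight()) {
                return false;
            }
            for (int y = 0; y < img.getHeight(); y++) {
                for (int x = 0; x < img.getWidth(); x++) {
                    // getRGB normalises both images to ARGB, so the comparison
                    // does not depend on how the decoder stores the pixels.
                    if (img.getRGB(x, y) != back.getRGB(x, y)) {
                        return false;
                    }
                }
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a decoded fax page: a 1-bit image with a white diagonal.
        BufferedImage page = new BufferedImage(16, 16, BufferedImage.TYPE_BYTE_BINARY);
        for (int i = 0; i < 16; i++) {
            page.setRGB(i, i, 0xFFFFFFFF);
        }
        System.out.println("PNG round trip preserved all pixels: "
                + survivesPngRoundTrip(page));
    }
}
```

The same check with "jpg" instead of "png" would be expected to fail on most real pages, which is the whole point of staying with a lossless format.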

Warm regards,
Dmitry Silaev





On Tue, Mar 15, 2011 at 8:31 AM, David Hoffer <dhoff...@gmail.com> wrote:
> Dmitry,
>
> Originally the documents are PDFs with these images CCITTFax-encoded; I
> decoded them using iText.  At this point I have a BufferedImage which
> I can save in any format supported by Java.  I assume TIFF would be
> one of the best.
>
> Best regards,
> -Dave
>
> On Tue, Mar 15, 2011 at 7:52 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> Dave,
>>
>> What are the format and resolution in which you initially get your
>> images? With such poor quality, every conversion makes an image even
>> worse...
>>
>> Warm regards,
>> Dmitry Silaev
>>
>>
>>
>>
>>
>> On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer <dhoff...@gmail.com> wrote:
>>> Dmitry,
>>>
>>> Would using a lossless format like TIFF be preferred?
>>>
>>> (I'm going to give this a try but some of these steps might be a bit
>>> more than I can handle...I'm not an image processing guru.)
>>>
>>> -Dave
>>>
>>> On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>>> Ehmm, actually I thought a bit more and now I say no to deskewing. It
>>>> can be detrimental to such poor quality images - they are almost
>>>> binary ("almost" probably because of the JPEG compression algo) and
>>>> low-res. As far as I can see, you can only have binary images.
>>>>
>>>> Therefore we need to assume that the skew of an input image is always
>>>> within some narrow range and modify all our following steps to work in
>>>> a skewed coordinate system.
>>>>
>>>> Dmitry
>>>>
>>>> On Mar 14, 4:19 pm, David Hoffer <dhoff...@gmail.com> wrote:
>>>>> Dmitry,
>>>>>
>>>>> That would be great, thanks for the offer. I'll attach two samples.
>>>>>
>>>>> These two are good examples of the range of quality.  What I need to
>>>>> do is extract cell data for processing.  I can generate these in any
>>>>> image format (TIFF, JPEG) if one should be preferred.
>>>>>
>>>>> Best regards,
>>>>> -Dave
>>>>>
>>>>> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>>>>> > I suspect this paper is a sledgehammer to crack a nut. It's quite
>>>>> > universal and elaborate, and it may take a great deal of time to
>>>>> > implement and debug. Your images might require much simpler
>>>>> > methods.
>>>>>
>>>>> > I always say the same thing: send your sample images and the community
>>>>> > will try to help.
>>>>>
>>>>> > Warm regards,
>>>>> > Dmitry Silaev
>>>>>
>>>>> > On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>>>>> >> Hi Vicky,
>>>>>
>>>>> >> Can you tell me more about this paper?  It looks like this is not a
>>>>> >> free document so I can't just read it to see if it would solve the
>>>>> >> problem I have.
>>>>>
>>>>> >> My problem is that I have grey-scale image data (tif/jpg/etc) that
>>>>> >> contains text within a table format, i.e. cells on the page.  The
>>>>> >> documents were originally faxed then converted to PDF, so the image
>>>>> >> quality varies from poor to good.  I don't want the table formatting;
>>>>> >> I'm looking for a way to remove the formatting and get to just the
>>>>> >> image text, which I want to convert to text using OCR, Tesseract or
>>>>> >> otherwise.
>>>>>
>>>>> >> My programming environment is Java, but I can shell out to other
>>>>> >> programs if I need to.
>>>>>
>>>>> >> Would the approach in the paper solve this kind of problem?  How
>>>>> >> practical is the software solution for a one-man effort?
>>>>>
>>>>> >> Thanks,
>>>>> >> -Dave
>>>>>
>>>>> >> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja 
>>>>> >> <vicky.vi...@gmail.com> wrote:
>>>>> >>> Hello,
>>>>>
>>>>> >>> I used this paper (for pre-processing):
>>>>> >>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu. IEEE
>>>>> >>> Trans. Pattern Analysis and Machine Intelligence, Nov 2001, Volume 23,
>>>>> >>> Issue 11, Pages 1240-1256
>>>>>
>>>>> >>> Best Regards,
>>>>> >>> Vicky
>>>>>
>>>>> >>> -----Original Message-----
>>>>> >>> From: tesseract-ocr@googlegroups.com 
>>>>> >>> [mailto:tesseract-ocr@googlegroups.com]
>>>>> >>> On Behalf Of Daphne
>>>>> >>> Sent: Friday, March 11, 2011 01:15
>>>>> >>> To: tesseract-ocr
>>>>> >>> Subject: how to get the character in an image file which is in table 
>>>>> >>> format.
>>>>>
>>>>> >>> Hello,
>>>>>
>>>>> >>> I have a scanned image file which contains a table. When I OCR it using
>>>>> >>> tessnet it doesn't give the desired output.
>>>>> >>> It is not reading the characters in the table. Instead it gives some
>>>>> >>> numbers.
>>>>>
>>>>> >>> How can I read the characters in a table-format image?
>>>>>
>>>>> >>> --
>>>>> >>> You received this message because you are subscribed to the Google 
>>>>> >>> Groups
>>>>> >>> "tesseract-ocr" group.
>>>>> >>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>>> >>> To unsubscribe from this group, send email to
>>>>> >>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>>> >>> For more options, visit this group at
>>>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Attachments: hud1.jpeg (748K), hud2.jpeg (2046K)
>>>>
>>>
>>
>
