Sriranga and Mike,

Support for only uncompressed TIFFs has not been an issue for a long time!
It was one only during the period when Tess used home-brewed TIFF
input/output routines. Now Tesseract supports many TIFF variations
through the use of Leptonica.

Actually, I don't use the image-handling part of Tesseract, so I'm
more interested in investigating Tesseract's errors than
Leptonica's.

Warm regards,
Dmitri Silaev





On Mon, Mar 28, 2011 at 4:41 PM, Lutz, Michael <ml...@nds.com> wrote:
> Sorry, you did not say this. I mixed some things up when reading up on
> the issue this morning. This is what I was referring to:
>
>
>
> According to IrfanView, the file is an LZW-compressed TIF at 300 DPI. What
> Quan says is correct: the image is a heavily compressed TIF. In my
> experience, Tesseract-OCR supports only uncompressed TIF files.
>
> Sriranga(78yrsold)
>
> Thanks for pointing it out.
>
> Mike
>
>
>
> From: zdenko podobny [mailto:zde...@gmail.com]
> Sent: Monday, March 28, 2011 14:34
> To: Lutz, Michael
> Cc: Dmitri Silaev; tesseract-ocr@googlegroups.com; Richard Genthner
> Subject: Re: tesseract.exe has stopped working on win2008 r2
>
>
>
>
>
> On Mon, Mar 28, 2011 at 11:54 AM, Lutz, Michael <ml...@nds.com> wrote:
>
> Hi All,
>
> So the image Richard gave us is a compressed TIF file. Since Tesseract only
> supports uncompressed TIF images, as noticed by Zdenko, you will not get any
> results from this image.
>
>
>
> Incorrect:
>
> Image support is the task of Leptonica, so the list of supported formats can
> be found on the Leptonica website and in its source code. I think we really
> need to distinguish this, because upgrading Leptonica could add support for
> a new format without changing a line of Tesseract code.
> I guessed that Leptonica has a problem with TIFFs using "lzw compression".
> When I created a TIFF with "zip compression", it worked (other compression
> algorithms are also available in TIFF: PackBits, G4, G3, ...). I never said
> that Leptonica (Tesseract) supports only uncompressed TIFF. I am sorry if I
> was not clear about this.
> As TP corrected me: the problem is not the LZW compression but "Samples per
> Pixel". Leptonica supports 1, 3, and 4. The input image used 2 (unsupported).
> To "solve" this, just open the input file in IrfanView and save it as a TIFF
> with LZW compression. It will change "Samples/Pixel" to 1 automatically ;-)
>
>  Zdenko
>
>
>
> I attached the image as an uncompressed TIF file (see uncompressed.zip).
> Tesseract processes this image without any problems.
> Also attached is tesseract.zip, which should unpack a tesseract.executable.
> If it went through, just rename it to tesseract.exe. It is a release static
> build made with Win7 and WinSDK 7.1, in case anyone still wants it.
>
> Regards,
> Mike
>
> -----Original Message-----
> From: Dmitri Silaev [mailto:daemons2...@gmail.com]
> Sent: Saturday, March 26, 2011 22:04
> To: tesseract-ocr@googlegroups.com
> Cc: zdenko podobny; Lutz, Michael; Richard Genthner
> Subject: Re: tesseract.exe has stopped working on win2008 r2
>
> Guys, I still can't understand what error Tesseract produces.
> Let's wait for the error screenshot. Or did you understand
> everything already? Richard says he's got an error message...
>
> Warm regards,
> Dmitri Silaev
>
>
>
>
>
> On Sat, Mar 26, 2011 at 5:42 PM, zdenko podobny <zde...@gmail.com> wrote:
>>
>>
>> On Fri, Mar 25, 2011 at 5:40 PM, Lutz, Michael <ml...@nds.com> wrote:
>>>
>>> Hi,
>>>
>>> I just ran your TIF file and got no results. It must have something to
>>> do with the size of the image. If I run a portion of the TIF smaller
>>> than 1000x1000, then I get results.
>>>
>>> Can somebody explain why a TIF of this size (2480x3508 @ 8 BPP) is not
>>> processed?
>>
>> This is not a Tesseract issue but a Leptonica one (the library used for
>> image handling).
>> When I ran it on Linux, I got an error message coming from Leptonica (1.67;
>> I have not tried 1.68 on Linux yet):
>> Error in pixReadFromTiffStream: spp not in set {1,3,4}
>> Error in pixReadStreamTiff: pix not read
>> Error in pixReadTiff: pix not read
>> On Windows, the Leptonica "release version" library does not show
>> error/warning messages because of the compile option "NO_CONSOLE_IO"
>> (see http://code.google.com/p/leptonica/issues/detail?id=42).
>> It looks like Leptonica does not support LZW compression for TIFF
>> (see http://www.leptonica.com/source/README.html "9. Image I/O": LZW is
>> mentioned in the PNG and GIF sections, but not with TIFF). When I change
>> the TIFF compression from LZW to zip (BTW: this also gives a smaller
>> image), Tesseract produces output (on XP SP3).
>> Zdenko
>>
>>> Mike
>>>
>>>
>>>
>>> From: Richard Genthner [mailto:rich...@guthnur.net]
>>> Sent: Friday, March 25, 2011 17:04
>>> To: Lutz, Michael
>>> Cc: tesseract-ocr@googlegroups.com
>>> Subject: Re: tesseract.exe has stopped working on win2008 r2
>>>
>>>
>>>
>>> Here are the screenshot and the TIF file. Dmitri, if you rename the .exe,
>>> that should work. I'm trying to get the training data up.
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>
>
>
> ---------- Forwarded message ----------
> From: "Sriranga(78yrsold)" <withblessi...@gmail.com>
> To: "tesseract-ocr@googlegroups.com" <tesseract-ocr@googlegroups.com>
> Date: Sat, 26 Mar 2011 14:12:41 +0100
> Subject: Re: tesseract.exe has stopped working on win2008 r2
> According to IrfanView, the file is an LZW-compressed TIF at 300 DPI. What
> Quan says is correct: the image is a heavily compressed TIF. In my
> experience, Tesseract-OCR supports only uncompressed TIF files.
>
> On Sat, Mar 26, 2011 at 6:17 PM, Quan Nguyen <nguyen...@gmail.com> wrote:
>>
>> The image appears to have been heavily compressed. OCRing the whole image
>> did not yield anything. Doing it blockwise, I got some results, but they
>> were not very accurate:
>>
>> Ch Juhe 24, 2@@9 the ACHP vctect ct: revisect teccmmehdettcns tcr
>> mee_s1es-muhqes-t'ube[[e (NR/H~
>> ‘evictetnce ct tmmuhity’ requtrementstcr heetthcete teefschheh‘. The
>> Heatthcate thtecttctn Ochtrct
>> Ptectices Aciviscry Ccmrmttee (HHCPAG) has ernctcfsed these changes.
>>
>
>
>
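
For anyone who hits the same "spp not in set {1,3,4}" message: the diagnosis
quoted above comes down to the SamplesPerPixel field of the TIFF. The sketch
below is one possible way to read that field with nothing but the Python
standard library, so a file can be checked before it is handed to Tesseract.
It is only an illustration, not part of Tesseract or Leptonica. The function
names are hypothetical, it only looks at the first image directory of a
classic (non-BigTIFF) file, and the accepted set is simply the one printed in
the Leptonica 1.67 error message; other Leptonica versions may differ.

```python
import struct

SAMPLES_PER_PIXEL_TAG = 277   # tag number of SamplesPerPixel in TIFF 6.0
SUPPORTED_SPP = {1, 3, 4}     # assumption: set quoted in the Leptonica 1.67 error


def samples_per_pixel(data: bytes) -> int:
    """Return SamplesPerPixel from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"          # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"          # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == SAMPLES_PER_PIXEL_TAG:
            # A single SHORT is stored inline in the first two value bytes.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1                  # TIFF 6.0 default when the tag is absent


def leptonica_167_would_accept(data: bytes) -> bool:
    """Hypothetical helper: True if SamplesPerPixel is in the quoted set."""
    return samples_per_pixel(data) in SUPPORTED_SPP


def tiny_tiff(spp: int, endian: str = "<") -> bytes:
    """Build a minimal header plus one-entry IFD, for demonstration only."""
    mark = b"II" if endian == "<" else b"MM"
    header = mark + struct.pack(endian + "HI", 42, 8)
    entry = struct.pack(endian + "HHIHH", SAMPLES_PER_PIXEL_TAG, 3, 1, spp, 0)
    return header + struct.pack(endian + "H", 1) + entry + struct.pack(endian + "I", 0)
```

To check a real file, read it in binary mode and pass the bytes in, for
example leptonica_167_would_accept(open("scan.tif", "rb").read()), where
scan.tif is a placeholder name. A result of False for a file like Richard's
(2 samples per pixel) would point at the image rather than at Tesseract.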

