Sriranga and Mike,

Uncompressed-only TIFF support has not been an issue for a long time! That limitation applied only during the period when Tesseract used its own home-brewed TIFF input/output routines. Tesseract now supports many TIFF variations through Leptonica.
Actually, I don't use the image-handling part of Tesseract, so I'm more interested in investigating Tesseract's errors than Leptonica's.

Warm regards,
Dmitri Silaev

On Mon, Mar 28, 2011 at 4:41 PM, Lutz, Michael <ml...@nds.com> wrote:
> Sorry, you were not saying this. I mixed some stuff up when reading up on
> the issue this morning. This was what I was referring to:
>
> According to IrfanView, it is compressed as an LZW tif file at 300 DPI. What
> Quan says is correct: the image is a heavily compressed tif. In my experience,
> Tesseract-OCR supports only uncompressed tif files.
>
> Sriranga(78yrsold)
>
> Thanks for pointing it out.
>
> Mike
>
> From: zdenko podobny [mailto:zde...@gmail.com]
> Sent: Monday, 28 March 2011 14:34
> To: Lutz, Michael
> Cc: Dmitri Silaev; tesseract-ocr@googlegroups.com; Richard Genthner
> Subject: Re: tesseract.exe has stopped working on win2008 r2
>
> On Mon, Mar 28, 2011 at 11:54 AM, Lutz, Michael <ml...@nds.com> wrote:
>
> Hi All,
>
> So the image Richard gave us is a compressed TIF file. Since tesseract only
> supports uncompressed TIF images, as noticed by Zdenko, you will not get any
> results from this image.
>
> Incorrect:
>
> Image support is the task of leptonica, so the list of supported formats can
> be found on the leptonica web site and in its source code. I think we really
> need to distinguish this, because upgrading leptonica could add support for a
> new format without changing a line of tesseract code.
> I guessed that leptonica has a problem with tiff files using "lzw
> compression". When I created a tiff with "zip compression" it worked (there
> are also other compression algorithms available in tiff: Packbits, G4,
> G3,...). I never said that leptonica (tesseract) supports only uncompressed
> tiff. I am sorry if I was not clear about this.
> As TP corrected me: the problem is not in the LZW compression, but in
> "Samples per Pixel". Leptonica supports 1, 3, 4. The input image used
> (unsupported) 2.
> To "solve" this, just open the input file in IrfanView and save it as a tiff
> with lzw compression. It will change "Samples/Pixel" to 1 automatically ;-)
>
> Zdenko
>
> I attached the image as an uncompressed TIF file, see uncompressed.zip. This
> image is processed by tesseract without any problems.
> Also attached is a tesseract.zip, which should unpack a
> tesseract.executable. Just rename it to tesseract.exe if it went through. It
> is a release static build using Win7 and WinSDK 7.1, if anyone still wants
> it.
>
> Regards,
> Mike
>
> -----Original Message-----
> From: Dmitri Silaev [mailto:daemons2...@gmail.com]
> Sent: Saturday, 26 March 2011 22:04
> To: tesseract-ocr@googlegroups.com
> Cc: zdenko podobny; Lutz, Michael; Richard Genthner
> Subject: Re: tesseract.exe has stopped working on win2008 r2
>
> Guys, I still can't understand what error is produced by
> Tesseract. Let's wait for the error screenshot. Or did you understand
> everything already? Richard says he's got an error message...
>
> Warm regards,
> Dmitri Silaev
>
> On Sat, Mar 26, 2011 at 5:42 PM, zdenko podobny <zde...@gmail.com> wrote:
>>
>> On Fri, Mar 25, 2011 at 5:40 PM, Lutz, Michael <ml...@nds.com> wrote:
>>>
>>> Hi,
>>>
>>> I just ran your tif file and I get no results. It must have something to do
>>> with the size of the image. If I try to run a portion of the tiff, something
>>> smaller than 1000x1000, then I get results.
>>>
>>> Can somebody explain why a tif of this size (2480x3508 @ 8BPP) is not processed?
>>
>> This is not a tesseract issue but a leptonica issue (the library used for
>> image handling).
>> When I ran it on linux I got error messages coming from leptonica (1.67 ->
>> I did not try 1.68 on linux yet):
>> Error in pixReadFromTiffStream: spp not in set {1,3,4}
>> Error in pixReadStreamTiff: pix not read
>> Error in pixReadTiff: pix not read
>> On Windows the leptonica "release version" library did not show
>> error/warning messages because of the compile option "NO_CONSOLE_IO"
>> (see http://code.google.com/p/leptonica/issues/detail?id=42).
>> It looks like leptonica did not support lzw compression for tiff
>> (see http://www.leptonica.com/source/README.html "9. Image I/O" - lzw is
>> mentioned in the png and gif sections, but not with tif). When I changed
>> the tif compression from lzw to zip (BTW: this also makes the image
>> smaller), tesseract produced output (on XP SP3).
>> Zdenko
>>
>>> Mike
>>>
>>> From: Richard Genthner [mailto:rich...@guthnur.net]
>>> Sent: Friday, 25 March 2011 17:04
>>> To: Lutz, Michael
>>> Cc: tesseract-ocr@googlegroups.com
>>> Subject: Re: tesseract.exe has stopped working on win2008 r2
>>>
>>> Here is the screenshot and the tif file. Dmitri, if you rename the .exe
>>> that should work. I'm trying to get the training data up.
>>>
>>> ________________________________
>>> This message is confidential and intended only for the addressee. If you
>>> have received this message in error, please immediately notify the
>>> postmas...@nds.com and delete it from your system as well as any copies.
>>> The content of e-mails as well as traffic data may be monitored by NDS for
>>> employment and security purposes.
>>> To protect the environment please do not print this e-mail unless
>>> necessary.
>>>
>>> An NDS Group Limited company. www.nds.com
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>
> ---------- Forwarded message ----------
> From: "Sriranga(78yrsold)" <withblessi...@gmail.com>
> To: "tesseract-ocr@googlegroups.com" <tesseract-ocr@googlegroups.com>
> Date: Sat, 26 Mar 2011 14:12:41 +0100
> Subject: Re: tesseract.exe has stopped working on win2008 r2
> According to IrfanView, it is compressed as an LZW tif file at 300 DPI. What
> Quan says is correct: the image is a heavily compressed tif. In my experience,
> Tesseract-OCR supports only uncompressed tif files.
>
> On Sat, Mar 26, 2011 at 6:17 PM, Quan Nguyen <nguyen...@gmail.com> wrote:
>>
>> The image appears to have been heavily compressed. OCR of the whole image
>> did not yield anything. Doing it blockwise, I got some results, but not
>> very accurate ones:
>>
>> Ch Juhe 24, 2@@9 the ACHP vctect ct: revisect teccmmehdettcns tcr
>> mee_s1es-muhqes-t'ube[[e (NR/H~
>> ‘evictetnce ct tmmuhity’ requtrementstcr heetthcete teefschheh‘. The
>> Heatthcate thtecttctn Ochtrct
>> Ptectices Aciviscry Ccmrmttee (HHCPAG) has ernctcfsed these changes.
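A note on the diagnosis quoted in this thread: the Leptonica 1.67 message `spp not in set {1,3,4}` refers to the TIFF SamplesPerPixel tag (tag 277 in the TIFF 6.0 specification), and the problem file reportedly had a value of 2. The sketch below is one way to check that tag before handing a file to Tesseract. It uses only the Python standard library and is not part of Tesseract or Leptonica; the function names are made up for illustration, it only reads the first image directory of a classic (non-BigTIFF) file, and the accepted set {1, 3, 4} is taken from the error message above, so other Leptonica versions may differ.

```python
import struct

SAMPLES_PER_PIXEL_TAG = 277   # TIFF 6.0 tag number; the value defaults to 1 when absent
LEPTONICA_SPP = {1, 3, 4}     # assumption: set quoted in the Leptonica 1.67 error message


def tiff_samples_per_pixel(data: bytes) -> int:
    """Return SamplesPerPixel from the first IFD of a classic TIFF (hypothetical helper)."""
    if data[:2] == b"II":
        endian = "<"              # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"              # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        pos = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, _field_type, _count = struct.unpack_from(endian + "HHI", data, pos)
        if tag == SAMPLES_PER_PIXEL_TAG:
            # A single SHORT is stored in the first two bytes of the value field.
            (value,) = struct.unpack_from(endian + "H", data, pos + 8)
            return value
    return 1


def is_supported_spp(spp: int) -> bool:
    """True if spp is in the set named by the quoted Leptonica error."""
    return spp in LEPTONICA_SPP


def check_file(path: str) -> bool:
    """Read a file from disk and report whether its SamplesPerPixel looks readable."""
    with open(path, "rb") as fh:
        spp = tiff_samples_per_pixel(fh.read())
    print(f"{path}: SamplesPerPixel = {spp}")
    return is_supported_spp(spp)
```

Calling `check_file` on the TIFF from this thread should, if the diagnosis above is right, report a value of 2 and return False; after re-saving the file as Zdenko describes it should report 1.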