Hi, I'm trying to load some sample PDF documents, but only 1 of 4 is parsed by PDFBox without an exception. Adobe Reader opens all of these documents without any sign of problems.

    import java.io.InputStream;
    import org.apache.pdfbox.cos.COSDocument;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;

    public static void main(String[] args) throws Exception {
        // sample document, loaded from the classpath
        InputStream ins = TestGetTexts.class.getResourceAsStream("/034352.pdf");
        PDFParser parser = new PDFParser(ins);
        parser.parse(); // the exception is thrown here
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
    }

It throws an exception at the line "parser.parse();". What is wrong with that?

    16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 252 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 34 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
    SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
    Exception in thread "main" java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61)
        at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:846)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:574)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:20)
    Caused by: java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:238)
        at java.util.zip.Inflater.inflate(Inflater.java:256)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
        ... 8 more

The other PDF:

    16.8.2012 16:08:44 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 4192 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 576 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 432 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 304 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 480 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 176 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 2096 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 137440 is wrong. Fall back to reading stream until 'endstream'.
    Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 137440 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:20)
    Caused by: java.io.IOException: Push back buffer is full
        at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
        ... 3 more

Or:

    16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 8 is wrong. Fall back to reading stream until 'endstream'.
    16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
    WARNING: Specified stream length 77788 is wrong. Fall back to reading stream until 'endstream'.
    Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 77788 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
        at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
        at test.TestGetTexts.main(TestGetTexts.java:21)
    Caused by: java.io.IOException: Push back buffer is full
        at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
        at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
        at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
        ... 3 more

Best regards,
Juraj Lonc
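
For the two "Push back buffer is full" failures, the exception message itself names the system property org.apache.pdfbox.baseParser.pushBackSize. Below is a minimal sketch of setting it from code. It assumes the property is read once, when the PDFBox parser classes are first loaded, so it has to be set before any PDFParser is created. The class name PushBackSizeSketch, the helper configurePushBackSize, and the value 200000 are only illustrative (the value is picked to exceed the 137440 bytes reported above).

```java
public class PushBackSizeSketch {
    // Property name copied verbatim from the exception message.
    static final String PUSHBACK_PROP = "org.apache.pdfbox.baseParser.pushBackSize";

    // Hypothetical helper: sets the property and returns the value that
    // Integer.getInteger reports for it afterwards (-1 if it is unreadable).
    static int configurePushBackSize(int bytes) {
        System.setProperty(PUSHBACK_PROP, Integer.toString(bytes));
        return Integer.getInteger(PUSHBACK_PROP, -1);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any PDFBox parser class is loaded.
        int size = configurePushBackSize(200000);
        System.out.println("pushBackSize set to " + size);
        // ... then create the PDFParser and call parse() as in the code above.
    }
}
```

The same value could presumably be passed on the command line instead, as -Dorg.apache.pdfbox.baseParser.pushBackSize=200000. This would only address the push back errors; the "incorrect header check" failure in the first file looks like a separate problem.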