Hi,
I'm trying to load some sample PDF documents, but PDFBox parses only 1 of
the 4 without throwing an exception.
Adobe Reader opens all four documents without any sign of a problem.
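import java.io.InputStream;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

public class TestGetTexts {
    public static void main(String[] args) throws Exception {
        InputStream ins = TestGetTexts.class.getResourceAsStream("/034352.pdf"); // sample document
        PDFParser parser = new PDFParser(ins);
        parser.parse(); // the exceptions are thrown here
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
        pdDoc.close(); // release the document's resources
    }
}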
The exceptions are thrown at the line "parser.parse();".
What is wrong here? The output for the first failing PDF is:
16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 252 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 34 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
16.8.2012 15:49:49 org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61)
    at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:846)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:574)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.util.zip.DataFormatException: incorrect header check
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:238)
    at java.util.zip.Inflater.inflate(Inflater.java:256)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
    ... 8 more
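The "incorrect header check" in the trace above is reported by java.util.zip.Inflater rather than by PDFBox itself, and it can be reproduced with the JDK alone: Inflater raises it when the bytes it receives do not begin with a valid zlib header. A minimal sketch that does not touch the PDF at all; the class and method names (InflateHeaderCheck, tryInflate, deflate) are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateHeaderCheck {

    // Returns null if the bytes inflate cleanly, otherwise a description of the failure.
    static String tryInflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] out = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(out);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return "truncated or incomplete input";
                }
            }
            return null;
        } catch (DataFormatException e) {
            // Same exception type that appears as the cause in the PDFBox trace.
            return e.getMessage();
        } finally {
            inflater.end();
        }
    }

    // Compresses a small input into a well-formed zlib stream for comparison.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[1024];
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) {
        byte[] text = "hello pdf".getBytes(StandardCharsets.US_ASCII);
        // Well-formed zlib data: no error expected.
        System.out.println("valid zlib data: " + tryInflate(deflate(text)));
        // Uncompressed bytes passed to Inflater: rejected at the header.
        System.out.println("raw bytes:       " + tryInflate(text));
    }
}
```

If the same thing is happening here, the bytes handed to FlateFilter are probably not the real start of the compressed data, which would fit the "Specified stream length ... is wrong" warnings, but that is only a guess.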
The second PDF fails with:
16.8.2012 16:08:44 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 4192 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 576 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 432 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 304 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 480 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 176 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 2096 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:08:45 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 137440 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 137440 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:20)
Caused by: java.io.IOException: Push back buffer is full
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
    ... 3 more
And the third one fails with:
16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 8 is wrong. Fall back to reading stream until 'endstream'.
16.8.2012 16:10:27 org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
WARNING: Specified stream length 77788 is wrong. Fall back to reading stream until 'endstream'.
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 77788 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at test.TestGetTexts.main(TestGetTexts.java:21)
Caused by: java.io.IOException: Push back buffer is full
    at java.io.PushbackInputStream.unread(PushbackInputStream.java:215)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:144)
    at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:133)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:542)
    ... 3 more
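The exception message itself suggests raising org.apache.pdfbox.baseParser.pushBackSize. A minimal sketch of setting it from code, using only the JDK; the helper name ensurePushBackSize and the value 262144 are arbitrary choices for illustration (anything above the 137440 bytes reported should be enough), and it assumes PDFBox reads the property when its parser classes are first loaded, so the property has to be set before the PDFParser is created:

```java
public class PushBackSizeSetup {

    // Property name copied from the exception message in the logs.
    static final String PUSHBACK_PROP = "org.apache.pdfbox.baseParser.pushBackSize";

    // Raises the property to at least requiredBytes and returns the value now in effect.
    static int ensurePushBackSize(int requiredBytes) {
        int current = Integer.getInteger(PUSHBACK_PROP, 0);
        if (current < requiredBytes) {
            System.setProperty(PUSHBACK_PROP, Integer.toString(requiredBytes));
        }
        return Integer.getInteger(PUSHBACK_PROP, 0);
    }

    public static void main(String[] args) {
        // 137440 is the largest size reported in the logs; 262144 leaves some headroom.
        int size = ensurePushBackSize(262144);
        System.out.println(PUSHBACK_PROP + " = " + size);
        // Assumption: the PDFParser must be created only after this point.
    }
}
```

The same setting can be passed without code changes by starting the JVM with -Dorg.apache.pdfbox.baseParser.pushBackSize=262144. Whether a larger buffer is enough to get these documents parsed is not something this sketch can confirm.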
Best regards,
Juraj Lonc