Hello Karl,

PS: At the moment I have 24 blocked documents: 20 with status "Processing" and 4 with status "About to Process". I have tested, and they are always the same documents. So I imported the files and tested them locally with tika-app.jar, and I get this error for these files:

WARN Invalid XObject Subtype: null
WARN Invalid XObject Subtype: null
WARN Invalid XObject Subtype: null
…
WARN Invalid XObject Subtype: null
WARN Invalid XObject Subtype: null
WARN Invalid XObject Subtype: null
WARN Invalid XObject Subtype: null
Exception in thread "main" java.lang.StackOverflowError
    at java.util.zip.Inflater.<init>(Inflater.java:102)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:99)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
    at org.apache.pdfbox.filter.Filter.decode(Filter.java:87)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
    at org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject.getContents(PDFormXObject.java:144)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:493)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    …
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:238)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTransparencyGroup(PDFStreamEngine.java:163)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:60)

If I open the file with Edge, it displays fine.

Any ideas?

Thanks,
Maxence

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Monday, 28 May 2018 18:47
To: user@manifoldcf.apache.org
Subject: Re: org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) error SPAM 10Go/hour

This sounds potentially like a problem in Tika, but in order to be sure I would need a complete stack trace, not just a piece of one.

If it is a Tika issue, it should appear reliably on the same document, again and again. Is there any way you can crawl ONLY one of the documents that got blocked? I suspect that when you paused and restarted, you just postponed the problem and it will happen again.

Karl

On Mon, May 28, 2018 at 9:50 AM msaunier <msaun...@citya.com> wrote:

Hello Karl,

In ManifoldCF 2.9, at the end of every job, several documents (around twenty) remain blocked. A single error appears and spams the logs with several gigabytes in a short time, which fills up the servers:

[?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) ~[?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495) ~[?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:231) ~[?:?]

If I pause the job and restart it, the documents are sent and it works. But if I am not there, we have problems.

Do you know this problem, and do you have a solution?
Is it a bad configuration?

Thank you.