There are several aspects to your question that require clarification.
First, the assumption that the logging of data is slowing down your
application is potentially (and likely) untrue. Logging, especially on
64-bit multi-core systems, is rarely responsible for slowing down an
application's performance.
Second, where is this appearing? Are you seeing this in the console of
Eclipse when executing the application or is this in the logs? If in the
logs, there would be some degradation from string concatenation and so on,
but this is rarely an issue. The overhead can vary drastically depending on
the JRE you are using and the JDK used to compile the application. Later
versions
of Java would multi-thread this and give precedence to application
functionality vs logging.
From your timestamps below, the analysis of the file itself appears to be
taking the most time: almost six minutes pass between the first entry
(22:09:52) and the next one (22:15:45). After the first entry, most of the
items are in the same second. This step:
2012-11-18 22:09:52,139 [Consumer-1] DEBUG
com.piifinder.file.FileCreditCardConsumer - Consume Analyzing:
/Users/user/testpiifinder/testdata/pdf/11-12StudentHandbookprintcopy.pdf
is taking the most time. The next time lag, about 7 seconds, occurs here:
2012-11-18 22:15:45,501 [Consumer-1] DEBUG
org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
PDFOperator{Tf}
2012-11-18 22:15:53,239 [Consumer-1] DEBUG
org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
COSInt{534}
After this, everything is very quick.
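To put numbers on those gaps, here is a minimal sketch that computes the
elapsed time between two log timestamps. The class name LogGaps and the
method gapMillis are made up for illustration; the timestamps are copied
from your log output.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of PDFBox or of the application's code.
class LogGaps {
    // Matches the timestamp prefix of the log lines, e.g. "2012-11-18 22:09:52,139".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the number of milliseconds between two log timestamps.
    public static long gapMillis(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, FORMAT);
        LocalDateTime end = LocalDateTime.parse(later, FORMAT);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Gap between "Consume Analyzing" and the PDFOperator{Tf} token.
        System.out.println(gapMillis("2012-11-18 22:09:52,139",
                                     "2012-11-18 22:15:45,501"));
        // Gap between PDFOperator{Tf} and COSInt{534}.
        System.out.println(gapMillis("2012-11-18 22:15:45,501",
                                     "2012-11-18 22:15:53,239"));
    }
}
```

The first call yields 353362 ms (just under six minutes) and the second
7738 ms, so nearly all of the elapsed time falls before the first PDFBox
token is logged.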
The size of the PDF being analyzed is also of importance. How complex is
it? What methodology are you using to parse and process the text? All of
these are relevant.
Posting your source code to see if there is a flag to turn off the verbose
logging would be good too.
If you are using log4j, you should be able to suppress the verbose
output easily but we would have to know more about your settings.
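As a rough sketch only: the format of your output looks like log4j, so if
the application uses log4j 1.x configured through a log4j.properties file
on the classpath (both are assumptions until we see your settings), a
fragment along these lines would raise the threshold for the PDFBox
loggers while leaving your own com.piifinder loggers untouched:

```properties
# Assumes log4j 1.x and a log4j.properties file on the classpath.
# Raise the level for everything under org.apache.pdfbox so the
# per-token DEBUG lines from PDFStreamEngine are no longer emitted.
log4j.logger.org.apache.pdfbox=WARN
```

If the root logger is set to DEBUG somewhere in that file, this line
overrides it for the org.apache.pdfbox hierarchy only.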
Our company has lots of practical experience in PDFBox and Adobe LiveCycle
ES. If you can share more details about your application, we should be
able to profile it to see where the bottlenecks are. Stripping text from a
PDF would likely be the cause of the slowness depending on how the PDF was
authored. If you have benchmarks you are trying to meet, I would also
love to understand how far off you are (20% over vs 3000%).
Ping me off list if you want to keep the code private.
Sorry to answer your question with lots of other questions. We consult on
things like this, sometimes just for a few pints in a bar.
Cheers!
Duane Nickull (ex-Adobe)
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t. @duanechaos
"Don't fear the Graph! Embrace Neo4J"
On 2012-11-18 7:19 PM, "CDB" <[email protected]> wrote:
>Hello
>
>I have inherited code and incorporated new PDFBox lib pdfbox-1.7.0.jar in
>Eclipse.
>
>When I execute pdfTextStripper.getText(doc);
>I immediately get tons of debug output. See below:
>How can I suppress this? Its making my app run extremely slow.
>
>2012-11-18 22:09:52,139 [Consumer-1] DEBUG
>com.piifinder.file.FileCreditCardConsumer - Consume Analyzing:
>/Users/user/testpiifinder/testdata/pdf/11-12StudentHandbookprintcopy.pdf
>2012-11-18 22:15:45,494 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{BT}
>2012-11-18 22:15:45,498 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSName{F1}
>2012-11-18 22:15:45,500 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSFloat{9.96}
>2012-11-18 22:15:45,501 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{Tf}
>2012-11-18 22:15:53,239 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSInt{534}
>2012-11-18 22:15:53,240 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSFloat{50.16}
>2012-11-18 22:15:53,240 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{TD}
>2012-11-18 22:15:53,241 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSString{1}
>2012-11-18 22:15:53,242 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{Tj}
>2012-11-18 22:15:53,248 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSInt{6}
>2012-11-18 22:15:53,249 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSInt{0}
>2012-11-18 22:15:53,249 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{TD}
>2012-11-18 22:15:53,250 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSString{ }
>2012-11-18 22:15:53,250 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{Tj}
>2012-11-18 22:15:53,251 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSInt{-468}
>2012-11-18 22:15:53,252 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSFloat{-11.4}
>2012-11-18 22:15:53,252 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>PDFOperator{TD}
>2012-11-18 22:15:53,253 [Consumer-1] DEBUG
>org.apache.pdfbox.util.PDFStreamEngine - processing substream token:
>COSString{ }
>
>
>WHAT IS THIS?
> AND HOW DO I TURN IT OFF?
>
>