Hi,

On 08.02.2012 14:47, Ilija Pavlic wrote:
I created a document with just one line of green text
(RGB=[146,208,80]), and wrote this small example:

PDDocument doc = null;
try {
     doc = PDDocument.load("C:/Path/To/Pdf/Sample.pdf");
     // Use the operator mapping that PageDrawer uses
     PDFStreamEngine engine = new PDFStreamEngine(
             ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties"));
     PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
     engine.processStream(page, page.findResources(), page.getContents().getStream());
     // Graphics state as it is after the whole page stream has been processed
     PDGraphicsState graphicState = engine.getGraphicsState();
     System.out.println(graphicState.getStrokingColor().getColorSpace().getName());
     float[] colorSpaceValues = graphicState.getStrokingColor().getColorSpaceValue();
     for (float c : colorSpaceValues) {
         // Components are stored in the range 0..1; scale to 0..255
         System.out.println(c * 255);
     }
}
finally {
     if (doc != null) {
         doc.close();
     }
}

That outputs
DeviceRGB
146.115
208.08
80.07

So it seems that I got the text color out of the document. However, I
am not sure I understand the color extraction correctly. Here is how I
see it:

As I understand it, PDFStreamEngine has multiple variables describing
its current state, like graphicsState, textMatrix, textLineMatrix,
etc. When PDFStreamEngine processes a page stream, it sets its state
variables depending on what operators it is processing at the moment.
Correct.

So when it hits green text, it will change the PDGraphicsState
graphicsState because it will encounter appropriate operators. For CS
it will call org.apache.pdfbox.util.operator.SetStrokingColorSpace as
defined by mapping
CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace in the
.properties file. RG will be mapped to
org.apache.pdfbox.util.operator.SetStrokingRGBColor and so on.
Correct.
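
The operator mapping described above can be illustrated with a small stand-alone sketch. It does not use PDFBox at all: the class OperatorDispatchSketch, its State type and the process method are hypothetical names made up for this example, and the map plays the role that the .properties file plays for PDFStreamEngine. Only the two RGB color operators are handled, and the sketch simply ignores anything else.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the idea behind an operator-driven engine; not PDFBox code.
class OperatorDispatchSketch {

    // Simplified graphics state: only the two color slots.
    static final class State {
        float[] strokingColor = {0f, 0f, 0f};
        float[] nonStrokingColor = {0f, 0f, 0f};
    }

    // Operator name -> handler, playing the role of the .properties mapping.
    private final Map<String, BiConsumer<State, float[]>> handlers = new HashMap<>();
    final State state = new State();

    OperatorDispatchSketch() {
        handlers.put("RG", (s, operands) -> s.strokingColor = operands.clone());
        handlers.put("rg", (s, operands) -> s.nonStrokingColor = operands.clone());
    }

    // Apply one operator to the state; operators without a handler are ignored here.
    void process(String operator, float... operands) {
        BiConsumer<State, float[]> handler = handlers.get(operator);
        if (handler != null) {
            handler.accept(state, operands);
        }
    }

    public static void main(String[] args) {
        OperatorDispatchSketch engine = new OperatorDispatchSketch();
        engine.process("rg", 0.573f, 0.816f, 0.314f); // fill color: green
        engine.process("RG", 0f, 0f, 0f);             // stroke color: black
        // Scale the first fill component from 0..1 to 0..255
        System.out.println(Math.round(engine.state.nonStrokingColor[0] * 255));
    }
}
```

Each call to process changes the state in place, which is why reading the state after the last operator only tells you what was set last.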

When it goes on, it will change its graphicsState to something else;
the color will be changed to black for black text and so on. PDF
operators are like sequences of instructions for drawing: "pick black
color; go to (x1,y1); draw a rectangle to (x2,y2)"
Correct.

In this particular case, PDGraphicsState hasn't changed because the
document has just text and the text it has is in just one style. For
something more advanced, I would need to extend PDFStreamEngine (just
like PageDrawer, PDFTextStripper and other classes do) to do something
when color changes.
Correct.
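
As a rough illustration of "doing something when the color changes", the following stand-alone sketch records the fill color at the moment each piece of text is shown, instead of looking at the state once at the end. It is not PDFBox code: ColorRunRecorder, Run, setNonStrokingRgb and showText are hypothetical names, standing in for what a subclass of PDFStreamEngine would do when it sees a color operator or a text-showing operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not PDFBox code: snapshot the color whenever text is shown.
class ColorRunRecorder {

    // One piece of shown text together with the fill color active at that moment.
    static final class Run {
        final String text;
        final float[] rgb;

        Run(String text, float[] rgb) {
            this.text = text;
            this.rgb = rgb;
        }
    }

    private float[] nonStrokingColor = {0f, 0f, 0f}; // starts out black
    private final List<Run> runs = new ArrayList<>();

    // Stands in for handling a fill-color operator such as "rg".
    void setNonStrokingRgb(float r, float g, float b) {
        nonStrokingColor = new float[] {r, g, b};
    }

    // Stands in for handling a text-showing operator; copies the current color.
    void showText(String text) {
        runs.add(new Run(text, nonStrokingColor.clone()));
    }

    List<Run> runs() {
        return runs;
    }

    public static void main(String[] args) {
        ColorRunRecorder recorder = new ColorRunRecorder();
        recorder.setNonStrokingRgb(0.573f, 0.816f, 0.314f);
        recorder.showText("green line");
        recorder.setNonStrokingRgb(0f, 0f, 0f);
        recorder.showText("black line");
        for (Run run : recorder.runs()) {
            System.out.printf("%s -> %d,%d,%d%n", run.text,
                    Math.round(run.rgb[0] * 255),
                    Math.round(run.rgb[1] * 255),
                    Math.round(run.rgb[2] * 255));
        }
    }
}
```

Because each run keeps its own copy of the color, later color changes do not overwrite what was recorded for earlier text.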

Is that approximately correct?
Yes. Here are some more details you may want to know:

- there is a stroking and a non-stroking color, but no text color
- the color usage for texts depends on the text rendering mode (text most likely uses rendering mode 0, which means the non-stroking color is used)
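- your code snippet reads the graphics state only after the whole page stream has been processed, so it shows the last values that were set, not necessarily the ones in effect when the text was drawn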
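
The point about the text rendering mode can be sketched as follows. The mode numbers follow the text rendering modes of the PDF specification (0 fill, 1 stroke, 2 fill then stroke, 3 invisible, 4 to 6 the same as 0 to 2 plus adding to the clipping path, 7 clipping only); the class and method names are made up for this example and are not part of PDFBox.

```java
// Sketch of which color slot matters for text, by text rendering mode.
// Class and method names are hypothetical; this is not PDFBox code.
class TextRenderingModeSketch {

    // Modes 0, 2, 4 and 6 fill the glyphs, so the non-stroking color applies.
    static boolean usesNonStrokingColor(int mode) {
        return mode == 0 || mode == 2 || mode == 4 || mode == 6;
    }

    // Modes 1, 2, 5 and 6 stroke the glyph outlines, so the stroking color applies.
    static boolean usesStrokingColor(int mode) {
        return mode == 1 || mode == 2 || mode == 5 || mode == 6;
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 7; mode++) {
            System.out.println("Tr " + mode
                    + ": fill=" + usesNonStrokingColor(mode)
                    + ", stroke=" + usesStrokingColor(mode));
        }
    }
}
```

For ordinary text (mode 0) only the non-stroking color is relevant, so that is the value to read if the goal is the visible text color.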

Thank you,
Ilija.

What exactly are you trying to achieve?

BR
Andreas Lehmkühler
