I am having issues with coordinates. The PDFTextStripperByArea region
seems to be pushed too high.
Consider the following example snippet:
...
PDPage page = (PDPage) allPages.get(0);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
// define the region for extraction -- the coordinates and dimensions
// are x, y, width, height
Rectangle region = new Rectangle((int) x, (int)y, (int)width, (int)height);
stripper.addRegion("test region", region);
// overlay the region with a cyan rectangle to check whether I got the
// coordinates and dimensions right
PDPageContentStream contentStream =
    new PDPageContentStream(document, page, true, true);
contentStream.setNonStrokingColor( Color.CYAN );
contentStream.fillRect( (int)x, (int)y, (int)width, (int)height );
contentStream.close();
// extract the text from the defined region
stripper.extractRegions(page);
String content = stripper.getTextForRegion("test region");
...
document.save(...);
...
The cyan rectangle overlays the desired region nicely. The stripper, on
the other hand, misses a couple of lines at the bottom of the rectangle
and includes a couple of lines above it. What is going on?
Thank you,
Ilija.