I'm using the method here to remove text from a document: http://www.docjar.com/html/api/org/apache/pdfbox/examples/util/RemoveAllText.java.html
I then render the page to an image. I'd like to keep doing exactly that, except leave certain pieces of text in place if they match a regex pattern (I'm looking for sequences of dashes). For this part of the parsing, I'd like to implement a method that checks the textual representation of prevToken and only removes it if it doesn't match my pattern. Are there any helper methods to get the text from an element like this (possibly in PDFTextStripper or elsewhere)? Or do I have to parse the text manually?

    for (Object token : tokens) {
        if (token instanceof Operator) {
            Operator op = (Operator) token;
            if (op.getName().equals("TJ") || op.getName().equals("Tj")) {
                // The single argument to this operator is the previous token.
                Object prevToken = newTokens.get(newTokens.size() - 1);
                // Remove the argument only if its text does not match my pattern.
                if (!matchesMyString(prevToken)) {
                    newTokens.remove(newTokens.size() - 1);
                }
                // In both cases the operator itself is skipped.
                continue;
            }
        }
        newTokens.add(token);
    }

Thanks,
Nick
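
For reference, here is a rough sketch of what I imagine matchesMyString doing. It is not PDFBox code. To keep it self-contained I use plain Java stand-ins, which are my own assumptions: a String stands in for the Tj operand (a COSString in PDFBox, as far as I can tell) and a List of Strings and Numbers stands in for the TJ operand (a COSArray mixing strings with kerning adjustments). The class name DashMatcher, the helper operandText, and the pattern "two or more dashes" are hypothetical as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DashMatcher {
    // Assumed pattern: a run of two or more dashes anywhere in the text.
    private static final Pattern DASHES = Pattern.compile("-{2,}");

    // Collects the text of an operand. A String models the Tj argument;
    // a List models the TJ argument, whose numeric entries are kerning
    // adjustments and carry no text, so they are skipped.
    static String operandText(Object operand) {
        if (operand instanceof String) {
            return (String) operand;
        }
        if (operand instanceof List) {
            StringBuilder sb = new StringBuilder();
            for (Object element : (List<?>) operand) {
                if (element instanceof String) {
                    sb.append((String) element);
                }
            }
            return sb.toString();
        }
        // Anything else has no textual representation in this sketch.
        return "";
    }

    // True if the operand's text contains a run of dashes.
    static boolean matchesMyString(Object prevToken) {
        return DASHES.matcher(operandText(prevToken)).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesMyString("-----"));                         // true
        System.out.println(matchesMyString(Arrays.asList("--", -120, "--"))); // true
        System.out.println(matchesMyString("Hello"));                         // false
    }
}
```

One thing I am unsure about with the real types: the bytes in a PDF string are encoded according to the current font, so reading them directly may only give sensible text for simple encodings. That is part of why I am asking whether a helper already exists.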