Sorry, I understand PDFBox probably won't be able to do this... but perhaps it can? :)
We use a program from BCL called Jade that lets you select a 'zone' on a PDF page and extract it to text in such a way that the spacing and line breaks are preserved. It did (and does!) a better job of this than any other tool we have ever tried. But they no longer make or support it! Has any of you PDF mavens found a tool or method for doing this that works really well? It seems impossible to do programmatically unless you know the parameters of the text; one needs to select the zone manually. For example, we use this a lot for odd tables.
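To make the request concrete, here is a rough sketch of the behaviour I mean. It is not how Jade works internally (I have no idea), and it assumes some PDF library has already handed back text fragments with page coordinates. The names Fragment, extract_zone, char_width and line_tol are all made up for illustration, and the fixed character width is a simplifying assumption that will not hold for every font.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical positioned piece of text, as a PDF library might report it."""
    x: float    # left edge, in points from the left of the page
    y: float    # baseline, in points from the top of the page (assumed)
    text: str


def extract_zone(fragments, zone, char_width=6.0, line_tol=3.0):
    """Return the text inside zone=(x0, y0, x1, y1), keeping columns aligned.

    char_width is an assumed average glyph width used to map x positions
    to character columns; line_tol is how far apart two baselines may be
    and still count as the same line.
    """
    x0, y0, x1, y1 = zone
    # Keep only fragments whose origin falls inside the selected rectangle.
    inside = [f for f in fragments if x0 <= f.x <= x1 and y0 <= f.y <= y1]
    inside.sort(key=lambda f: (f.y, f.x))

    # Group fragments into lines by baseline.
    lines = []
    for f in inside:
        if lines and abs(f.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(f)
        else:
            lines.append([f])

    rows = []
    for line in lines:
        row = ""
        for f in sorted(line, key=lambda f: f.x):
            col = int(round((f.x - x0) / char_width))
            # Never overwrite earlier text; keep at least one space between fragments.
            if row and col <= len(row):
                col = len(row) + 1
            row = row.ljust(col) + f.text
        rows.append(row)
    return "\n".join(rows)


# Made-up sample data standing in for an "odd table" on a page.
fragments = [
    Fragment(72, 100, "Item"), Fragment(192, 100, "Qty"), Fragment(252, 100, "Price"),
    Fragment(72, 114, "Widget"), Fragment(192, 114, "2"), Fragment(252, 114, "9.50"),
    Fragment(72, 300, "Footer text outside the zone"),
]
zone = (72, 90, 400, 120)  # the manually selected rectangle

if __name__ == "__main__":
    print(extract_zone(fragments, zone))
```

The rectangle still has to be picked by hand, which is the part I do not see how to automate. What I am after is a tool that does the equivalent of the above reliably on real PDFs.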

