I've asked this question a couple of times, but none of the answers so far has pointed me in a useful direction, and I really need help.
I am converting PDF files to text files. I lose the formatting, of course, but converting to HTML gives horrible results and converting to XML is even worse. What I want is for the program to either separate superscript exponents from the preceding text with a space, or place the exponents in brackets. Is there any way for me to access the stream of data after the PDF is read, but before it is converted to a string? If I can find a way to do this, I can figure out how to edit the data to produce the text file I want. I am using the .NET port of PDFBox and I would appreciate some examples (preferably VB or C#), but Java was my first language and I'm sure I can knock the dust off my knowledge.
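
To make the goal concrete, here is a rough sketch in Java of the transformation I'm after. It assumes I can somehow get per-character data (text, baseline position, font size) before the string is built, which is exactly the part I don't know how to do. The `Glyph` class, the `bracket` method, and the thresholds are all made up for illustration; none of them are part of PDFBox.

```java
import java.util.List;

class SuperscriptBracketer {

    /** Hypothetical stand-in for whatever per-character data the PDF library exposes. */
    static final class Glyph {
        final String text;
        final float baseline;  // y of the text baseline; assumed to grow upward on the page
        final float fontSize;

        Glyph(String text, float baseline, float fontSize) {
            this.text = text;
            this.baseline = baseline;
            this.fontSize = fontSize;
        }
    }

    /**
     * Joins the glyphs of one line into a string, wrapping each run of
     * superscript glyphs in brackets. A glyph counts as superscript when it is
     * noticeably smaller AND sits noticeably higher than the most recent
     * normal glyph. The 0.9 and 0.2 factors are guesses, not measured values.
     */
    static String bracket(List<Glyph> line) {
        StringBuilder out = new StringBuilder();
        Glyph base = null;     // most recent glyph treated as normal text
        boolean open = false;  // true while inside a bracketed superscript run

        for (Glyph g : line) {
            boolean sup = base != null
                    && g.fontSize < base.fontSize * 0.9f
                    && g.baseline > base.baseline + base.fontSize * 0.2f;

            if (sup && !open) {
                out.append('(');
                open = true;
            } else if (!sup && open) {
                out.append(')');
                open = false;
            }
            if (!sup) {
                base = g;      // only normal glyphs update the reference
            }
            out.append(g.text);
        }
        if (open) {
            out.append(')');   // close a run that reaches the end of the line
        }
        return out.toString();
    }
}
```

With something like this, an "x" at size 12 followed by a smaller, raised "2" would come out as `x(2)` instead of `x2`. A superscript at the very start of a line would not be detected, since there is no normal glyph to compare it against.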

