Hello, since numerous free ebooks are available only in PDF format, I am looking for a way to convert them to text or HTML so that they are readable on ebook readers that don't support PDF reflow.
While ExtractText generally works well enough for words, the same is not true for paragraphs. In text mode, ExtractText doesn't distinguish between the end of a line and the start of a new paragraph (often indicated by indenting the first line of the text block), so the formatting of the text output is quite poor. The HTML output does seem to make that distinction, but it still suffers from headers and footers being embedded in the text.

I don't have an immediate solution, but I would prefer the text output to insert a blank line wherever the HTML output sets a paragraph tag, or a tab for the indentation at the beginning of a line. As for headers and footers, the only thing I can think of is a parameter that makes ExtractText ignore text outside the standard type area.

Or are these unreasonable wishes?

Best,
Thomas
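
P.S. To make the two wishes more concrete, here is a rough sketch in Python of the kind of post-processing I have in mind. It does not call ExtractText at all. It assumes (and these are only my assumptions) that each extracted line is available together with its vertical position on the page, measured in points from the top, and that the leading whitespace of indented first lines is preserved. The function names, the margin sizes, and the indent threshold are made up for illustration.

```python
def strip_margins(page_lines, page_height, top=50.0, bottom=50.0):
    """Keep only lines inside the assumed type area.

    page_lines: list of (y, text) pairs, y measured from the top of the page.
    top/bottom: hypothetical margin sizes; real values depend on the book.
    """
    return [text for y, text in page_lines if top <= y <= page_height - bottom]


def join_paragraphs(lines, min_indent=2):
    """Start a new paragraph at every line indented by at least min_indent."""
    paragraphs = []
    current = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty lines
        indent = len(line) - len(line.lstrip())
        if indent >= min_indent and current:
            # An indented line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Made-up sample page: a running header, three body lines, a page number.
page = [
    (20.0, "Chapter 1   Some Book"),
    (100.0, "   First paragraph starts here"),
    (115.0, "and continues on this line."),
    (130.0, "   Second paragraph."),
    (780.0, "17"),
]

body = strip_margins(page, page_height=800.0)
# Paragraphs separated by a blank line, as wished for the text output.
print("\n\n".join(join_paragraphs(body)))
```

With the sample page, the header and the page number fall outside the assumed type area and are dropped, and the two indented lines each start a new paragraph. A real implementation would of course need sensible defaults for the margins and would have to cope with books that separate paragraphs by spacing instead of indentation.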

