Hello,

Since numerous free ebooks come only in PDF format, I am looking for a way to
convert them to text or HTML so that they are readable on ebook readers that
don't support PDF reflow.

While ExtractText generally works well enough for words, that doesn't hold
for paragraphs. In text mode, ExtractText doesn't distinguish between the end
of a line and the start of a new paragraph (often indicated by indenting the
first line of the text block), so the formatting of the text output is quite
poor. The HTML output seems to make that distinction, but it still embeds
page headers and footers in the body text.

I don't have an immediate solution, but I would prefer that the text output
insert a blank line wherever the HTML output sets a paragraph tag, or a tab
for the indentation at the beginning of a line. As for headers and footers,
the only approach I can think of is a parameter that makes ExtractText ignore
text outside the standard type area.
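
To illustrate the idea, here is a minimal Python sketch of the behaviour I
have in mind. It is not ExtractText code: TextLine, inside_type_area,
to_paragraphs, the coordinates and the thresholds are all hypothetical, and it
assumes that each extracted line comes with its position on the page.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical record for one extracted line (not an ExtractText type)."""
    text: str   # line content as extracted
    x: float    # left edge of the line, in points
    y: float    # distance from the top of the page, in points


def inside_type_area(line, top, bottom):
    """Return True if the line lies between the top and bottom margins."""
    return top <= line.y <= bottom


def to_paragraphs(lines, left_margin, top, bottom, indent_threshold=5.0):
    """Join lines into paragraphs separated by a blank line.

    A line starts a new paragraph when its left edge is indented by more
    than indent_threshold relative to left_margin. Lines outside the type
    area (assumed to be headers and footers) are dropped.
    """
    paragraphs = []
    current = []
    for line in lines:
        if not inside_type_area(line, top, bottom):
            continue  # header or footer: skip it
        if line.x - left_margin > indent_threshold and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.text.strip())
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


# Made-up page: a running header, two indented paragraphs, a page number.
page = [
    TextLine("My Book Title", 72, 30),
    TextLine("It was a dark night and", 90, 100),
    TextLine("the rain fell hard.", 72, 114),
    TextLine("Nobody was outside", 90, 128),
    TextLine("at that hour.", 72, 142),
    TextLine("17", 300, 800),
]
print(to_paragraphs(page, left_margin=72, top=60, bottom=760))
```

The sketch only looks at indentation, so it would miss paragraphs that are
separated by extra vertical space instead, and it does not rejoin words
hyphenated across lines.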

Or are these unreasonable wishes?

Best
Thomas

