On Mon, 27 Jan 2014 08:41:09 -0500 (EST)
Max Pyziur wrote:

> In general, calibre's conversion of PDFs is less robust than Amazon's. By
> "jumbles," I mean that certain documents have things such as footnotes or
> endnotes; some have anchors. In Amazon's case, these more often seem to be
> handled correctly. With calibre, the footnotes can be interspersed with
> the regular text; in that way, the regular text is "jumbled" with the
> footnotes, presenting problems of continuity.

I remember once seeing an OCR program that would accept PDF files.
It didn't need to recognize characters in that case, but it still applied
all of its OCR layout-recognition algorithms to try to detect
the "proper" way to treat the document. I suspect calibre is
doing very limited layout analysis (perhaps none).

I seem to remember that LibreOffice can import PDFs these
days. I wonder if it is any better at layout? If so, you
could import the PDF and export HTML from LibreOffice.
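
For what it's worth, the LibreOffice route could be scripted roughly
like this. This is only a sketch, not something I have tested against
your documents: it assumes "soffice" is on your PATH, that the PDF
import component is installed, and that the "writer_pdf_import" input
filter is what you want (it opens the PDF in Writer rather than Draw).
The function names are my own invention.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless PDF-to-HTML conversion.

    Assumption: the writer_pdf_import filter is available, so the PDF
    is opened as a Writer document instead of a Draw document.
    """
    return [
        soffice,
        "--headless",
        "--infilter=writer_pdf_import",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(pdf_path),
    ]


def pdf_to_html(pdf_path, out_dir="."):
    """Run the conversion and return the path where the HTML should land."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(build_convert_command(pdf_path, out_dir), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(pdf_path).stem + ".html")
```

Calling pdf_to_html("book.pdf", "out") should leave out/book.html
behind if the import works. Whether the footnotes come out in the right
place is exactly the open question, so compare the result with what
calibre gives you.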