Hello,

Last year I did some extensive testing of PDF-to-text conversion with the following tools:

Java:
        PDFBox Vers. 7.3 (free), PDFBox Vers. 8.0 (free)
Mac OS X:
        Automator (free), Skim (free)
Linux/Unix:
        pdftotext (free)
Windows:
        a-PDF (commercial), AdvancedPDF (commercial), FreePDF2Word (commercial),
        multiplePDF (commercial), PDF converter (commercial), somePDF (free),
        veryPDF (commercial)

and found PDFBox by far the most powerful tool, in particular for the 
mathematical texts I studied (more because of the variety of characters than 
the actual formulas).

I would be very interested in a program that produces decent HTML output from 
PDF (Google does quite a good job), and would thus like to see any results 
that come out of this.

Thanks in advance
Thomas

On 18.10.2010 at 15:45, arun segar wrote:

> I think the below link works:
> 
> http://pdftohtml.sourceforge.net/
> 
> Thanks,
> Arun Segar
> 
> 
> On Mon, Oct 18, 2010 at 6:58 PM, Kevin Brown <[email protected]> wrote:
> 
>> I'm very sorry.... PD4ML does not do PDF to HTML, but only HTML to PDF.
>> Apologies.
>> 
>> Note to self: don't post to Internet before coffee!
>> 
>> 
>> 
>> On Mon, Oct 18, 2010 at 9:11 AM, Sven Hartrumpf <[email protected]> wrote:
>> 
>>> Mon, 18 Oct 2010 08:32:53 -0400, kb1381 wrote:
>>>> HTML. And we've had good luck testing something called PD4ML (so far).
>>> 
>>> Interesting! I can see only the inverse conversion (i.e. HTML to PDF)
>>> on http://www.pd4ml.com/ .
>>> 
>>> Do you have a better link for us?
>>> 
>>> Thanks.
>>> Sven
>>> 
>> 
