Hello, last year I did some extensive testing with PDF-to-text conversion:
Java:
  pdfbox Vers. 7.3 (free)
  pdfbox Vers. 8.0 (free)
Mac OS X:
  Automator (free)
  Skim (free)
Linux/Unix:
  pdftotext (free)
Windows:
  a-PDF (commercial)
  AdvancedPDF (commercial)
  FreePDF2Word (commercial)
  multiplePDF (commercial)
  PDF converter (commercial)
  somePDF (free)
  veryPDF (commercial)
and found PDFBox by far the most powerful tool, in particular for the
mathematical texts I studied (more because of the variety of characters than
because of the actual formulas).
I would be very interested in a program that produces decent HTML output from
PDF (Google does quite a good job), and thus would like to see any results
that come out of this.
Thanks in advance
Thomas
On 18.10.2010 at 15:45, arun segar wrote:
> I think the below link works:
>
> http://pdftohtml.sourceforge.net/
>
> Thanks,
> Arun Segar
>
>
> On Mon, Oct 18, 2010 at 6:58 PM, Kevin Brown <[email protected]> wrote:
>
>> I'm very sorry.... PD4ML does not do PDF to HTML, but only HTML to PDF.
>> Apologies.
>>
>> Note to self: don't post to Internet before coffee!
>>
>>
>>
>> On Mon, Oct 18, 2010 at 9:11 AM, Sven Hartrumpf <[email protected]> wrote:
>>
>>> Mon, 18 Oct 2010 08:32:53 -0400, kb1381 wrote:
>>>> HTML. And we've had good luck testing something called PD4ML (so far).
>>>
>>> Interesting! I can see only the inverse conversion (i.e. HTML to PDF)
>>> on http://www.pd4ml.com/ .
>>>
>>> Do you have a better link for us?
>>>
>>> Thanks.
>>> Sven
>>>
>>

