Dinesh,
I have pdftotext version 3.0.0. I have decided to use this to go from
PDF to text. It is not the ideal solution, but it is certainly a doable
solution.
Thank you,
Robert
Dinesh B Vadhia wrote:
The best converter so far is pdftotext from
http://www.glyphandcog.com/ who maintain an open source project at
http://www.foolabs.com/xpdf/.
It's not a Python library but you can call pdftotext from within Python
using os.system(). I used the pdftotext -layout option and that gave
the best result. hth.
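
For illustration, here is a minimal sketch of calling pdftotext with the
-layout option from Python. It uses subprocess.run (Python 3.5+) in place
of the os.system() call mentioned above, since it passes arguments as a
list and reports failures. The function names and file paths are
hypothetical, and the sketch assumes pdftotext is installed and on PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path, layout=True):
    """Return the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the original physical layout.
        cmd.append("-layout")
    cmd.extend([pdf_path, txt_path])
    return cmd


def pdf_to_text(pdf_path, txt_path, layout=True):
    """Convert pdf_path to txt_path; assumes pdftotext is on PATH."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, layout), check=True)
    return txt_path
```

A call such as pdf_to_text("history.pdf", "history.txt") would then write
the extracted text next to the PDF; both file names are placeholders.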
dinesh
------------------------------------------------------------------------
Message: 4
Date: Tue, 21 Apr 2009 18:37:39 -0400
From: Robert Berman <berma...@cfl.rr.com>
Subject: Re: [Tutor] PDF to text conversion
To: tutor@python.org
Message-ID: <49ee4ab3.4040...@cfl.rr.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
First, thanks to everyone who contributed to this thread. I have a
number of possible solutions and a number of paths to pursue to
determine which avenue I should take to resolve this remaining issue. I
did try the itools library and while everything installed nicely, most
of the tests failed so I am not particularly overjoyed with the results.
Thank you Dinesh for the vote of sympathy. I do appreciate it.
I did use Adobe Reader to convert the history PDF file into a text file
and it did seem to do it faithfully. So now I will work out a parsing
function to extract my data and send it to a SQLite database.
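
As a rough sketch of that parse-and-store step, the code below uses the
standard sqlite3 module. The line format (a date, a free-text description,
then a numeric amount, separated by whitespace), the table name "history",
and the function names are all assumptions made for illustration; the
layout of the real history file is not described in this thread.

```python
import sqlite3


def parse_line(line):
    """Parse one 'date description... amount' line (hypothetical format).

    Returns a (date, description, amount) tuple, or None if the line
    does not fit the assumed format.
    """
    tokens = line.split()
    if len(tokens) < 3:
        return None
    try:
        amount = float(tokens[-1])
    except ValueError:
        return None
    return tokens[0], " ".join(tokens[1:-1]), amount


def load_lines(conn, lines):
    """Insert every parseable line into the history table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(date TEXT, description TEXT, amount REAL)"
    )
    rows = [row for row in (parse_line(line) for line in lines) if row is not None]
    # Parameterized inserts avoid quoting problems in the text fields.
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Lines that do not match the assumed format are skipped rather than
raising, which may or may not suit the real data.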
I am thrilled both with the number of suggestions I have received from
this group and the quality of the suggestions.
Thanks again,
Robert Berman
------------------------------------------------------------------------
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor