The best converter I have found so far is pdftotext from Glyph & Cog (http://www.glyphandcog.com/), who maintain it as part of the open source Xpdf project at http://www.foolabs.com/xpdf/.
It's not a Python library, but you can call pdftotext from within Python using os.system(). I used the pdftotext -layout option, and that gave the best result.

hth.

dinesh

--------------------------------------------------------------------------------

Message: 4
Date: Tue, 21 Apr 2009 18:37:39 -0400
From: Robert Berman <berma...@cfl.rr.com>
Subject: Re: [Tutor] PDF to text conversion
To: tutor@python.org
Message-ID: <49ee4ab3.4040...@cfl.rr.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

First, thanks to everyone who contributed to this thread. I have a number of possible solutions and several paths to pursue in deciding how to resolve this remaining issue.

I did try the itools library, and while everything installed nicely, most of the tests failed, so I am not particularly overjoyed with the results.

Thank you, Dinesh, for the vote of sympathy. I do appreciate it.

I did use Adobe Reader to convert the history PDF file into a text file, and it did seem to do it faithfully. So now I will work out a parsing function to extract my data and send it to a SQLite database.

I am thrilled with both the number and the quality of the suggestions I have received from this group.

Thanks again,

Robert Berman
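For anyone following up on Dinesh's pdftotext suggestion earlier in this digest, here is a minimal sketch of calling it from Python. It uses subprocess.run() rather than os.system() so the extracted text comes back as a string; that swap is my choice, not what Dinesh described. The function names and the "history.pdf" file name are hypothetical, and the sketch assumes pdftotext is installed and on the PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=True):
    """Build the pdftotext argument list (hypothetical helper).

    "-layout" keeps the physical layout of the page, "-enc UTF-8" fixes
    the output encoding, and "-" as the output file sends the text to
    stdout instead of writing a .txt file.
    """
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, layout=True):
    """Run pdftotext on pdf_path and return the extracted text."""
    # Assumption: the pdftotext executable is installed and on the PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,  # collect stdout and stderr as bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like text = pdf_to_text("history.pdf") should then give you one string that a parsing function can work through line by line. Passing the arguments as a list also avoids shell quoting problems with file names that contain spaces.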
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor