RE: Script to extract text from PDF files

2015-11-08 Thread Dan Strohl
://www.binpress.com/tutorial/manipulating-pdfs-with-python/167 -Original Message- From: Python-list [mailto:python-list-bounces+d.strohl=f5@python.org] On Behalf Of Scott Werner Sent: Friday, November 06, 2015 2:30 PM To: python-list@python.org Subject: Re: Script to extract text from PDF

Re: Script to extract text from PDF files

2015-11-06 Thread Scott Werner
On Tuesday, September 25, 2007 at 1:41:56 PM UTC-4, brad wrote: > I have a very crude Python script that extracts text from some (and I > emphasize some) PDF documents. On many PDF docs, I cannot extract text, > but this is because I'm doing something wrong. The PDF spec is large and > complex

Re: Script to extract text from PDF files

2015-11-05 Thread zbin1986
You can try this free online PDF text extractor to extract text from PDF files: http://www.online-code.net/pdf-to-word.html -- https://mail.python.org/mailman/listinfo/python-list

Re: Script to extract text from PDF files

2007-09-27 Thread Svenn Are Bjerkem
On Sep 26, 11:50 pm, [EMAIL PROTECTED] wrote: On Sep 26, 4:49 pm, Svenn Are Bjerkem [EMAIL PROTECTED] wrote: I have downloaded this package and installed it and found that the text-extraction is more or less useless. Looking into the code and comparing with the PDF spec shows a very early

Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 25, 10:19 pm, Lawrence D'Oliveiro [EMAIL PROTECTED] central.gen.new_zealand wrote: Doesn't work that well... This is inherent in the nature of PDF: it's a page-description language, not a document-interchange language. Each text-drawing command can put a block of text anywhere on the
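Because each text-drawing command can place its text anywhere on the page, one common heuristic is to collect the text fragments together with their positions and then sort them top-to-bottom and left-to-right. The minimal sketch below assumes the fragments have already been decoded from the page; the `Fragment` class, the `reading_order` function, the 2-unit line tolerance, and the sample coordinates are all hypothetical, and the heuristic will scramble multi-column layouts.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A hypothetical decoded text fragment with its page position."""
    x: float  # horizontal position in PDF user-space units
    y: float  # vertical position; PDF's y axis points up the page
    text: str


def reading_order(fragments: list[Fragment], line_tolerance: float = 2.0) -> str:
    """Heuristically rebuild text: top-to-bottom lines, left-to-right within a line."""
    lines: list[tuple[float, list[Fragment]]] = []
    # Sort by descending y (top of page first), then ascending x.
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        # Fragments whose baselines are close enough count as one line.
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    # Re-sort each line by x, since small y differences can disturb the order.
    return "\n".join(
        " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    )


# Fragments listed in the arbitrary order a generator might draw them.
fragments = [
    Fragment(300, 680, "page."),
    Fragment(72, 700, "Text"),
    Fragment(72, 680, "anywhere"),
    Fragment(150, 700.5, "can go"),
    Fragment(200, 680, "on the"),
]
print(reading_order(fragments))
```

This prints "Text can go" and "anywhere on the page." on two lines, even though the fragments were supplied out of order.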

Re: Script to extract text from PDF files

2007-09-26 Thread brad
David Boddie wrote: There's a little information on that online: http://www.glyphandcog.com/textext.html Thanks, I'll read that. Just because inserting and encoding is well documented doesn't mean that the reverse processes are easy. :-/ Boy, that's an understatement... most of the PDF
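One reason the reverse process is hard: the bytes inside a text-drawing command are character codes in the font's encoding, not necessarily Unicode. A generator that embeds a subset font may assign arbitrary codes, and the text is only recoverable if the PDF also carries a mapping back to Unicode (a ToUnicode CMap). The sketch below represents such a mapping as a plain dict; the table, the `decode_hex_string` function, and the sample operand are hypothetical.

```python
# Hypothetical ToUnicode-style table for a subset font: one-byte character
# codes chosen by the PDF generator, mapped back to Unicode text.
TO_UNICODE = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode_hex_string(hex_operand: str, cmap: dict[int, str]) -> str:
    """Decode a hex string operand such as <0102030304> using a code-to-text map."""
    raw = bytes.fromhex(hex_operand)
    # Codes missing from the map cannot be recovered; emit U+FFFD instead.
    return "".join(cmap.get(code, "\ufffd") for code in raw)


print(decode_hex_string("0102030304", TO_UNICODE))  # Hello
print(decode_hex_string("0102030304", {}))  # five U+FFFD replacement characters
```

With the mapping the codes decode to "Hello"; without it, the same bytes carry no usable text.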

Re: Script to extract text from PDF files

2007-09-26 Thread Svenn Are Bjerkem
On Sep 25, 9:18 pm, [EMAIL PROTECTED] wrote: On Sep 25, 3:02 pm, Paul Hankin [EMAIL PROTECTED] wrote: Googling for 'pdf to text python' and following the first link gives http://pybrary.net/pyPdf/ Doesn't work that well, I've tried it, you should too... the author even admits this:

Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 26, 4:49 pm, Svenn Are Bjerkem [EMAIL PROTECTED] wrote: I have downloaded this package and installed it and found that the text-extraction is more or less useless. Looking into the code and comparing with the PDF spec shows a very early implementation of text extraction. Luckily it is

Re: Script to extract text from PDF files

2007-09-26 Thread David Boddie
On Wed Sep 26 23:50:16 CEST 2007, byte8bits wrote: On Sep 26, 4:49 pm, Svenn Are Bjerkem svenn.bjer... at googlemail.com wrote: I have downloaded this package and installed it and found that the text-extraction is more or less useless. Looking into the code and comparing with the PDF

Re: Script to extract text from PDF files

2007-09-25 Thread Paul Hankin
On Sep 25, 6:41 pm, brad [EMAIL PROTECTED] wrote: I have a very crude Python script that extracts text from some (and I emphasize some) PDF documents. On many PDF docs, I cannot extract text, but this is because I'm doing something wrong. The PDF spec is large and complex and there are various

Re: Script to extract text from PDF files

2007-09-25 Thread byte8bits
On Sep 25, 3:02 pm, Paul Hankin [EMAIL PROTECTED] wrote: Googling for 'pdf to text python' and following the first link gives http://pybrary.net/pyPdf/ Doesn't work that well, I've tried it, you should too... the author even admits this: extractText() Locate all text drawing commands,
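The quoted extractText() description pulls text from the text-drawing commands in the order they appear in the page's content stream. The minimal stdlib sketch below illustrates that idea; it is not pyPdf's actual code. It assumes an uncompressed stream that uses only literal strings with the `Tj` operator (real streams are usually compressed and also use `TJ`, `'`, `"` and hex strings). The sample stream and the `extract_in_stream_order` function are hypothetical; the stream deliberately draws the lower line first.

```python
import re

# Hypothetical, uncompressed content stream. It draws the lower line
# (y=680) before the upper line (y=700).
CONTENT_STREAM = b"""BT /F1 12 Tf 72 680 Td (World) Tj ET
BT /F1 12 Tf 72 700 Td (Hello) Tj ET"""

# A literal string operand followed by the Tj operator. Simplified: ignores
# escapes, nested parentheses, hex strings, and the TJ, ' and " operators.
SHOW_TEXT = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_in_stream_order(stream: bytes) -> list[str]:
    """Return Tj string operands in the order they appear in the stream."""
    return [m.group(1).decode("latin-1") for m in SHOW_TEXT.finditer(stream)]


print(extract_in_stream_order(CONTENT_STREAM))  # ['World', 'Hello']
```

The result lists "World" before "Hello" although "Hello" sits higher on the page: stream order is whatever the generator chose, which is why this approach works well for some PDF files and poorly for others.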

Re: Script to extract text from PDF files

2007-09-25 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], [EMAIL PROTECTED] wrote: On Sep 25, 3:02 pm, Paul Hankin [EMAIL PROTECTED] wrote: Googling for 'pdf to text python' and following the first link gives http://pybrary.net/pyPdf/ Doesn't work that well... This is inherent in the nature of PDF: it's a