[Tutor] extracting text from word files (.doc, .docx) and pdf

2011-01-25 Thread Juan Jose Del Toro
Dear List; I am looking for a way to extract parts of a text from Word (.doc, .docx) files as well as PDF; the idea is to walk through the whole directory tree and populate a CSV file with an excerpt from each file. For PDF I found PyPdf (http://pybrary.net/pyPdf/) but I have found nothing to read doc, docx
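A minimal sketch of the walk-and-populate idea described above, using only the standard library. It is one possible approach, not something posted in the thread. The names `docx_text`, `EXTRACTORS` and `write_excerpts` are hypothetical, and the 200-character excerpt length is an arbitrary choice. The .docx reader relies on the fact that a .docx file is a zip archive whose body text lives in `word/document.xml`; extractors for .doc or PDF could be added to the same registry.

```python
import csv
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace of the w:p (paragraph) and w:t (text) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical registry: file extension -> function(path) returning text.
EXTRACTORS = {".docx": docx_text}


def write_excerpts(top, csv_path, extractors=EXTRACTORS, length=200):
    """Walk `top` and write one (path, excerpt) row per supported file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "excerpt"])
        for folder, _subdirs, files in os.walk(top):
            for name in sorted(files):
                extract = extractors.get(os.path.splitext(name)[1].lower())
                if extract is None:
                    continue  # no extractor registered for this file type
                path = os.path.join(folder, name)
                try:
                    text = extract(path)
                except Exception as error:  # keep going on unreadable files
                    text = "ERROR: %s" % error
                # collapse whitespace so each excerpt stays on one CSV line
                writer.writerow([path, " ".join(text.split())[:length]])
```

Calling `write_excerpts("some/folder", "excerpts.csv")` would then produce a two-column CSV; files with no registered extractor are skipped silently.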

Re: [Tutor] extracting text from word files (.doc, .docx) and pdf

2011-01-25 Thread Walter Prins
On 25 January 2011 21:52, Juan Jose Del Toro jdeltoro1...@gmail.com wrote:
> Dear List; I am looking for a way to extract parts of a text from word (.doc,.docx) files as well as pdf; the idea is to walk through the whole directory tree and populate a csv file with an excerpt from each file.

Re: [Tutor] extracting text from word files (.doc, .docx) and pdf

2011-01-25 Thread Corey Richardson
On 01/25/2011 04:52 PM, Juan Jose Del Toro wrote:
> Dear List; I am looking for a way to extract parts of a text from word (.doc,.docx) files as well as pdf; the idea is to walk through the whole directory tree and populate a csv file with an excerpt from each file. For PDF I found PyPdf

Re: [Tutor] extracting text from word files (.doc, .docx) and pdf

2011-01-25 Thread Emile van Sebille
On 1/25/2011 1:52 PM Juan Jose Del Toro said...
> Dear List; I am looking for a way to extract parts of a text from word (.doc,.docx)
I recently did a project extracting data from word documents and used antiword (http://www.winfield.demon.nl/) then used it like this:
    def setContent(self):
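The code in the message above is cut off in this archive view, so what follows is not Emile's method. It is only a rough sketch of one way to call antiword from Python, assuming the `antiword` executable is installed and on the PATH. The function name `doc_to_text` and its `antiword` parameter are hypothetical, and decoding the output as UTF-8 is an assumption, since antiword's output encoding depends on its character mapping.

```python
import subprocess


def doc_to_text(path, antiword=("antiword",)):
    """Return the plain text that antiword prints for a .doc file."""
    # `antiword` is a parameter only so that a full path to the
    # executable can be supplied; by default it is looked up on PATH.
    result = subprocess.run(
        list(antiword) + [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if antiword fails
    )
    # Assumption: treat the output as UTF-8 and replace undecodable bytes.
    return result.stdout.decode("utf-8", errors="replace")
```

A call such as `doc_to_text("report.doc")` raises `FileNotFoundError` if antiword is not installed, which makes the missing dependency easy to spot.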

Re: [Tutor] extracting text from word files (.doc, .docx) and pdf

2011-01-25 Thread Alan Gauld
Juan Jose Del Toro jdeltoro1...@gmail.com wrote
> I am looking for a way to extract parts of a text from word (.doc,.docx) files as well as pdf;
In addition to the suggestions already given, you can use COM to drive Word itself if you have Word on the PC on which you are running the code. If
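For the COM route, a small sketch along these lines might work. It assumes Windows, an installed copy of Word, and the pywin32 package (which provides `win32com`). The function name `word_text_via_com` is hypothetical, and none of this code comes from the thread itself.

```python
import os


def word_text_via_com(path):
    """Open `path` in Word through COM and return the document text."""
    import win32com.client  # from the pywin32 package, Windows only

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False  # keep the Word window hidden
    try:
        # Word resolves relative paths itself, so pass an absolute one
        doc = word.Documents.Open(os.path.abspath(path))
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # False: do not save changes
    finally:
        word.Quit()
```

Starting and quitting Word for every file is slow; for a whole directory tree it would probably be better to create the application object once and reuse it.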