Dear List,
I am looking for a way to extract parts of the text from Word (.doc, .docx)
files as well as PDF files; the idea is to walk through a whole directory tree
and populate a CSV file with an excerpt from each file.
For PDF I found pyPdf (http://pybrary.net/pyPdf/), but I have found nothing to
read .doc or .docx files.
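The directory walk and CSV step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `build_excerpt_csv`, `EXTRACTORS` and `read_plain_text` are hypothetical names, and the plain-text reader is a stand-in for whatever .doc, .docx or PDF reader ends up being used.

```python
import csv
import os


def read_plain_text(path):
    # Stand-in extractor for illustration only; real .doc/.docx/.pdf files
    # need a format-specific reader plugged in here.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Hypothetical mapping from file extension to extractor function.
EXTRACTORS = {".txt": read_plain_text}


def build_excerpt_csv(root, csv_path, extractors=EXTRACTORS, length=200):
    """Walk root and write one (path, excerpt) row per supported file."""
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "excerpt"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                extension = os.path.splitext(name)[1].lower()
                extractor = extractors.get(extension)
                if extractor is None:
                    continue  # unsupported file type, skip it
                full_path = os.path.join(dirpath, name)
                text = extractor(full_path)
                # Collapse whitespace so each excerpt stays on one CSV line.
                excerpt = " ".join(text.split())[:length]
                writer.writerow([full_path, excerpt])
                rows += 1
    return rows
```

Supporting another format would then only mean adding an entry such as `".doc": some_reader` to the mapping, where `some_reader` takes a path and returns text.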
On 1/25/2011 1:52 PM Juan Jose Del Toro said...
Dear List;
I am looking for a way to extract parts of a text from word (.doc,.docx)
I recently did a project extracting data from Word documents. I used
antiword (http://www.winfield.demon.nl/) and called it like this:
def setContent(self):
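Calling antiword from Python can be sketched along these lines. This is not the method above, whose body is not shown; `antiword_command` and `doc_to_text` are hypothetical names, and the sketch assumes that antiword is installed and on the PATH and that its output can be decoded as UTF-8.

```python
import subprocess


def antiword_command(doc_path):
    # antiword writes the document's text to standard output by default.
    return ["antiword", doc_path]


def doc_to_text(doc_path, run=subprocess.run):
    """Return the text of a .doc file as printed by antiword."""
    # check=True raises CalledProcessError if antiword exits with an error.
    result = run(antiword_command(doc_path), capture_output=True, check=True)
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

The `run` parameter exists only so the function can be exercised without antiword installed; normal use is simply `doc_to_text("some_file.doc")`.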
Juan Jose Del Toro jdeltoro1...@gmail.com wrote
I am looking for a way to extract parts of a text from word
(.doc,.docx)
files as well as pdf;
In addition to the suggestions already given, you can use
COM to drive Word itself, provided Word is installed on the PC
on which you are running the code.
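A minimal sketch of the COM approach, assuming Windows, an installed copy of Word and the pywin32 package (which provides `win32com`). The function names `word_text_via_com` and `excerpt` are hypothetical, chosen for illustration.

```python
import os


def excerpt(text, length=200):
    # Collapse runs of whitespace and keep the first `length` characters.
    return " ".join(text.split())[:length]


def word_text_via_com(path):
    """Open a document in Word through COM and return its full text."""
    # Imported here because pywin32 is only available on Windows.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word expects an absolute path.
        doc = word.Documents.Open(os.path.abspath(path))
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

Since Word itself opens the file, the same call should handle both .doc and .docx, at the cost of starting a Word process for each document.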
If