Fulltext is a simple Python library for converting document and media
files to text. Its main purpose is for use with full-text indexing
systems.
https://github.com/btimby/fulltext
http://pypi.python.org/pypi/fulltext/0.1-1
For example, to easily extract text from a PDF file:
> python
> import
Python-libarchive is a wrapper around the excellent libarchive
library. It allows you to work with many different archive formats
using a single API.
http://code.google.com/p/python-libarchive/
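To illustrate what "many archive formats behind a single API" means, here is a minimal sketch that uses only the Python standard library. It is not python-libarchive's own API: the helper name list_members is hypothetical, and it handles only zip and tar archives through the zipfile and tarfile modules, whereas libarchive supports many more formats.

```python
import tarfile
import zipfile


def list_members(path):
    """Return the sorted member names of an archive, whatever its format.

    Hypothetical helper for illustration only; it is not part of
    python-libarchive and only understands zip and tar archives.
    """
    # Detect the format from the file contents, not the extension.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return sorted(zf.namelist())
    if tarfile.is_tarfile(path):
        # tarfile.open() detects gzip/bz2/xz compression transparently.
        with tarfile.open(path) as tf:
            return sorted(tf.getnames())
    raise ValueError("unsupported archive format: %r" % (path,))
```

The caller uses one function regardless of format, for example list_members("backup.tar.gz") or list_members("backup.zip"), where both file names are placeholders.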
Python-libarchive provides a SWIG wrapper around the library as well as some
high-level Python classes to mak