On Wed, Jan 5, 2011 at 4:45 PM, Emile van Sebille <em...@fenx.com> wrote:

> On 1/5/2011 3:12 PM kanth...@woh.rr.com said...
>
>> I want to use Python to find all "\n" terminated
>> strings in a PDF file, ideally returning string
>> starting addresses.   Anyone willing to help?
>>
>
> pdflines = open(r'c:\shared\python_book_01.pdf').readlines()
> sps = [0]
> for ii in pdflines: sps.append(sps[-1]+len(ii))
>
> Emile
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
Bear in mind that PDF files often contain compressed objects. If that is
the case, I would recommend opening the PDF in binary mode and working out
how to decompress (inflate) the relevant objects before doing any searching.
PyPDF is a package that might help with this, though it could use some
updating.
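
As a rough illustration of the above, here is a minimal sketch using only
the standard library. It assumes the compressed objects are plain
Flate-encoded streams delimited by "stream" / "endstream", and it ignores
other filters, predictors, object streams, and encryption, so a real PDF
library is still the safer route. The names line_start_offsets and
inflate_streams are made up for this example.

```python
import re
import zlib
from typing import Iterator, List, Tuple


def line_start_offsets(data: bytes) -> List[int]:
    """Return the byte offset where each b"\\n"-terminated line starts."""
    offsets = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break  # trailing bytes without a newline are not counted
        offsets.append(start)
        start = end + 1
    return offsets


# Assumption: stream data sits between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (file offset of the raw stream, inflated bytes) pairs."""
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left before
            # "endstream"; zlib.decompress would not.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (uncompressed or another filter)
        yield match.start(1), inflated
```

Typical use would be to read the file with open(path, 'rb').read(), call
line_start_offsets() on the raw bytes for the uncompressed parts, and then
call it again on each result from inflate_streams(). Offsets found inside
an inflated stream are relative to that stream, not to the file.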