---- John Gunderman <[EMAIL PROTECTED]> wrote: 
> I am parsing the output of mork.pl, which is a Mork (the Mozilla format) 
> parser. I don't know Perl, so I decided to write a Python script to do what I 
> wanted, which is basically to create a dictionary listing each site and its 
> corresponding values instead of outputting plaintext. Unfortunately, the 
> output of mork.pl is 5000+ lines, so reading the whole document wouldn't be 
> that efficient. 

If the whole file fits in memory, reading it all at once is fine.
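A few thousand lines of text is typically well under a megabyte, so something
along these lines is perfectly reasonable (history.txt is just a stand-in name
for wherever you saved the mork.pl output):

    # Read the whole file into a list of lines, newlines included.
    history_file = open('history.txt')
    lines = history_file.readlines()
    history_file.close()
    print(len(lines))    # e.g. 5000+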

> Currently it uses:
>         for line in history_file.readlines():
> but I don't know if it has to read all the lines before it goes through them. 

Yes, readlines() reads the entire file.
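Note, though, that you rarely need readlines() at all: a file object is its
own iterator, and iterating over it reads one line at a time. So this
processes the file lazily, without loading it all up front (counting lines
here just stands in for whatever you actually do with each one):

    history_file = open('history.txt')
    count = 0
    for line in history_file:    # reads lazily, one line at a time
        count += 1               # replace with your real per-line work
    history_file.close()
    print(count)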

> if it does, then would it be more efficient to use
>         while line != '\t':
>             line = history_file.readline()    

Probably not. But why so much emphasis on efficiency? Get the program working 
first. Only if it is too slow should you worry about efficiency. Processing a 
5000-line file should not be a problem in Python.

> I was thinking of just appending each character to the string until it sees 
> '\t', and then using int() on the string, but is there an easier way?

It would really help to see a sample of the data and the results you want from 
it. There are many ways to parse data in Python, from simple string operations 
to regular expressions to full-blown parsers. Without knowing what you want to 
do, it is impossible to suggest an appropriate method.
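To give one illustration (and this is a guess about your data, not something I
know): if each record is a single tab-delimited line, there is an easier way
than collecting characters one at a time until you see '\t'. The split()
string method does that work for you:

    line = 'http://example.com\t3\t42\n'    # invented sample record
    fields = line.rstrip('\n').split('\t')
    # fields == ['http://example.com', '3', '42']
    visits = int(fields[1])    # int() converts the numeric strings

    # Building the kind of dictionary you describe, one line at a time:
    history = {}
    history_file = open('history.txt')
    for line in history_file:
        fields = line.rstrip('\n').split('\t')
        history[fields[0]] = fields[1:]
    history_file.close()

But again, a sample of the actual data would let us suggest something concrete.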

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
