Re: [Tutor] Expanding a Python script to include a zcat and awk pre-process

galaxywatcher Sat, 09 Jan 2010 09:40:02 -0800

I finally got it working! I would do a victory lap around my apartment building if I wasn't recovering from a broken ankle.

Excuse my excitement, but this simple script marks a new level of Python proficiency for me. Thanks to Kent, Bob, Denis, and others who pointed me in the right direction.

It does quite a few things: it decompresses a zipped file (or several files, if the archive holds more than one), processes a rather ugly csv file (ugly because it uses a comma as the delimiter, yet there are commas inside the double-quoted fields), and does a simple subtraction of the two columns, with a little summary to give me the data I need.


#!/usr/bin/env python
# Python 2 script: uses print statements and the str returned by ZipFile.read().
import re
import zipfile

# Two integer columns, then either a double-quoted field or a bare word.
pat = re.compile(r"""(\d+), (\d+), (\".+\"|\w+)""")

highflag = flagcount = sumtotal = 0
z = zipfile.ZipFile('textfile.zip')
for subfile in z.namelist():
    print "Working on filename: " + subfile + "\n"
    data = z.read(subfile)
    for line in data.splitlines():
        result = pat.match(line)
        if result is None:
            # Skip lines that do not fit the pattern (blank lines, headers).
            continue
        num1, num2 = result.groups()[:2]
        # Size of the range; 'diff' rather than 'sum' so the builtin stays usable.
        diff = int(num2) - int(num1)
        if diff > 10000000:
            flag1 = " !!!!"
            flagcount += 1
        else:
            flag1 = ""
        if diff > highflag:
            highflag = diff
        print num2 + " - " + num1 + " = " + str(diff) + flag1
        sumtotal += diff

print "Total ranges = ", sumtotal
print "Total ranges over 10 million: ", flagcount
print "Largest range: ", highflag

A few observations from a Python newbie: The zipfile and gzip modules should really be merged together. gzcat on unix reads both compression formats. It took me way too long to figure out the namelist() method. But I did learn a lot more about how zip actually works as a result. Is there really no way to extract the contents of a single zipped file without using a 'for in namelist' construct?
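On the last question: as far as I know, ZipFile.read() accepts a member name directly, so the loop is only needed when the names are unknown or there are several of them. Here is a minimal sketch, written for Python 3 rather than the Python 2 of the script above. The archive is built in memory so the example is self-contained, and the member name 'ranges.csv' and its contents are made up for illustration.

```python
import io
import zipfile

# Build a tiny archive in memory; 'ranges.csv' is a hypothetical member name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zout:
    zout.writestr("ranges.csv", "100, 250, abc\n")

with zipfile.ZipFile(buf) as z:
    # read() takes the member name, so no loop is needed when it is known.
    data = z.read("ranges.csv").decode("ascii")
    # If the archive holds exactly one file of unknown name,
    # indexing namelist() still avoids an explicit for loop.
    first = z.namelist()[0]

print(first)
print(data)
```

The difference from gzip is that a zip file is an archive of named members, while a gzip file wraps a single compressed stream, which is why zipfile needs something like namelist() at all.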

Trying to get split() to extract just two columns from my data was a dead end. The re module is the way to go.
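For comparison, the standard csv module is another route for data with commas inside quoted fields, which is exactly where a plain split(',') breaks down. This is only a sketch in Python 3: the two sample lines are invented, and it assumes the real file has a space after each comma, as the regex in the script suggests.

```python
import csv

# Invented sample lines in the assumed "number, number, text" layout.
lines = ['100, 250, "Smith, John"', '300, 900, abc123']

# A plain split breaks the quoted field into two pieces.
naive = lines[0].split(", ")

# skipinitialspace=True lets the reader recognise a quote that follows ", ".
rows = list(csv.reader(lines, skipinitialspace=True))
diffs = [int(row[1]) - int(row[0]) for row in rows]

print(naive)
print(rows)
print(diffs)
```

Whether this beats the regex depends on the data; the regex has the advantage of rejecting lines that do not have two leading integers.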

I feel like I am in relatively new territory with Python's regex engine. Denis did save me some valuable time with his regex, but my file had values in the 3rd column that started with alphas as opposed to numerics only, and flipping that (\".+\"|\d+)""") to a (\".+\"|\w+)""") had me gnashing teeth and pulling hair the whole way through the regex tutorial. When I finally figured it out, I smacked my forehead and said "of course!". The compile() method of Python's regex engine is new for me. Makes sense. Just something I have to get used to. I do have the feeling that Perl's regex is better. But that is another story.
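To make the \d+ versus \w+ difference concrete, here is a small Python 3 sketch that runs both patterns against two invented sample lines. The variable names old_pat and new_pat are mine, not from the thread.

```python
import re

# Third column: a quoted field, or digits only (the earlier pattern).
old_pat = re.compile(r'(\d+), (\d+), (".+"|\d+)')
# Third column: a quoted field, or any run of letters, digits, underscores.
new_pat = re.compile(r'(\d+), (\d+), (".+"|\w+)')

alpha_line = '100, 250, abc123'
quoted_line = '100, 250, "Smith, John"'

# \d+ cannot start on a letter, so the earlier pattern finds no match.
old_result = old_pat.match(alpha_line)
new_groups = new_pat.match(alpha_line).groups()
# Both patterns accept the quoted form through the first alternative.
quoted_groups = new_pat.match(quoted_line).groups()

print(old_result)
print(new_groups)
print(quoted_groups)
```

Because match() returns None on a line that does not fit, calling .groups() on the result without checking is what produces an AttributeError on unexpected input.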

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
