Alan Collins wrote:
> Hi,
>
> I do a fair bit of data manipulation and decided to try one of my
> favourite utilities in Python. I'd really appreciate some optimization
> of the script. I'm sure that I've missed many tricks in even this short
> script.
>
> Let's say you have a file with this data:
>
> Monday 7373 3663657 2272 547757699 reached 100%
> Tuesday 7726347 552 766463 2253 under-achieved 0%
> Wednesday 9899898 8488947 6472 77449 reached 100%
> Thursday 636648 553 22344 5699 under-achieved 0%
> Friday 997 3647757 78736632 357599 over-achieved 200%
>
> You now want columns 1, 5, and 7 printed and aligned (much like a
> spreadsheet). For example:
>
> Monday    547757699 100%
> Wednesday     77449 100%
> ...
>
> This script does the job, but I reckon there are better ways. In the
> interests of brevity, I have dropped the command-line argument handling
> and hard-coded the columns for the test and I hard-coded the input file
> name.
>

You might like to see how it is done in this recipe:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/267662

> -------------------------------------------------------
> """
> PrintColumns
>
> Print specified columns, alignment based on data type.
>
> The script works by parsing the input file twice. The first pass gets
> the maximum length of all values on the columns. This value is used to
> pad the column on the second pass.
>
> """
> import sys
>
> columns = [0]   # hard-code the columns to be printed.
> colwidth = [0]  # list into which the maximum field lengths will be stored.
>
> """
> This part is clunky. Can't think of another way to do it without making
> the script somewhat longer and slower. What it does is that if the user
> specifies column 0, all columns will be printed. This bit builds up the
> list of columns, from 1 to 100.
> """
>
> if columns[0] == 0:
>     columns = [1]
>     while len(columns) < 100:
>         columns.append(len(columns)+1)
>

columns = range(1, 101)

> """
> First pass.
> Read all lines and determine the maximum width of each
> selected column.
> """
> infile = file("mylist", "r")
> indata = infile.readlines()
> for myline in indata:
>     mycolumns = myline.split()
>     colindex = 0
>     for column in columns:
>         if column <= len(mycolumns):
>             if len(colwidth)-1 < colindex:
>                 colwidth.append(len(mycolumns[column-1]))
>             else:
>                 if colwidth[colindex] < len(mycolumns[column-1]):
>                     colwidth[colindex] = len(mycolumns[column-1])
>         colindex += 1
> infile.close()
>
> """
> Second pass. Read all lines and print the selected columns. Text values
> are left justified, while numeric values are right justified.
> """
> infile = file("mylist", "r")
> indata = infile.readlines()
>

No need to read the file again, you still have indata.

> for myline in indata:
>     mycolumns = myline.split()
>     colindex = 0
>     for column in columns:
>         if column <= len(mycolumns):
>             if mycolumns[column-1].isdigit():
>                 x = mycolumns[column-1].rjust(colwidth[colindex]) + ' '
>             else:
>                 x = mycolumns[column-1].ljust(colwidth[colindex]+1)
>             print x,
>         colindex += 1
>     print ""
> infile.close()
>

Hmm...you really should make columns be the correct length. If you use a
list comp to make colwidth then you can just make columns the same length
as colwidth. Then if you make a helper function for the formatting

def format(value, width):
    if value.isdigit():
        return value.rjust(width) + ' '
    else:
        return value.ljust(width) + ' '
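
To make the two suggestions above concrete, here is a minimal sketch in
current Python 3 syntax (the script in this thread is Python 2). The sample
lines come from the original post. The name format_field is hypothetical,
picked only so the builtin format isn't shadowed, and padding both branches
with a trailing space is an assumption so that every field ends up the same
total width. It also assumes every row has the same number of fields.

```python
# Sample rows from the original post; in the real script these would come
# from splitting each line of the input file (the "indata" list).
lines = [
    "Monday 7373 3663657 2272 547757699 reached 100%",
    "Tuesday 7726347 552 766463 2253 under-achieved 0%",
    "Friday 997 3647757 78736632 357599 over-achieved 200%",
]
rows = [line.split() for line in lines]

# Widest value in each column, built with a list comprehension.
# Assumption: every row has as many fields as the first one.
ncols = len(rows[0])
colwidth = [max(len(row[i]) for row in rows) for i in range(ncols)]

# With colwidth known, the "print everything" column list has exactly the
# right length (1-based column numbers, as in the original script).
columns = list(range(1, len(colwidth) + 1))


def format_field(value, width):
    """Right-justify digit-only strings, left-justify everything else.

    Hypothetical helper name; both branches add one trailing space so the
    result is always width + 1 characters long.
    """
    if value.isdigit():
        return value.rjust(width) + ' '
    return value.ljust(width) + ' '


print(colwidth)
print(columns)
print(repr(format_field(rows[0][1], colwidth[1])))
```

Note that a value such as "100%" is not all digits, so it is treated as
text and left-justified.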

Now the formatting becomes

values = [ format(mycolumns[c-1], colwidth[i]) for i, c in enumerate(columns) ]

which you print with

print ''.join(values)

Kent

> -------------------------------------------------------
>
> Any help greatly appreciated.
> Regards,
> Alan.
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor