As requested - here are some example rows from the csv files:

117.86.68.157   BitTorrent Client Activity   1   5/21/2009 6:56
82.210.106.99   BitTorrent Client Activity   1   5/20/2009 12:39
81.132.134.83   BitTorrent Client Activity   1   5/21/2009 3:14

The columns are: IP, Activity, Count, Date / Time. These are typical log files.

On Tue, May 26, 2009 at 6:51 PM, Sander Sweers <sander.swe...@gmail.com> wrote:

> 2009/5/26 Paras K. <para...@gmail.com>:
> > Hello,
> >
> > I have been working on this script / program all weekend. I emailed this
> > address before and got some great help. I hope that I can get that again!
> >
> >
> > First to explain what I need to do:
> >
> > Have about 6 CSV files that I need to read. Then I need to split based on a
> > range of IP address and if the count number is larger than 75.
> >
> > I currently merge all the CSV files by using the command line:
> >
> > C:Reports> copy *.csv merge.csv
> >
> > Then I run the dos command: for /F "TOKENS=* SKIP=1" %i in ('find "."
> > merge.csv ^| find /v "----"') do echo %i>> P2PMerge.csv
> >
> > From some of my friends they tell me that should remove that last carriage
> > return, which it does, however when it goes through the python script it
> > returns no values.
>
> Why would you need to strip off a carriage return? And why would you
> not process the csv files one after another? It would be easier to
> have some example data.
>
> > Now if I open the merge.csv and remove that carriage return manually and
> > save it as P2PMerge.csv the script runs just fine.
> >
> > Here is my source code:
> >
> > # P2P Report / Bitorrent Report
> > # Version 1.0
> > # Last Updated: May 26, 2009
> > # This script is designed to go through the cvs files and find the valid IP Address
> > # Then copys them all to a new file
> > import sys
> > import win32api
> > import win32ui
> > import shutil
> > import string
> > import os
> > import os.path
> > import csv
>
> You import csv but do not use it below?
>
> > #Global Variables
> > P2Pfiles = []
> > totalcount = 0
> > t = 0
> > #still in the development process -- where to get the files from
> > #right now the location is C:\P2P
> > def getp2preportdestion():
> >     win32ui.MessageBox('Welcome to P2P Reporting.\nThis program is designed to aid in the P2P reporting. \n\nThe locations of P2P Reports should be in C:\P2P \nWith no subdirectories.\n\nVersion 1.0 - \n\nPress "OK" to continue with this program.')
> >     p2preport = 'C://P2P\\'
> >     return p2preport
> >
> >
> > #Main Program
> > #Get location of directories
> > p2ploc = getp2preportdestion()
> > #Checking to make sure directory is there.
> > if os.path.exists(p2ploc):
> >     if os.path.isfile(p2ploc +'/p2pmerge.csv'):
> >         win32ui.MessageBox('P2PMerge.csv file does exists.\n\nWill continue with P2P Reporting.')
> >     else:
> >         win32ui.MessageBox('P2PMerge.csv files does not exists. \n\nPlease run XXXXXXX.bat files first.')
> >         sys.exit()
> > else:
> >     win32ui.MessageBox('The C:\P2P directory does not exists.\n\nPlease create and copy all the files there.\nThen re-run this script')
> >     sys.exit()
> > fh = open('C://P2P/P2PMerge.csv', "rb")
> > ff = open('C://P2P/P2PComplete.csv', "wb")
> > igot1 = fh.readlines()
> >
> > for line in igot1:
>
> You can also write the below and get rid of igot1.
> for line in fh.readlines():
>
> >     readline = line
> >     ipline = readline
> >     ctline = readline
>
> You are making variables to the same object and all are not necessary.
> See below idle session which should show what I mean.
>
> >>> line = [1,2,3,4]
> >>> readline = line
> >>> ipline = readline
> >>> ctline = readline
> >>> line
> [1, 2, 3, 4]
> >>> line.append('This will be copied to readline, ipline and ctline')
> >>> readline
> [1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
> >>> ipline
> [1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
> >>> ctline
> [1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
>
> >     count = ctline.split(',')[2]
> >     count2 = int(count)
> >     print count2
> >     t = count2
>
> Again making variables to the same object? And you really do not need t.
>
> >     ip = ipline.split(' ')[0]
>
> So all the above can be simplified like:
> data = line.split(',')
> count = int(data[2])
> ip = data[0]
>
> >     split_ip = ip.split('.')
> >     if ((split_ip[0] == '192') and (t >=75)):
>
> The above then would be:
> if ip.startswith('192') and count >= 75:
>
> >         ff.write(readline)
>
> This will change as well:
> ff.write(line)
>
> You can figure out the rest ;-)
>
> >         totalcount +=1
> >     elif ((split_ip[0] == '151') and (t >=75)):
> >         ff.write(readline)
> >         totalcount +=1
> >     elif (((split_ip[0] == '142') and (split_ip[1]) == '152') and (t >=75)):
> >         ff.write(readline)
> >         totalcount +=1
> >
> > tc = str(totalcount)
> > win32ui.MessageBox('Total Number of IPs in P2P Reporting: '+ tc)
> > fh.close()
> > ff.close()
> >
> >
> > What I am looking for is a working example of how to go through the
> > directory and read each csv file within that directory or how to remove the
> > carriage return at the end of the csv file.
>
> You can avoid the removal of this carriage return, read below. But if
> you really need to, you can use str.rstrip('\r\n').
>
> > NOTE: This is not for a class - it is for work to assist me in reading
> > multiple csv files within a couple days.
> >
> > Any assistance is greatly appreciated.
>
> Use the glob module, which can easily find all csv files in a directory.
> In general I would loop over each file and do your processing. Like,
>
> import glob
>
> totalcount = 0
> for f in glob.glob('inpath' + '*csv'):
>     # glob.glob() returns file names, so each file has to be opened first
>     for line in open(f).readlines():
>         # Your code comes here.
>
> Greets
> Sander
>
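
Putting the suggestions above together, here is a minimal sketch of the per-file approach. It is not the original script: it is written for Python 3 (the quoted code is Python 2), the names wanted() and filter_reports() are made up for illustration, and it assumes the files are comma-delimited with the columns IP, Activity, Count, Date / Time. Rows whose third field is not a number (headers, "----" separators, blank lines) are simply skipped, so no merge step or carriage-return stripping is needed.

```python
import csv
import glob
import os


def wanted(ip, count, threshold=75):
    """True if ip is in 192.*, 151.* or 142.152.* and count >= threshold."""
    octets = ip.split('.')
    in_range = octets[0] in ('192', '151') or octets[:2] == ['142', '152']
    return in_range and count >= threshold


def filter_reports(in_dir, out_path):
    """Copy matching rows from every .csv file in in_dir to out_path.

    Returns the number of rows written.
    """
    total = 0
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for name in sorted(glob.glob(os.path.join(in_dir, '*.csv'))):
            # Do not read the output file if it lives in the same directory.
            if os.path.abspath(name) == os.path.abspath(out_path):
                continue
            with open(name, newline='') as in_file:
                for row in csv.reader(in_file):
                    try:
                        ip, count = row[0].strip(), int(row[2])
                    except (IndexError, ValueError):
                        continue  # header, separator or blank line
                    if wanted(ip, count):
                        writer.writerow(row)
                        total += 1
    return total
```

With the layout from the original post this would be called as filter_reports('C:/P2P', 'C:/P2P/P2PComplete.csv'), and the return value could go into the final MessageBox.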
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor