Paras K. wrote:
There are no headers in the log files, and there are multiple log
files, so would that walk through the directory for all csv files?
Thanks in advance!
On Wed, May 27, 2009 at 10:18 AM, Christian Witts
<cwi...@compuscan.co.za> wrote:
Paras K. wrote:
As requested, here are some example rows from the csv files:
117.86.68.157 BitTorrent Client Activity 1 5/21/2009 6:56
82.210.106.99 BitTorrent Client Activity 1 5/20/2009 12:39
81.132.134.83 BitTorrent Client Activity 1 5/21/2009 3:14
The columns are: IP, Activity, Count, Date / Time. These are
typical log files.
On Tue, May 26, 2009 at 6:51 PM, Sander Sweers
<sander.swe...@gmail.com> wrote:
2009/5/26 Paras K. <para...@gmail.com>:
> Hello,
>
> I have been working on this script / program all weekend. I
emailed this
> address before and got some great help. I hope that I can get
that again!
>
>
> First to explain what I need to do:
>
> Have about 6 CSV files that I need to read. Then I need
to split
based on a
> range of IP address and if the count number is larger
than 75.
>
> I currently merge all the CSV files by using the command
line:
>
> C:Reports> copy *.csv merge.csv
>
> Then I run the dos command: for /F "TOKENS=* SKIP=1" %i in
('find "."
> merge.csv ^| find /v "----"') do echo %i>> P2PMerge.csv
>
> From some of my friends they tell me that should remove that
last carriage
> return, which it does, however when it goes through the
python
script it
> returns no values.
Why would you need to strip off a carriage return? And why would you
not process the csv files one after another? It would be easier to
help if we had some example data.
> Now if I open the merge.csv and remove that carriage return
manually and
> save it as P2PMerge.csv the script runs just fine.
>
> Here is my source code:
>
> # P2P Report / Bitorrent Report
> # Version 1.0
> # Last Updated: May 26, 2009
> # This script is designed to go through the cvs files and
find
the valid IP
> Address
> # Then copys them all to a new file
> import sys
> import win32api
> import win32ui
> import shutil
> import string
> import os
> import os.path
> import csv
You import csv but do not use it below?
> #Global Variables
> P2Pfiles = []
> totalcount = 0
> t = 0
> #still in the development process -- where to get the
files from
> #right now the location is C:\P2P
> def getp2preportdestion():
> win32ui.MessageBox('Welcome to P2P Reporting.\nThis
program
is designed
> to aid in the P2P reporting. \n\nThe locations of P2P Reports
should be in
> C:\P2P \nWith no subdirectories.\n\nVersion 1.0 -
\n\nPress "OK"
to continue
> with this program.')
> p2preport = 'C://P2P\\'
> return p2preport
>
>
> #Main Program
> #Get location of directories
> p2ploc = getp2preportdestion()
> #Checking to make sure directory is there.
> if os.path.exists(p2ploc):
> if os.path.isfile(p2ploc +'/p2pmerge.csv'):
> win32ui.MessageBox('P2PMerge.csv file does
exists.\n\nWill continue
> with P2P Reporting.')
> else:
> win32ui.MessageBox('P2PMerge.csv files does not
exists.
\n\nPlease
> run XXXXXXX.bat files first.')
> sys.exit()
> else:
> win32ui.MessageBox('The C:\P2P directory does not
exists.\n\nPlease
> create and copy all the files there.\nThen re-run this
script')
> sys.exit()
> fh = open('C://P2P/P2PMerge.csv', "rb")
> ff = open('C://P2P/P2PComplete.csv', "wb")
> igot1 = fh.readlines()
>
> for line in igot1:
You can also write the below and get rid of igot1.
for line in fh.readlines():
> readline = line
> ipline = readline
> ctline = readline
You are binding several names to the same object, and none of the
extra names are necessary.
See the IDLE session below, which should show what I mean.
>>> line = [1,2,3,4]
>>> readline = line
>>> ipline = readline
>>> ctline = readline
>>> line
[1, 2, 3, 4]
>>> line.append('This will be copied to readline, ipline and ctline')
>>> readline
[1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
>>> ipline
[1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
>>> ctline
[1, 2, 3, 4, 'This will be copied to readline, ipline and ctline']
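To make the point of the session above concrete, here is a minimal sketch (Python 3 syntax, unlike the Python 2 code in this thread). The sample row is made up for illustration. Assignment only binds another name to the same object; a copy has to be requested explicitly.

```python
# Assignment binds a new name to the same object; nothing is copied.
line = [1, 2, 3, 4]
readline = line
ipline = readline

# One list, three names.
same_object = readline is line and ipline is line

# An explicit copy is a separate object, so later changes do not reach it.
snapshot = list(line)
line.append(5)

# The file loop yields strings, which are immutable, so the extra names
# in the script are harmless but add nothing.
text = "192.168.0.1,BitTorrent Client Activity,80"  # made-up sample row
alias = text
```

After this runs, readline and ipline both show the appended 5, while snapshot still holds the original four items.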
> count = ctline.split(',')[2]
> count2 = int(count)
> print count2
> t = count2
Again binding extra names to the same object? And you really do not
need t.
> ip = ipline.split(' ')[0]
so all the above can be simplified like:
data = line.split(' ')
count = int(data[2])
ip = data[0]
> split_ip = ip.split('.')
> if ((split_ip[0] == '192') and (t >=75)):
The above then would be:
if ip.startswith('192') and count >= 75:
> ff.write(readline)
This will change as well:
ff.write(line)
You can figure out the rest ;-)
> totalcount +=1
> elif ((split_ip[0] == '151') and (t >=75)):
> ff.write(readline)
> totalcount +=1
> elif (((split_ip[0] == '142') and (split_ip[1]) == '152')
and (t >=75)):
> ff.write(readline)
> totalcount +=1
>
> tc = str(totalcount)
> win32ui.MessageBox('Total Number of IPs in P2P Reporting:
'+ tc)
> fh.close()
> ff.close()
>
>
> What I am looking for is an working example of how to go
through the
> directory and read each csv file within that directory or
how to
remove the
> carriage return at the end of the csv file.
You can avoid the removal of this carriage return; read below. But if
you really need to, you can strip it from each line with
line.rstrip('\r\n').
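As a rough sketch of the stripping approach, assuming the stray character is a trailing '\r' and/or '\n' (the helper names strip_line_ending and nonblank_lines are hypothetical, and the syntax is Python 3):

```python
def strip_line_ending(line):
    """Remove trailing carriage return / newline characters only."""
    return line.rstrip('\r\n')


def nonblank_lines(lines):
    """Yield lines without their line endings, skipping empty ones."""
    for line in lines:
        cleaned = strip_line_ending(line)
        if cleaned:
            yield cleaned
```

Skipping the empty lines also means a lone carriage return at the end of a merged file never reaches the split() calls.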
> NOTE: This is not for a class - it is for work to assist
me in
reading
> multiple csv files within a couple days.
>
> Any assistance is greatly appreciated.
Use the glob module, which can easily find all csv files in a
directory. In general I would loop over each file and do your
processing, like:
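import glob
import os
totalcount = 0
# 'inpath' is a placeholder for the directory that holds the csv files
for name in glob.glob(os.path.join('inpath', '*.csv')):
    # glob returns file names, so each file has to be opened first
    f = open(name)
    for line in f:
        pass  # Your code comes here.
    f.close()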
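Since the script already imports csv, the same loop could hand the splitting to that module. This is only a sketch in Python 3 syntax: the function name iter_csv_rows is hypothetical, and the comma delimiter is an assumption about the files.

```python
import csv
import glob
import os


def iter_csv_rows(directory, delimiter=','):
    """Yield (filename, row) for every row of every .csv file in directory."""
    pattern = os.path.join(directory, '*.csv')
    for filename in sorted(glob.glob(pattern)):
        # newline='' lets the csv module handle the line endings itself
        with open(filename, newline='') as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                if row:  # skip blank lines, such as a trailing empty line
                    yield filename, row
```

Reading each file in turn this way removes the need for the copy / find merge step on the command line.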
Greets
Sander
------------------------------------------------------------------------
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
If that's your log structure, and what you want is to total the
amount of P2P activity per IP address, then you could do something
similar to this:
from glob import glob

if __name__ == '__main__':
    IP_Addresses = dict()
    for filename in glob('*.csv'):
        fIn = open(filename, 'rb')
        for line in fIn:
            IP, Activity, Count, TimeDate = line.strip().split(' ')
            if IP in IP_Addresses:
                IP_Addresses[IP] += int(Count)
            else:
                IP_Addresses[IP] = int(Count)
    for IP, Cnt in IP_Addresses.items():
        if Cnt >= 75:
            if IP.split('.')[0] in ('192', '151'):
                print IP, Cnt
            elif IP.split('.')[:2] == ['142', '152']:
                print IP, Cnt
Obviously if you want to keep the original log line then you will
need to store that in your dictionary as well, but for the purpose
of reporting how many 'offences' an IP Address has had this is
simple enough.
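For the case where the original log lines are wanted as well, one possible sketch is below (Python 3 syntax). The function name summarize is hypothetical, and the tab delimiter is an assumption based on the sample rows.

```python
from collections import defaultdict


def summarize(lines, delimiter='\t'):
    """Map each IP to its total count and to the original lines mentioning it."""
    totals = defaultdict(int)
    originals = defaultdict(list)
    for line in lines:
        fields = line.rstrip('\r\n').split(delimiter)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ip, count = fields[0], int(fields[2])
        totals[ip] += count
        originals[ip].append(line)
    return totals, originals
```

Keeping the lines in a second dictionary means the report can write them out unchanged once the totals are known.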
--
Kind Regards,
Christian Witts
from glob import glob
for filename in glob('/path/to/your/files/*.csv'):
    print filename
That will list every file in the folder with a .csv extension (glob
does not descend into subdirectories), which is what you want.
Then for each file that matches, the application I wrote in the
previous message will loop through each line in the file and split
the contents of the log on spaces. It looks like tabs in your sample,
though, so just change the .split(' ') to .split('\t'), which will
break it up into IP, Activity, Count, DateTime.
It will add the IP Address to a dictionary of IP Addresses if it is not
there with the count of that log and any further from that IP will
increment it by the log count. Once all files have been processed it
will then check the Addresses and check what the count is (you don't
care about ones with less than 75 hits) and then check what range they
are in and output those.
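The final check described above could be sketched like this (Python 3 syntax). The function names in_watched_range and offenders are hypothetical; the ranges and the threshold of 75 are the ones used earlier in the thread.

```python
def in_watched_range(ip):
    """True for 192.x.x.x, 151.x.x.x and 142.152.x.x addresses."""
    octets = ip.split('.')
    return octets[0] in ('192', '151') or octets[:2] == ['142', '152']


def offenders(totals, threshold=75):
    """Return sorted (ip, count) pairs at or above threshold in a watched range."""
    return sorted(
        (ip, count)
        for ip, count in totals.items()
        if count >= threshold and in_watched_range(ip)
    )
```

Comparing whole octets avoids accidental matches that a plain prefix test on the string might allow.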
Hope that helps.
--
Kind Regards,
Christian Witts