Wow, thanks again =)
--
http://mail.python.org/mailman/listinfo/python-list
Federico Moreira wrote:
Hi all,
I'm parsing a 4.1GB Apache log to get stats on how many times an IP
requests something from the server.
The first design of the algorithm was
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
    if match_counter.has_key(ip):
Great, 2min 34 secs with the open method =)
but why?
    ip, sep, rest = line.partition(' ')
    match_counter[ip] += 1
instead of
    match_counter[line.strip()[0]] += 1
Does strip really take more time than partition?
I'm having the same results with both of them right now.
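For anyone who wants to check this on their own machine, here is a rough timing sketch. The sample line and the repeat count are made up for illustration, and the numbers will vary with the Python version and hardware. The only difference it tries to expose is that split() with no limit tokenizes the whole line, while split(None, 1) and partition(' ') stop after the first field.

```python
import timeit

# Hypothetical Apache-style log line, made up for illustration.
SAMPLE = '192.168.0.1 - - [16/Dec/2008:12:07:14 -0300] "GET /index.html HTTP/1.1" 200 2326\n'


def first_field_split(line):
    # Splits the whole line on whitespace, then keeps only the first token.
    return line.split()[0]


def first_field_split_limited(line):
    # Performs at most one split, so the rest of the line is not tokenized.
    return line.split(None, 1)[0]


def first_field_partition(line):
    # Cuts at the first space only (assumes the IP is followed by a space).
    return line.partition(' ')[0]


if __name__ == "__main__":
    for func in (first_field_split, first_field_split_limited, first_field_partition):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=100000)
        print("%-28s %.3f s" % (func.__name__, seconds))
```

All three functions return the same first field for a well-formed line, so whichever is fastest on your data can be swapped in without changing the counts.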
--
Yep, I meant split, sorry.
Thanks for the answer!
MRAB goo...@mrabarnett.plus.com writes:
Federico Moreira wrote:
Great, 2min 34 secs with the open method =)
but why?
    ip, sep, rest = line.partition(' ')
    match_counter[ip] += 1
instead of
    match_counter[line.strip()[0]] += 1
Does strip really take more time than partition?
I'm
On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
Hi all,
I'm parsing a 4.1GB Apache log to get stats on how many times an IP
requests something from the server.
The first design of the algorithm was
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
Lie Ryan wrote:
On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
Hi all,
I'm parsing a 4.1GB Apache log to get stats on how many times an IP
requests something from the server.
The first design of the algorithm was
for line in fileinput.input(sys.argv[1:]):
    ip =
MRAB:
from collections import defaultdict
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
    match_counter[ip] += 1
This can be a little faster still:
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    ip =
The defaultdict option looks faster than the standard dict (20 secs approx.).
Now I have:
#
import fileinput
import sys
from collections import defaultdict
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    match_counter[line.split()[0]]
Quoth Lie Ryan lie.1...@gmail.com:
On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
Hi all,
I'm parsing a 4.1GB Apache log to get stats on how many times an IP
requests something from the server.
The first design of the algorithm was
for line in
2008/12/16 rdmur...@bitdance.com
Python 3.0 does not support has_key, it's time to get used to not using it :)
Good to know.
line.split(None, 1)[0] really speeds up the process.
Thanks again.
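Pulling the suggestions so far together, a sketch that also runs on Python 3 might look like this. The function names are made up for illustration, and both functions take any iterable of lines, so an open file object or fileinput.input() would work. The second version only shows the `in` test that replaces has_key on a plain dict; with defaultdict no membership test is needed.

```python
from collections import defaultdict


def count_ips(lines):
    """Count how often each first field (assumed to be the client IP) appears."""
    match_counter = defaultdict(int)
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            # Skip blank lines instead of raising IndexError.
            continue
        match_counter[fields[0]] += 1
    return match_counter


def count_ips_plain_dict(lines):
    """Same count with a plain dict, using `in` where has_key was used."""
    match_counter = {}
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        ip = fields[0]
        if ip in match_counter:
            match_counter[ip] += 1
        else:
            match_counter[ip] = 1
    return match_counter
```

Something like `with open(path) as log_file: counts = count_ips(log_file)` would then produce the per-IP totals, where `path` stands for whatever log file is being parsed.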
Hi all,
I'm parsing a 4.1GB Apache log to get stats on how many times an IP
requests something from the server.
The first design of the algorithm was
for line in fileinput.input(sys.argv[1:]):
    ip = line.split()[0]
    if match_counter.has_key(ip):
        match_counter[ip] += 1
    else:
bearophileh...@lycos.com writes:
This can be a little faster still:
match_counter = defaultdict(int)
for line in fileinput.input(sys.argv[1:]):
    ip = line.split(None, 1)[0]
    match_counter[ip] += 1
Bye,
bearophile
Or maybe (untested):
match_counter = defaultdict(int)
for line in
Arnaud Delobelle arno...@googlemail.com writes:
match_total = dict((key, val()) for key, val in match_counter.iteritems())
Sorry, I meant
match_total = dict((key, val.next())
                   for key, val in match_counter.iteritems())
--
Arnaud
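The snippet this correction refers to is cut off in the archive, so the following is only a guess at the idea. It assumes the counters are itertools.count objects stored in a defaultdict, which would explain why the totals are read with next() rather than by calling the value. It is written for Python 3, where val.next() becomes next(val) and iteritems() becomes items(); count_ips_with_count is a made-up name.

```python
from collections import defaultdict
from itertools import count


def count_ips_with_count(lines):
    # Each new key gets its own itertools.count() iterator starting at 0
    # (an assumption; the original snippet is truncated).
    match_counter = defaultdict(count)
    for line in lines:
        fields = line.split(None, 1)
        if fields:
            # Advancing the iterator once per hit acts as the increment.
            next(match_counter[fields[0]])
    # After n hits the iterator has yielded 0..n-1, so its next value is n.
    return dict((key, next(val)) for key, val in match_counter.items())
```

Reading the totals consumes one more value from each iterator, so the result should be computed only once, after the loop has finished.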