> > what the default python sorting algorithm is on a list, but AFAIK you'd be
> > looking at a constant O(log 10)
>
> I'm not a mathematician - what does this mean, in layperson's terms?
O(log 10) is a way of expressing the efficiency of an algorithm:
its execution time is proportional (in the limit) to the logarithm
of the input size. Strictly speaking, Python's built-in list sort
is Timsort, which does O(n log n) comparisons for n items.
On Mon, Nov 9, 2009 at 3:15 PM, Wayne Werner wrote:
> On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith
> wrote:
>>
>> And the problem I have with the below is that I've discovered that the
>> input logfiles aren't strictly ordered - ie there is variance by a
>> second or so in some of the entries.
> I can sort the biggest logfile (800M) using unix sort in about 1.5
> mins on my workstation. That's not really fast enough, with
> potentially 12 other files.
You won't beat sort with Python.
You have to be realistic, these are very big files!
Python should be faster overall, but for a specialised job like
sorting, the C code behind unix sort is hard to beat.
On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith wrote:
> And the problem I have with the below is that I've discovered that the
> input logfiles aren't strictly ordered - ie there is variance by a
> second or so in some of the entries.
>
Within a given set of 10 lines, is the first line and
And the problem I have with the below is that I've discovered that the
input logfiles aren't strictly ordered - ie there is variance by a
second or so in some of the entries.
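Since the disorder is only a second or so, one option (a sketch; the window size is an assumption sized to cover the jitter) is to push entries through a small heap that holds back the most recent few, restoring strict order without a full sort:

```python
import heapq

def resort(pairs, window=100):
    """Yield (timestamp, entry) pairs in strict order.

    Assumes entries are never out of order by more than `window`
    positions -- enough to absorb the ~1 second of jitter.
    """
    heap = []
    for pair in pairs:
        heapq.heappush(heap, pair)
        if len(heap) > window:
            # the smallest item can no longer be displaced, emit it
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)

jittery = [(2, 'b'), (1, 'a'), (3, 'c'), (5, 'e'), (4, 'd')]
fixed = list(resort(jittery, window=2))
```

This is O(n log window) and streams, so it never holds more than `window` entries in memory.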
I can sort the biggest logfile (800M) using unix sort in about 1.5
mins on my workstation. That's not really fast enough, with
potentially 12 other files.
Hi,
> If you create iterators from the files that yield (timestamp, entry)
> pairs, you can merge the iterators using one of these recipes:
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/
Could you show me how I might do that?
So far I'm at the stage o
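A sketch of the merge those recipes implement, using `heapq.merge` from the stdlib (2.6+). The sample lines and timestamp layout are assumptions, and the in-memory streams stand in for the real files:

```python
import heapq
import io
import re
from datetime import datetime

TS = re.compile(r'\[([^ \]]+)')   # text between '[' and the first space

def timestamped(stream):
    """Yield (timestamp, entry) pairs from one logfile stream."""
    for line in stream:
        m = TS.search(line)
        if m:
            ts = datetime.strptime(m.group(1), '%d/%b/%Y:%H:%M:%S')
            yield ts, line

# tiny in-memory stand-ins for the real 6G files
log_a = io.StringIO('a - - [04/Nov/2009:04:02:10 +0000] "GET /x"\n'
                    'a - - [04/Nov/2009:04:02:12 +0000] "GET /y"\n')
log_b = io.StringIO('b - - [04/Nov/2009:04:02:11 +0000] "GET /z"\n')

# heapq.merge lazily combines any number of sorted streams
merged = [line for ts, line in heapq.merge(timestamped(log_a),
                                           timestamped(log_b))]
```

Because the merge is lazy, none of the inputs is ever read fully into memory; you would write each `line` out as it arrives.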
Stephen Nelson-Smith wrote:
Hi,
Any advice or experiences?
go here and download the pdf!
http://www.dabeaz.com/generators-uk/
Someone posted this the other day, and I went and read through it and played
around a bit and it's exactly what you're looking for - plus it has a
slide comparing python vs. awk.
Hi,
>> Any advice or experiences?
>>
>
> go here and download the pdf!
> http://www.dabeaz.com/generators-uk/
> Someone posted this the other day, and I went and read through it and played
> around a bit and it's exactly what you're looking for - plus it has a
> slide comparing python vs. awk.
On Sun, Nov 8, 2009 at 11:41 PM, Stephen Nelson-Smith wrote:
> I've got a large amount of data in the form of 3 apache and 3 varnish
> logfiles from 3 different machines. They are rotated at 0400. The
> logfiles are pretty big - maybe 6G per server, uncompressed.
>
> I've got to produce a combined logfile for 0000-2359 for a given day.
On Mon, Nov 9, 2009 at 4:36 AM, Stephen Nelson-Smith wrote:
I want to extract 24 hrs of data based on timestamps like this:
[04/Nov/2009:04:02:10 +0000]
>>>
>>> OK. It looks like you could use a regex to extract the first
>>> thing you find between square brackets. Then convert that to a
>>> datetime object for comparison.
Hello,
: An apache logfile entry looks like this:
:
: 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
: /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
: HTTP/1.1" 200 50 "-" "-"
:
: I want to extract 24 hrs of data based on timestamps like this:
:
: [04/Nov/2009:04:02:10 +0000]
Sorry - forgot to include the list.
On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith wrote:
> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD wrote:
>>
>>> An apache logfile entry looks like this:
>>>
>>> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home
> An apache logfile entry looks like this:
>
> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
> HTTP/1.1" 200 50 "-" "-"
>
> I want to extract 24 hrs of data based on timestamps like this:
>
> [04/Nov/2009:04:02:10 +0000]
OK. It looks like you could use a regex to extract the first thing
you find between square brackets. Then convert that to a datetime
object for comparison.
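A minimal sketch of that two-step approach. The sample line mirrors the entry quoted above (URL shortened, and the +0000 offset is assumed and simply dropped before parsing):

```python
import re
from datetime import datetime

line = ('89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET '
        '/service.php HTTP/1.1" 200 50 "-" "-"')

# step 1: regex grabs the first thing between square brackets
stamp = re.search(r'\[(.*?)\]', line).group(1)
# stamp is now '04/Nov/2009:04:02:10 +0000'

# step 2: convert it to a datetime (offset split off and ignored here)
when = datetime.strptime(stamp.split()[0], '%d/%b/%Y:%H:%M:%S')
```

datetime objects compare and sort naturally, which is exactly what the merge step needs.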
On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld wrote:
> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the format.
"Stephen Nelson-Smith" wrote
* How does Python compare in performance to shell, awk etc in a big
pipeline? The shell script kills the CPU.
Python should be significantly faster than the typical shell script
and it should consume less resources, although it will probably
still use a fair bit of memory.
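One reason a single Python process can beat a multi-stage shell pipeline: generator stages pass one line at a time with no pipes or extra processes, and memory stays flat. A sketch with made-up two-field records standing in for log lines:

```python
def grep(pattern, lines):
    # lazy stand-in for `grep pattern` in a shell pipeline
    return (line for line in lines if pattern in line)

def cut_status(lines):
    # lazy stand-in for `awk '{print $2}'` on "METHOD STATUS" records
    return (line.split()[1] for line in lines)

records = ['GET 200', 'GET 404', 'POST 200']
statuses = list(cut_status(grep('GET', records)))
```

Each stage is a generator, so composing them costs no intermediate lists; swapping `records` for a file object gives the same pipeline over a 6G log.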
I've got a large amount of data in the form of 3 apache and 3 varnish
logfiles from 3 different machines. They are rotated at 0400. The
logfiles are pretty big - maybe 6G per server, uncompressed.
I've got to produce a combined logfile for 0000-2359 for a given day,
with a bit of filtering (removing certain entries).
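The 0000-2359 window can hang off the merged stream as one more generator; a sketch, with the (timestamp, entry) pair shape and the day predicate being assumptions:

```python
from datetime import date, datetime

def within_day(pairs, day):
    """Keep only (timestamp, entry) pairs that fall on the given date."""
    return ((ts, entry) for ts, entry in pairs if ts.date() == day)

pairs = [(datetime(2009, 11, 4, 4, 2, 10), 'first'),
         (datetime(2009, 11, 5, 0, 0, 1), 'next day')]
kept = [entry for ts, entry in within_day(pairs, date(2009, 11, 4))]
```

Any other filtering (dropping entries by URL, status code, etc.) slots in as further generator stages of the same shape.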