"Stephen Nelson-Smith" <[email protected]> wrote
I don't really want to admit defeat and have a cron job sort the logs
before entry.  Anyone got any other ideas?

Why would that be admitting defeat?
It's normal when processing large data volumes to break the process into discrete steps that can be done in bulk and in parallel - that's how mainframes have worked for years.

It's a perfectly valid approach to preprocess your input data
so that your main processing can be more efficient. The trick is to spread the load where the task is repeatable (e.g. sorting a file - and even the filtering of your PHPs) and maximise the efficiency where it is not (i.e. the merging).

So it would be valid to have a set of batch jobs removing the PHPs, followed by another job to sort each file, and finally a merge of the reduced files. The initial filtering and sorting can both be done on a per-file basis in parallel.
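
Roughly, in Python, something like this - a minimal sketch only. It assumes the log lines sort correctly as plain strings (e.g. timestamp-first) and that "removing the PHPs" means dropping lines that mention '.php'; the function and file names are made up for illustration:

import heapq

def filter_and_sort(infile, outfile):
    # Steps 1 and 2: drop the .php lines, then sort what remains.
    # Each input file is independent, so several of these jobs can
    # run in parallel (e.g. one process per file).
    with open(infile) as f:
        lines = [line for line in f if '.php' not in line]
    lines.sort()
    with open(outfile, 'w') as f:
        f.writelines(lines)

def merge_sorted(sorted_files, outfile):
    # Step 3: merge the pre-sorted files in a single pass.
    # heapq.merge is lazy, so it only ever holds one pending line
    # per input file in memory - handy for big logs.
    handles = [open(name) for name in sorted_files]
    try:
        with open(outfile, 'w') as out:
            out.writelines(heapq.merge(*handles))
    finally:
        for h in handles:
            h.close()

The point is that only the final merge is a single serial job, and by then it's working on smaller, already-sorted files.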

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
