Re: Processing large CSV files - how to maximise throughput?

2013-10-26 Thread Walter Hurry
On Thu, 24 Oct 2013 18:38:21 -0700, Victor Hooi wrote: Hi, We have a directory of large CSV files that we'd like to process in Python. We process each input CSV, then generate a corresponding output CSV file. input CSV -> munging text, lookups etc. -> output CSV My question is,

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Chris Angelico
On Fri, Oct 25, 2013 at 2:57 PM, Dave Angel da...@davea.name wrote: But I would concur -- probably they'll both give about the same speedup. I just detest the pain that multithreading can bring, and tend to avoid it if at all possible. I don't have a history of major pain from threading. Is

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Stefan Behnel
Chris Angelico, 25.10.2013 08:13: On Fri, Oct 25, 2013 at 2:57 PM, Dave Angel wrote: But I would concur -- probably they'll both give about the same speedup. I just detest the pain that multithreading can bring, and tend to avoid it if at all possible. I don't have a history of major pain

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Chris Angelico
On Fri, Oct 25, 2013 at 5:39 PM, Stefan Behnel stefan...@behnel.de wrote: Basically, with multiple processes, you start with independent systems and add connections specifically where needed, whereas with threads, you start with completely shared state and then prune away interdependencies and

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Dave Angel
On 25/10/2013 02:13, Chris Angelico wrote: On Fri, Oct 25, 2013 at 2:57 PM, Dave Angel da...@davea.name wrote: But I would concur -- probably they'll both give about the same speedup. I just detest the pain that multithreading can bring, and tend to avoid it if at all possible. I don't have

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Chris Angelico
On Fri, Oct 25, 2013 at 10:24 PM, Dave Angel da...@davea.name wrote: On 25/10/2013 02:13, Chris Angelico wrote: On Fri, Oct 25, 2013 at 2:57 PM, Dave Angel da...@davea.name wrote: But I would concur -- probably they'll both give about the same speedup. I just detest the pain that

Re: Processing large CSV files - how to maximise throughput?

2013-10-25 Thread Roy Smith
In article mailman.1560.1382744694.18130.python-l...@python.org, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: Memory is cheap -- I/O is slow. <G> Just how massive are these CSV files? Actually, these days, the economics of hardware are more like, CPU is cheap, memory is expensive. I

Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Victor Hooi
Hi, We have a directory of large CSV files that we'd like to process in Python. We process each input CSV, then generate a corresponding output CSV file. input CSV -> munging text, lookups etc. -> output CSV My question is, what's the most Pythonic way of handling this? (Which I'm assuming

Re: Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Dave Angel
On 24/10/2013 21:38, Victor Hooi wrote: Hi, We have a directory of large CSV files that we'd like to process in Python. We process each input CSV, then generate a corresponding output CSV file. input CSV -> munging text, lookups etc. -> output CSV My question is, what's the most Pythonic

Re: Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Steven D'Aprano
On Thu, 24 Oct 2013 18:38:21 -0700, Victor Hooi wrote: Hi, We have a directory of large CSV files that we'd like to process in Python. We process each input CSV, then generate a corresponding output CSV file. input CSV -> munging text, lookups etc. -> output CSV My question is,

Re: Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Steven D'Aprano
On Fri, 25 Oct 2013 02:10:07 +, Dave Angel wrote: If I have multiple large CSV files to deal with, and I'm on a multi-core machine, is there anything else I can do to boost throughput? Start multiple processes. For what you're doing, there's probably no point in multithreading. Since
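
The "start multiple processes" suggestion above could look roughly like the following. This is only a sketch under assumptions, not anyone's posted code: `munge_row`, `build_jobs`, `process_file` and `process_directory` are hypothetical names, the whitespace-stripping transform is a stand-in for the real munging and lookups, and it assumes each input file can be processed independently of the others. It uses the standard library's `multiprocessing.Pool` to give each worker process one file at a time.

```python
import csv
import os
from multiprocessing import Pool


def munge_row(row):
    # Hypothetical placeholder for the real "munging text, lookups etc." step.
    return [field.strip() for field in row]


def process_file(job):
    """Stream one input CSV to its output CSV; return (out_path, row count)."""
    in_path, out_path = job
    n_rows = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(munge_row(row))
            n_rows += 1
    return out_path, n_rows


def build_jobs(in_dir, out_dir):
    """Pair every .csv file in in_dir with an output path of the same name."""
    return [
        (os.path.join(in_dir, name), os.path.join(out_dir, name))
        for name in sorted(os.listdir(in_dir))
        if name.endswith(".csv")
    ]


def process_directory(in_dir, out_dir, workers=None):
    """Process all CSV files in in_dir, one file per worker process at a time."""
    os.makedirs(out_dir, exist_ok=True)
    jobs = build_jobs(in_dir, out_dir)
    # workers=None lets Pool default to the number of CPUs it detects.
    with Pool(processes=workers) as pool:
        return pool.map(process_file, jobs)
```

A call such as `process_directory("input", "output")` (directory names made up here) should sit under an `if __name__ == "__main__":` guard, since some platforms re-import the main module in each worker. Whether this actually helps depends on whether the job is CPU-bound or limited by disk I/O, which is worth measuring first.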

Re: Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Mark Lawrence
On 25/10/2013 02:38, Victor Hooi wrote: So for the reading, it'll iterate over the lines one by one, and won't read it into memory which is good. Wow this is fantastic, which OS are you using? Or do you actually mean that the whole file doesn't get read into memory, only one line at a
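
For reference, the line-at-a-time reading being discussed here might be sketched as below. This is an illustration rather than code from the thread: `process_csv` and its `transform` argument are hypothetical names. It relies on `csv.reader` pulling rows lazily from the open file object, so only the current row (plus the file buffers) is held in memory instead of the whole file.

```python
import csv


def process_csv(in_path, out_path, transform):
    """Copy in_path to out_path row by row, applying transform to each row."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader and csv.writer.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)   # yields one parsed row at a time
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(transform(row))
```

For example, `process_csv("in.csv", "out.csv", lambda row: [c.upper() for c in row])` (file names made up) would upper-case every field while streaming.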

Re: Processing large CSV files - how to maximise throughput?

2013-10-24 Thread Dave Angel
On 24/10/2013 23:35, Steven D'Aprano wrote: On Fri, 25 Oct 2013 02:10:07 +, Dave Angel wrote: If I have multiple large CSV files to deal with, and I'm on a multi-core machine, is there anything else I can do to boost throughput? Start multiple processes. For what you're doing,