Re: Pickle based workflow - looking for advice

2015-04-14 Thread Fabien
On 14.04.2015 06:05, Chris Angelico wrote: Not sure what you mean, here. Any given file will be written by exactly one process? No possible problem. Multiprocessing within one application doesn't change that. yes that's what I meant. Thanks!

Re: Pickle based workflow - looking for advice

2015-04-14 Thread Steven D'Aprano
On Tue, 14 Apr 2015 11:45 pm, Chris Angelico wrote: On Tue, Apr 14, 2015 at 11:08 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 14 Apr 2015 05:58 pm, Fabien wrote: On 14.04.2015 06:05, Chris Angelico wrote: Not sure what you mean, here. Any given file will be

Re: Pickle based workflow - looking for advice

2015-04-14 Thread Chris Angelico
On Wed, Apr 15, 2015 at 12:14 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 14 Apr 2015 11:45 pm, Chris Angelico wrote: On Tue, Apr 14, 2015 at 11:08 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 14 Apr 2015 05:58 pm, Fabien wrote: On

Re: Pickle based workflow - looking for advice

2015-04-14 Thread Chris Angelico
On Tue, Apr 14, 2015 at 11:08 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Tue, 14 Apr 2015 05:58 pm, Fabien wrote: On 14.04.2015 06:05, Chris Angelico wrote: Not sure what you mean, here. Any given file will be written by exactly one process? No possible problem.

Re: Pickle based workflow - looking for advice

2015-04-14 Thread Steven D'Aprano
On Tue, 14 Apr 2015 05:58 pm, Fabien wrote: On 14.04.2015 06:05, Chris Angelico wrote: Not sure what you mean, here. Any given file will be written by exactly one process? No possible problem. Multiprocessing within one application doesn't change that. yes that's what I meant. Thanks!

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Fabien
On 13.04.2015 17:45, Devin Jeanpierre wrote: On Mon, Apr 13, 2015 at 10:58 AM, Fabien fabien.mauss...@gmail.com wrote: Now, to my questions: 1. Does that seem reasonable? A big issue is the use of pickle, which is: * Often suboptimal performance wise (e.g. you can't load only subsets of the

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Chris Angelico
On Tue, Apr 14, 2015 at 3:35 AM, Fabien fabien.mauss...@gmail.com wrote: With multiprocessing, do I have to care about processes writing simultaneously in *different* files? I guess the OS takes good care of this stuff but I'm not an expert. Not sure what you mean, here. Any given file will be
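
To illustrate the point about each file having exactly one writer, here is a minimal sketch, not code from the thread. The worker function, the file naming scheme and the placeholder result are all assumptions made for illustration. Each task in the pool writes only to its own pickle file, so no two processes ever touch the same file.

```python
import multiprocessing
import os
import pickle
import tempfile


def process_watershed(args):
    """Hypothetical worker: compute a result and write it to its own file."""
    watershed_id, out_dir = args
    # Placeholder result; a real workflow would do the actual computation here.
    result = {"id": watershed_id, "area": watershed_id * 10.0}
    # One file per watershed, so each file has exactly one writer.
    path = os.path.join(out_dir, "watershed_%d.pkl" % watershed_id)
    with open(path, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def run_all(n_watersheds, out_dir, n_procs=2):
    """Distribute the watersheds over a pool of worker processes."""
    jobs = [(i, out_dir) for i in range(n_watersheds)]
    with multiprocessing.Pool(n_procs) as pool:
        return pool.map(process_watershed, jobs)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    for path in run_all(4, out_dir):
        with open(path, "rb") as f:
            print(pickle.load(f))
```

If two tasks could ever map to the same file name, this guarantee would no longer hold and some form of locking or a unique naming scheme would be needed.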

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Fabien
On 13.04.2015 19:08, Peter Otten wrote: How about a file-based workflow? Write distinct scripts, e. g. a2b.py that reads from *.a and writes to *.b and so on. Then use a plain old makefile to define the dependencies. Whether .a uses pickle, .b uses json, and .z uses csv is but an
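
One step of such a file-based workflow might look like the sketch below. This is only an assumption about what an a2b.py could contain: the choice of pickle for the .a input, JSON for the .b output, and the placeholder transformation are all hypothetical. The makefile that ties the steps together is not shown.

```python
import json
import pickle
import sys


def convert(src_path, dst_path):
    """Read a pickled list from src_path and write a JSON summary to dst_path."""
    with open(src_path, "rb") as src:
        data = pickle.load(src)
    # Placeholder transformation; a real step would do the actual work here.
    result = {"n_items": len(data), "items": data}
    with open(dst_path, "w") as dst:
        json.dump(result, dst)


def main(argv):
    if len(argv) != 3:
        print("usage: a2b.py INPUT.a OUTPUT.b")
        return
    convert(argv[1], argv[2])


if __name__ == "__main__":
    main(sys.argv)
```

Because each script only communicates through its input and output files, the on-disk format of one step can change without affecting the others, as long as the neighbouring scripts agree on it.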

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Peter Otten
Fabien wrote: I am writing a quite extensive piece of scientific software. Its workflow is quite easy to explain. The tool realizes series of operations on watersheds (such as mapping data on it, geostatistics and more). There are thousands of independent watersheds of different size, and

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Fabien
On 13.04.2015 18:25, Dave Angel wrote: On 04/13/2015 10:58 AM, Fabien wrote: Folks, A comment. Pickle is a method of creating persistent data, most commonly used to preserve data between runs. A database is another method. Although either one can also be used with multiprocessing, you

Pickle based workflow - looking for advice

2015-04-13 Thread Fabien
Folks, I am writing a quite extensive piece of scientific software. Its workflow is quite easy to explain. The tool realizes series of operations on watersheds (such as mapping data on it, geostatistics and more). There are thousands of independent watersheds of different size, and the size

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Dave Angel
On 04/13/2015 10:58 AM, Fabien wrote: Folks, A comment. Pickle is a method of creating persistent data, most commonly used to preserve data between runs. A database is another method. Although either one can also be used with multiprocessing, you seem to be worrying more about the

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Devin Jeanpierre
On Mon, Apr 13, 2015 at 10:58 AM, Fabien fabien.mauss...@gmail.com wrote: Now, to my questions: 1. Does that seem reasonable? A big issue is the use of pickle, which is: * Often suboptimal performance wise (e.g. you can't load only subsets of the data) * Makes forwards/backwards compatibility

Re: Pickle based workflow - looking for advice

2015-04-13 Thread Robin Becker
For what it's worth, I believe that marshal is a faster method for storing simple Python objects. So if your information can be stored using simple Python things, e.g. strings, floats, integers, lists and dicts, then storage using marshal is faster than pickle/cPickle. If you want to persist the
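
A small sketch of the difference, using made-up data: both modules round-trip a structure built from simple types, but marshal refuses instances of user-defined classes. The record and the Watershed class are hypothetical, and no timing is shown here; the speed claim is best checked with timeit on your own data. Note also that the marshal format is not guaranteed to be stable across Python versions.

```python
import marshal
import pickle

# Hypothetical record made only of simple built-in types.
record = {"name": "basin_42", "area_km2": 12.5, "cells": [1, 2, 3]}

# Both modules can serialise and restore it.
assert marshal.loads(marshal.dumps(record)) == record
assert pickle.loads(pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)) == record


class Watershed(object):
    """Hypothetical user-defined class; pickle handles it, marshal does not."""

    def __init__(self, name):
        self.name = name


def can_marshal(obj):
    """Return True if marshal is able to serialise obj."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(can_marshal(record))
    print(can_marshal(Watershed("basin_42")))
```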