This comes out of a long discussion on the Cython list. Following Mark's success with shared-memory parallelism, the question is where to take Cython's capabilities for parallelism next.

One idea that has come up now and then is that we could basically use something like:

 - multiprocessing (to get rid of any GIL issues)

 - allocate all NumPy arrays in process-shared memory; passing NumPy arrays between processes then happens by "pickling views".

   This can be done with current NumPy by using a separate constructor, e.g.,

       a = sharedmem_zeros((3, 3))

   However, construction of the array feels like the wrong place to make this decision. The decision should really be made when the array is sent to another process. If all NumPy arrays were allocated in shared memory by default, one could do

       shared_queue.put(a.shared())

   and shared() would wrap a in something that pickles a shared memory pointer rather than the data (and unpickles directly to the NumPy array).

   I just find this *a lot* more convenient than the tedious business of making sure the memory is allocated in the right way everywhere. Any downsides to doing this? (Additional overhead for small arrays, perhaps?)

 - On the Cython end, parallelism could then be supported either by low-level message passing using ZeroMQ (possibly with syntax candy for sending typed messages), or by another multiprocessing backend for the current prange, which would require that any memoryviews worked on are allocated in shared memory.

I'm just looking for feedback here. I don't have cycles for implementation; the point is that what NumPy users and devs think about this could direct the further discussion of parallelism within Cython.

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
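
For illustration, the "pickle a shared memory pointer rather than the data" idea above could be sketched roughly as follows. This is only a sketch under stated assumptions: it relies on the standard library's multiprocessing.shared_memory module (Python 3.8+), which is not necessarily the mechanism the message has in mind. The message names sharedmem_zeros() without giving an implementation, so the version here is made up, and SharedView and _attach are hypothetical names too. Lifetime management of the shared block (who unlinks it, and when) is deliberately left out.

```python
import pickle

import numpy as np
from multiprocessing import shared_memory


class SharedView:
    """Hypothetical wrapper that pickles a shared-memory name, not the data."""

    def __init__(self, shm, shape, dtype):
        self.shm = shm
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    @property
    def array(self):
        # Build a NumPy array directly on top of the shared buffer (no copy).
        return np.ndarray(self.shape, dtype=self.dtype, buffer=self.shm.buf)

    def __reduce__(self):
        # Only the block name, shape and dtype string go into the pickle.
        return (_attach, (self.shm.name, self.shape, self.dtype.str))


def _attach(name, shape, dtype):
    # Called on unpickling: attach to the existing block by name.
    shm = shared_memory.SharedMemory(name=name)
    return SharedView(shm, shape, dtype)


def sharedmem_zeros(shape, dtype=np.float64):
    # Hypothetical constructor: allocate a zero-filled array in shared memory.
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=max(nbytes, 1))
    view = SharedView(shm, shape, dtype)
    view.array[...] = 0
    return view
```

Since multiprocessing.Queue pickles the objects put on it, a queue.put(view) would go through the same __reduce__ path, and the receiving process would get a view onto the same memory instead of a copy. The sender still has to keep the block alive until the receiver has attached, and exactly one side has to call shm.unlink() at the end.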