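The easiest way I can think of is to write the data to a single file and then use mmap [1] and file IO [2] to read the data of interest. Each process can map the same file by path, so there should be no need to pass a table pointer between them. I haven't tested this, but I think it would be zero-copy.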
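To make the idea concrete, here is a rough sketch using only Python's standard-library mmap and multiprocessing, not Arrow itself. The fixed-width int64 record layout, the file name, and the helper names (write_records, sum_range) are all made up for illustration. With pyarrow, the equivalent would presumably be pa.memory_map plus the IPC file reader described in [2], but I haven't verified that path end to end.

```python
import mmap
import os
import struct
import tempfile
from multiprocessing import Pool

# Hypothetical record layout for this sketch: one little-endian int64 per record.
RECORD = struct.Struct("<q")


def write_records(path, values):
    """Write all values to a single file as fixed-width records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def sum_range(args):
    """Worker: map the file read-only and sum the records in [start, stop)."""
    path, start, stop = args
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # unpack_from reads directly from the mapping; the OS page cache
            # backs the same file for every process that maps it.
            return sum(
                RECORD.unpack_from(mm, i * RECORD.size)[0]
                for i in range(start, stop)
            )


def main():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        write_records(path, range(1000))
        # Only the path and the index range are sent to each worker,
        # never the data or a pointer to it.
        chunks = [(path, i, i + 250) for i in range(0, 1000, 250)]
        with Pool(4) as pool:
            print(sum(pool.map(sum_range, chunks)))


if __name__ == "__main__":
    main()
```

The point of the design is that the parent only hands out a file path and a range of interest; each worker opens its own read-only mapping, so nothing large gets pickled between processes.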
Another alternative, which isn't currently maintained, is Plasma [3].

[1] https://docs.python.org/3/library/mmap.html
[2] https://arrow.apache.org/docs/python/memory.html#input-and-output
[3] https://arrow.apache.org/docs/python/plasma.html

On Fri, Dec 11, 2020 at 11:21 AM Fernando Herrera <[email protected]> wrote:

> Hello,
>
> I'm implementing a text data analyzer that compares all the data against
> each other. This process benefits a lot from using multiprocessing.
> However, I'm having problems sharing the data between the processes and I
> think arrow will solve this easily.
> I was wondering if someone could point me in the right direction. How does
> one go sharing the location of the mapped data among the different
> processes spawned by the main process? Do I have to share the table pointer
> between the processes?
>
> Any guidance would be much appreciated
> Fernando
