If you have used a RecordBatchStreamWriter to serialize your data into a buffer that you're sending to another process, then to get it back into a record batch or table you need to read it with pyarrow.ipc.open_stream(...).read_all(). You can then concatenate the resulting tables with pyarrow.concat_tables (or use Table.from_batches if you have a sequence of record batches).
Hope this helps

On Thu, Jan 21, 2021 at 6:19 AM Jonathan MERCIER <[email protected]> wrote:
>
> Same question but simpler to understand.
>
> Using pyarrow and working with pieces of data per process (multi-process
> as a workaround for the GIL limitation). What is the correct way to handle
> this task?
>
> 1. Each parallel process creates a list of records, stores them
> into a record batch, and returns this batch.
>
> 2. Each parallel process creates an output buffer and stream writer,
> creates a list of records, stores them into a record batch, and writes
> this record batch into the stream writer. The process returns the
> corresponding buffer.
>
> With answer (1) I see how to merge all of those batches, but with
> solution (2), how do I merge all buffers into one once each process has
> returned its buffer?
>
> Thanks
>
> --
> Jonathan MERCIER
> Researcher computational biology
> PhD, Jonathan MERCIER
> Centre National de Recherche en Génomique Humaine (CNRGH)
> Bioinformatics (LBI)
> 2, rue Gaston Crémieux
> 91057 Evry Cedex
> Tel: (33) 1 60 87 34 88
> Email: [email protected]
