Hi,

I'm trying to efficiently convert incoming numpy.recarrays to 
pyarrow.StructArrays, and I'm unsure how to do so with the least amount of 
copying.

My use case involves real-time processing of numpy.recarrays in Rust. I'm 
happily using the IPC protocol to transfer the data to Rust's Arrow 
implementation, which will do the heavy lifting. I'll need to iterate over the 
recarray-turned-StructArray row by row, each time yielding all fields of a 
specific row, so the StructArray format is quite fitting. However, doing the 
actual conversion efficiently seems harder than expected. The fields 
(= individual arrays) of a numpy.recarray aren't stored contiguously, so any 
numpy.recarray -> pyarrow.Array conversion first has to copy each field into 
standard pyarrow.Array buffers and then reconstruct the StructArray from those 
arrays. I was unable to find a better approach for this kind of pre-processing 
step in the docs or in previous discussions here.

Since I'm using IPC, I'll eventually need to have the pyarrow.StructArray 
wrapped in a pyarrow.RecordBatch, if that makes any difference.

Thanks in advance.
