Hi Matias,

If you are going to do tensor operations, you could use the Arrow tensor representation:
https://arrow.apache.org/docs/python/generated/pyarrow.Tensor.html

However, I don't think the data stored in the tensor will be compressed. It will be stored in an ordered layout, so you can share the tensors with other processes.

I hope that helps,
Fernando

On Fri, Mar 19, 2021 at 8:52 AM Matias Guijarro <[email protected]> wrote:

> Hi !
>
> I recently learned about Apache Arrow, and as a preliminary study I would
> like to know if it can be a good choice for my use case, or if I have to
> look for another technology (or to craft something specific on my own !).
>
> I could not really find answers to my questions in the FAQ or reading
> articles and blogs, but I may have missed some information so I apologize
> in advance if my questions have already been answered.
>
> Arrow is all about storing columnar data. What can be the content of the
> elements in a column ?
>
> In my case, I have scalar values (numbers), 1D arrays and 2D arrays.
> The 2D arrays can be quite big (4000x4000 float 32 for example).
> So, we could imagine long tables, hundred thousands of lines, containing
> a mix of those data types.
>
> I wonder if Arrow stays efficient for such kind of data ? In particular,
> rows of 2D data arrays in a column may be difficult to handle with the
> same level of optimization ? (just guessing)
>
> Is there some compression in Arrow ? I am thinking about blosc kind of
> compression (like in the dead "bcolz" project - by the way someone already
> wondered about Arrow + Blosc: https://github.com/Blosc/bcolz/issues/300)
>
> Another use case I have, is to be able for multiple processes on the same
> computer to access the Arrow in-memory store ; it seems to me Plasma
> does this job but I wonder about the trade-offs ?
>
> Thanks in advance for your advices - any help would be highly appreciated !
>
> Cheers,
> Matias.
