On Tue, Dec 17, 2019 at 5:15 AM Maarten Breddels wrote:
Hi,
I had to catch up a bit with the Arrow documentation before I could respond
properly. My fear was that Arrow demanded that the in-memory representation
was always 'packed', or 'flat'. After going through the docs, it seems that
the data is only written in that form when doing IPC or stream writing.
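A minimal pyarrow sketch of that distinction (the column name and values are
made up): the table keeps its column as separate chunks in memory, and only
the IPC stream writer serializes them out as contiguous record batches.

    import pyarrow as pa
    import pyarrow.ipc

    # In memory the column can stay split across several chunks.
    chunked = pa.chunked_array([[1.0, 2.0, 3.0], [4.0, 5.0]])
    table = pa.table({"x": chunked})
    assert table.column("x").num_chunks == 2  # nothing has been packed yet

    # Only when writing the IPC stream is the data serialized into
    # contiguous record batches (one per chunk here).
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    wire_bytes = sink.getvalue()  # flat wire-format buffer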
Hi,
There have been a number of discussions over the years about on-disk
pre-allocation strategies. No volunteers have implemented anything,
though. Developing an HDF5 integration library with pre-allocation and
buffer management utilities seems like a reasonable growth area for
the project.
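As a rough illustration of what pre-allocation could mean here (purely a
sketch, not an existing Arrow API; the h5py usage, dataset name, and sizes
are assumptions): reserve the full column up front as one contiguous HDF5
dataset and copy Arrow chunks into it as they arrive.

    import h5py
    import numpy as np
    import pyarrow as pa

    # Hypothetical pre-allocation utility: the column size is known up front,
    # so the contiguous HDF5 dataset is reserved once and filled chunk by chunk.
    chunks = [pa.array(np.random.rand(1_000)) for _ in range(4)]
    n_total = sum(len(c) for c in chunks)

    with h5py.File("column.hdf5", "w") as f:
        ds = f.create_dataset("x", shape=(n_total,), dtype="f8")
        offset = 0
        for chunk in chunks:
            values = chunk.to_numpy()  # zero-copy for float64 chunks without nulls
            ds[offset:offset + len(values)] = values
            offset += len(values)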
Hello Maarten,
In theory, you could provide a custom mmap-allocator and use the
builder facility. Since the array is still in "build-phase" and not
sealed, it should be fine if mremap changes the pointer address. This
might fail in practice since the allocator is also used for auxiliary
data.
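pyarrow does not expose custom C++ allocators from Python, so the following
is not the builder-plus-allocator route described above, only a rough
analogue of the idea (file name, size, and dtype are made up): back the
build phase with a file mapping, fill it in place, then wrap the sealed
bytes as an Arrow array without copying.

    import mmap
    import numpy as np
    import pyarrow as pa

    # Rough analogue only: pre-size a file, map it, fill values in place
    # ("build phase"), then seal the mapping as an Arrow array without a copy.
    n = 1_000_000
    itemsize = np.dtype("f8").itemsize

    with open("column.bin", "wb+") as f:
        f.truncate(n * itemsize)                  # reserve the backing file
        mm = mmap.mmap(f.fileno(), n * itemsize)  # writable mapping
        staging = np.frombuffer(mm, dtype="f8")   # numpy view over the mapping
        staging[:] = np.arange(n)                 # fill during the build phase

    buf = pa.py_buffer(mm)                        # zero-copy Arrow buffer over the mapping
    arr = pa.Array.from_buffers(pa.float64(), n, [None, buf])  # sealed array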
In vaex I always write the data to HDF5 as one large chunk (per column).
The reason is that it allows the mmapped columns to be exposed as a
single numpy array (talking numerical data only for now), which many
people are quite comfortable with.
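A minimal sketch of that single-chunk-plus-mmap pattern (not vaex's actual
code; the file name and column contents are made up):

    import h5py
    import numpy as np

    # Write one column as a single contiguous (unchunked) HDF5 dataset.
    with h5py.File("data.hdf5", "w") as f:
        f.create_dataset("x", data=np.arange(1_000_000, dtype="f8"))

    # Expose the column as one flat numpy array by memory-mapping the file
    # at the dataset's byte offset.
    with h5py.File("data.hdf5", "r") as f:
        ds = f["x"]
        offset = ds.id.get_offset()  # only defined for contiguous datasets
        x = np.memmap("data.hdf5", dtype=ds.dtype, mode="r",
                      offset=offset, shape=ds.shape)

    print(x[:5])  # a plain memory-mapped ndarray view over the HDF5 data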
The strategy for vaex to write unchunked data is to fi