Motivation: We have memory-mappable Arrow IPC files with N batches, where one or more columns are sorted to support binary search. Because a binary search is required on each batch, and log2(n) < log2(n/2) + log2(n/2) for n > 4, fewer and larger batches mean less total search work, so we prefer the batches to be as large as possible, perhaps larger than available RAM. On the read side this is fine: only the pages needed for the search bisections and the subsequent slice traversal are ever mapped in.
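For concreteness, our read side today looks roughly like this (a simplified sketch assuming recent Arrow C++ with the Result-returning APIs; the file name and the sorted int64 key in column 0 are placeholders):

```cpp
#include <cstdint>
#include <iostream>
#include <memory>

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/ipc/reader.h>

// First index whose key is >= target; touches only O(log n) pages.
int64_t LowerBound(const arrow::Int64Array& keys, int64_t target) {
  int64_t lo = 0, hi = keys.length();
  while (lo < hi) {
    int64_t mid = lo + (hi - lo) / 2;
    if (keys.Value(mid) < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

arrow::Status Search(int64_t target) {
  ARROW_ASSIGN_OR_RAISE(auto mmap, arrow::io::MemoryMappedFile::Open(
                                       "sorted.arrow", arrow::io::FileMode::READ));
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchFileReader::Open(mmap));
  for (int i = 0; i < reader->num_record_batches(); ++i) {
    // With an mmap'd source the batch is zero-copy; data pages fault in lazily.
    ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(i));
    auto keys = std::static_pointer_cast<arrow::Int64Array>(batch->column(0));
    int64_t pos = LowerBound(*keys, target);
    if (pos < keys->length() && keys->Value(pos) == target) {
      std::cout << "batch " << i << ", row " << pos << "\n";
    }
  }
  return arrow::Status::OK();
}
```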
The question then becomes how to create large IPC-format files whose individual batches never exist in RAM first, precisely because of their size. Conceptually, this would seem to entail:

* allocating a fixed mmap'd area to write into
* using builders to create buffers at the exact locations they would occupy in the IPC format, and freezing these as arrays (if I understand the terminology correctly)
* plopping in the various other pieces: metadata, schema, footer, etc.

(I've put a few sketches of what I mean below my sign-off.)

One difficulty is that we want to size this area without first analyzing the data to be written into it, since such an analysis consumes compute resources. The area set aside for, e.g., a variable-length string column would therefore be a guess based on statistics, and we would just write the column buffers until the first one fills up, which may leave the others (or itself) partially unpopulated. That means some "wasted space" in the file, a tradeoff we can live with for the reasons above. This brings me back to https://issues.apache.org/jira/browse/ARROW-5916, where this was discussed before (another discussion is linked from there). The idea was to allow record batch lengths to be smaller than the associated buffer lengths, which seemed like an easy change at the time... although I'll grant that we only use trivial Arrow types, and in more complex cases there may be side effects I can't envision.

One of the ideas there was to go ahead and fill in the buffers so the result is a valid record batch, but store the sliced-down length in, e.g., the user-defined metadata (Sketch 2 below). This forces anyone using the IPC file to rely on a non-standard mechanism to reject the "data" that fills the unpopulated buffer sections.

Even with the ability for a batch to be smaller than its buffers (so readers can reject the residue without consulting custom metadata), I think I am still left needing low-level code outside the Arrow library to create such a file, since I cannot first create the batch in RAM and then copy it out: the batch is too large, and we want to avoid the copy operation anyway (Sketch 3 below shows the closest standard-API route I can see, and why it falls short).

Any thoughts on creating large IPC-format record batch(es) in place, in a single pre-allocated buffer that could then be used with mmap? Here is someone with a similar concern: https://www.mail-archive.com/user@arrow.apache.org/msg01187.html

It seems like the https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html example could be tweaked to use "pools" that define exactly where to put each buffer (Sketch 1 below), but then the final `arrow::Table::Make` (or its equivalent for record batches/IPC) must also receive instructions about where exactly to write the user metadata, schema, footer, etc.

Thanks for any ideas,
John
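Sketch 1: the "build in place" step from the list above, in isolation. This is not working code; assume `region` points into the pre-allocated mmap'd area at the offset where the IPC body layout says this column's values buffer must live (computing that offset is exactly the part I don't see library support for), and nulls are ignored:

```cpp
#include <cstdint>
#include <memory>

#include <arrow/api.h>

std::shared_ptr<arrow::Array> FreezeInPlace(uint8_t* region,
                                            int64_t capacity_bytes) {
  // Wrap the slice of the mapped region; no allocation, no copy.
  // (IPC buffer alignment is assumed to be handled by the offset choice.)
  auto values = std::make_shared<arrow::MutableBuffer>(region, capacity_bytes);
  auto* out = reinterpret_cast<int64_t*>(values->mutable_data());
  int64_t n = 0;
  out[n++] = 42;  // ... append sorted keys until some buffer fills up ...
  // "Freeze" as an array whose logical length is smaller than the buffer:
  // this is the batch-length-vs-buffer-length question from ARROW-5916.
  auto data = arrow::ArrayData::Make(arrow::int64(), n,
                                     {/*validity=*/nullptr, values},
                                     /*null_count=*/0);
  return arrow::MakeArray(data);
}
```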
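Sketch 2: the custom-metadata workaround. The key name "logical_num_rows" is made up; the problem is that every reader has to know to apply it:

```cpp
#include <cstdint>
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/util/key_value_metadata.h>

// Writer side: record the real row count alongside the over-long batch.
std::shared_ptr<arrow::Schema> WithLogicalLength(
    const std::shared_ptr<arrow::Schema>& schema, int64_t logical_len) {
  return schema->WithMetadata(arrow::key_value_metadata(
      {"logical_num_rows"}, {std::to_string(logical_len)}));
}

// Reader side: cooperating readers slice the batch back down to size.
std::shared_ptr<arrow::RecordBatch> SliceToLogical(
    const std::shared_ptr<arrow::RecordBatch>& batch) {
  auto md = batch->schema()->metadata();
  if (md == nullptr) return batch;
  int i = md->FindKey("logical_num_rows");
  if (i < 0) return batch;
  return batch->Slice(0, std::stoll(md->value(i)));
}
```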
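Sketch 3: the closest standard-API route I can see: pre-create a fixed-size file from the statistics-based guess, mmap it read-write, and point an IPC file writer at it. But the batch passed to WriteRecordBatch must already exist in RAM and is then copied into the mapping, which is exactly what we want to avoid:

```cpp
#include <cstdint>
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/ipc/writer.h>

arrow::Status WriteGuessSized(const std::shared_ptr<arrow::Schema>& schema,
                              const std::shared_ptr<arrow::RecordBatch>& batch,
                              const std::string& path, int64_t guessed_size) {
  // Creates a file of guessed_size bytes and maps it read-write; unused
  // tail space is the "wasted space" tradeoff described above.
  ARROW_ASSIGN_OR_RAISE(auto mmap,
                        arrow::io::MemoryMappedFile::Create(path, guessed_size));
  ARROW_ASSIGN_OR_RAISE(auto writer, arrow::ipc::MakeFileWriter(mmap, schema));
  // The copy we want to eliminate: `batch` is fully materialized in RAM.
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  return writer->Close();
}
```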