I'm loading a large number of large Arrow IPC streams/files from disk with
mmap. I'd like to demand-load the contents instead of prefetching them – or
at least have better control over disk IO.

Calling `read_all` on a stream triggers a complete read of the file up
front (`MADV_WILLNEED` over the file's entire byte range), whereas `read_all`
on a file seems to read the whole file too, but through page faults. I'm
less sure about the file behavior.

Is there a way to disable prefetching in the stream case, or to configure
Arrow to demand-load Tables? I'd like to get a reference to a Table without
triggering any disk reads beyond the schema, magic bytes, and metadata.

-s
*Builder @ LMNT*
Web <https://www.lmnt.com> | LinkedIn
<https://www.linkedin.com/in/sharvil-nanavati/>
