If I'm understanding the below correctly, it seems that the file format
supports finding an arbitrary compressed buffer without decompressing
anything else. Correct?
-John
/// --
/// A Buffer represents a single contiguous
Regarding tab = feather.read_table(fname, memory_map=True):
Uncompressed: low-cost setup and len(tab); data is read only when sections of
the map are "paged in" by the OS.
Compressed (desired):
* low-cost setup
* read the length of the "table" without decompressing anything (len(tab))
*
Hi,
AFAIK compressed Arrow IPC files do not support random access (unlike their
uncompressed counterparts): you need to decompress the whole batch (or at
least the columns you need). A "RecordBatch" is the compression unit of the
file. Think of it like a Parquet file in which every row group has a single
Why aren't all the compressed batches the chunk size I specified in
write_feather (700)? How can I know which batch my slice resides in if
this is not a constant? Using pyarrow 9.0.0
This file contains 1.5 billion rows. I need a way to know where to look
for, say, rows [780567127, 922022522).
The following seems like good news... like I should be able to decompress
just one column of a RecordBatch in the middle of a compressed feather v2
file. Is there a Python API for this kind of access? C++?
/// Provided for forward compatibility in case we need to support different
///
"Internal structure supports random access and slicing from the middle.
This also means that you can read a large file chunk by chunk without
having to pull the whole thing into memory."
https://ursalabs.org/blog/2020-feather-v2/
For a compressed v2 file, can I decompress just one column of a