Hi Chris,

On Thu, Jan 13, 2022 at 2:55 PM Chris Nuernberger <[email protected]> wrote:

> Upgrading to 6.0.X I noticed that record batches can have body compression
> which I think is great.

Small nit: this was released in Arrow 4.

> I had trouble finding examples in python or R (or java) of writing an IPC
> file with various types of compression used for the record batch.

Java code is at [1], with implementations of the compression codecs living in [2].
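For Python, here is a minimal sketch of what I believe should work: recent pyarrow versions expose a compression argument on IpcWriteOptions that accepts "lz4" or "zstd". I haven't run this exact snippet, so treat it as a starting point rather than a reference implementation:

import pyarrow as pa
import pyarrow.ipc as ipc

# A small table with a string column, so each record batch carries
# validity, offset, and data buffers for that column.
table = pa.table({"words": ["alpha", "beta", None, "gamma"]})

# compression may be "lz4", "zstd", or None (uncompressed).
options = ipc.IpcWriteOptions(compression="zstd")

with pa.OSFile("words.arrow", "wb") as sink:
    with ipc.new_file(sink, table.schema, options=options) as writer:
        writer.write_table(table)

# Readers need no special flags; the codec used is recorded in the
# record batch metadata.
with pa.OSFile("words.arrow", "rb") as source:
    print(ipc.open_file(source).read_all())

I believe the R package exposes the same knob through arrow::write_feather()'s compression argument, since Feather V2 is the IPC file format.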
> Is the compression applied per-column or upon the record batch after the
> buffers have been serialized to the batch? If it is applied per column
> which buffers - given that text for example can consist of 3 buffers
> (validity, offset, data) is compression applied to all three or just data
> or data and offset?

It is applied per buffer; all buffers are compressed.
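As a quick way to see the three buffers you mention, pyarrow's Array.buffers() lists them directly; under body compression each one is compressed independently. A toy illustration:

import pyarrow as pa

# A nullable string array carries three buffers:
# the validity bitmap, the int32 offsets, and the character data.
arr = pa.array(["alpha", None, "gamma"])
for name, buf in zip(["validity", "offsets", "data"], arr.buffers()):
    print(name, "-" if buf is None else f"{buf.size} bytes")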
Cheers,
Micah

[1] https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java#L100
[2] https://github.com/apache/arrow/tree/971a9d352e456882aa5b70ac722607840cdb9df7/java/compression/src