Hi
I am trying to use the Arrow GLib API to read and write Arrow data from C.
Although Arrow is a columnar format, I'm really excited about writing many
rows from a C-like runtime and then accessing them from Python for analytics
as one array per column, and vice versa.
To get a quick example running, I created an Arrow file in Python with 100
million rows (a 1 million-row table written 100 times, giving 100 record
batches) as follows:
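```py
import numpy as np
import pyarrow as pa

foo = {
    "colA": np.arange(0, 1_000_000),          # integer column
    "colB": [np.arange(1, 5)] * 1_000_000,    # list column, every row is [1, 2, 3, 4]
}
table = pa.table(foo)

# Write the same 1M-row table 100 times: 100 record batches, 100M rows in total
with pa.RecordBatchFileWriter("/tmp/batch.arrow", table.schema) as writer:
    for _ in range(100):
        writer.write_table(table)
```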
However, reading the ListArray column (colB) through the GLib API is very
slow: it takes about 5 seconds per record batch of 1 million rows, whereas
iterating over the integer column (colA) for the entire file takes under 2
seconds. The relevant snippet is:
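```C
/* reader: a GArrowRecordBatchFileReader opened on /tmp/batch.arrow */
GError *error = NULL;
guint num_batches = 100;
for (guint i = 0; i < num_batches; i++) {
  GArrowRecordBatch *record_batch =
    garrow_record_batch_file_reader_read_record_batch(reader, i, &error);
  if (!record_batch) {
    g_printerr("failed to read record batch %u: %s\n", i, error->message);
    g_clear_error(&error);
    break;
  }
  /* Column 1 is colB, the list column */
  GArrowArray *column = garrow_record_batch_get_column_data(record_batch, 1);
  GArrowListArray *list_arr = GARROW_LIST_ARRAY(column);
  gint64 length_list = garrow_array_get_length(column);
  for (gint64 j = 0; j < length_list; j++) {
    /* Each call returns a new GArrowArray that must be released */
    GArrowArray *list_elem = garrow_list_array_get_value(list_arr, j);
    g_object_unref(list_elem);
  }
  g_object_unref(column);
  g_object_unref(record_batch);
}
```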
I can't find a faster alternative in the public GLib API for reading data out
of a list array. Is there a way to speed up this loop?
Thank you,
Ishan