Dear Arrow developers,
I was wondering if it's possible to use the scanner API to read batches
starting from a certain row offset.
Currently I am doing something like this:

    reader = dataset.scanner(filter=expr_filters).to_reader()

to get a record batch reader, but I am reading data in parallel with
multiple processes and already know the row counts and the offset from
which each process should read. The problem with the above code is that
every process materializes batches into memory starting from the
beginning, so the same data is read multiple times.
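To make the problem concrete, here is a rough sketch of the manual skipping each process would have to do on top of the reader. The helper name batches_from_offset and its parameters are hypothetical (not part of pyarrow); it only assumes that each batch exposes num_rows and slice(offset, length), as pyarrow.RecordBatch does.

```python
from typing import Iterable, Iterator


def batches_from_offset(batches: Iterable, offset: int, num_rows: int) -> Iterator:
    """Yield slices of `batches` covering rows [offset, offset + num_rows).

    Hypothetical helper: `batches` is any iterable of objects with
    `num_rows` and `slice(offset, length)`, e.g. a record batch reader.
    """
    if num_rows <= 0:
        return
    seen = 0              # rows contained in the batches consumed so far
    remaining = num_rows  # rows still to be yielded
    for batch in batches:
        n = batch.num_rows
        if seen + n <= offset:
            # The whole batch lies before the offset: it was still read
            # and materialized, only to be thrown away here.
            seen += n
            continue
        start = max(offset - seen, 0)
        length = min(n - start, remaining)
        if length > 0:
            yield batch.slice(start, length)
            remaining -= length
        seen += n
        if remaining == 0:
            return
```

Each process would call something like batches_from_offset(reader, offset, count) with its own offset and count. This returns the right rows, but every process still scans and discards everything before its offset, which is exactly the duplicated work I would like to avoid.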
Thanks