Is it possible to use a starting offset with Scanner?

Juan Galvez Mon, 21 Mar 2022 12:19:03 -0700

Dear Arrow developers,

I was wondering if it's possible to use the scanner API to read batches starting from a certain row offset.


Currently I am doing something like this:

    reader = dataset.scanner(filter=expr_filters).to_reader()

to get a record batch reader, but I am reading data in parallel with multiple processes and already know the row counts and the offset from which I want each process to read. The problem with the above code is that every process will materialize batches into memory starting from the beginning (therefore reading the same data multiple times).
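
To make the problem concrete, here is a minimal sketch of the workaround each process would otherwise need: consume the reader from the start and discard rows until its own offset is reached. This is not an Arrow API. The helper name read_rows_from_offset and its parameters are hypothetical, and it only assumes that each batch exposes num_rows and slice(offset, length), as pyarrow.RecordBatch does.

```python
def read_rows_from_offset(batches, offset, num_rows):
    """Yield slices covering rows [offset, offset + num_rows) of a batch stream.

    `batches` is any iterable of record batches, for example the reader
    returned by dataset.scanner(...).to_reader(). Hypothetical helper:
    it skips rows on the client side, so skipped batches are still read.
    """
    if num_rows <= 0:
        return
    position = 0          # stream index of the first row of the current batch
    remaining = num_rows  # rows still to be yielded
    for batch in batches:
        n = batch.num_rows
        # Index inside this batch where the wanted range starts.
        start = max(offset - position, 0)
        position += n
        if start >= n:
            continue      # whole batch lies before the offset: discard it
        length = min(n - start, remaining)
        remaining -= length
        yield batch.slice(start, length)
        if remaining == 0:
            return        # stop without pulling further batches
```

Each process would call something like read_rows_from_offset(reader, offset, row_count) with its own values. The skipped batches are still read and decoded, which is exactly the duplicated work I would like to avoid.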


Thanks
