Micah,

Great idea, thank you! I really appreciate the pointer.

On Wed, Jul 20, 2022 at 12:04 AM Micah Kornfield <[email protected]> wrote:
> You could maybe use datasets on top of fsspec's zip file system [1]?
>
> [1]
> https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/zip.html
>
> On Tuesday, July 19, 2022, Kirby, Adam <[email protected]> wrote:
>
>> Hi All,
>>
>> I'm currently using pyarrow.csv.read_csv to parse a CSV stream that
>> originates from a ZIP of multiple CSV files. For now, I'm using a separate
>> implementation to do the streaming ZIP decompression, then
>> using pyarrow.csv.read_csv at each CSV file boundary.
>>
>> I would love it if there were a way to leverage pyarrow to handle the
>> decompression. From what I've seen in examples, a ZIP file containing a
>> single CSV is supported -- that is, it's possible to operate on a
>> compressed CSV stream -- but I wonder if it's possible to handle a
>> compressed stream that contains multiple files?
>>
>> Thank you in advance!
