GitHub user jacek-pliszka added a comment to the discussion: How to find the best write options for read of file bytes ?
Test row-oriented formats, since you want to read rows. Parquet is a column-oriented format; it is better suited to operating on columns, not rows.

You have not said where the files will be stored: a local drive (and which OS), or the cloud? You also have not said how many files you will have, whether you will modify them or just store them, and whether you care more about read speed or about file size.

Maybe you do not need any special file format at all. You could store them as plain files (on Linux with a fast filesystem) or in a .zip archive (if you do not modify them). In both cases they are indexed by file name and access is straightforward: in Python you do not need to unpack the whole .zip to read a single file from it.

GitHub link: https://github.com/apache/arrow/discussions/48940#discussioncomment-15609084
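A minimal sketch of the .zip approach using only the standard library `zipfile` module. The archive is built in memory here purely for illustration; the member names `a.bin`/`b.bin` and the helper `read_member` are hypothetical. `ZipFile` locates a member through the archive's central directory, so reading one file does not extract the others. `ZIP_STORED` (no compression) is assumed here on the theory that read speed matters more than size; switch to `ZIP_DEFLATED` if size matters more.

```python
import io
import zipfile

# Build a small archive in memory for illustration; in practice this would be
# a .zip file on disk, e.g. zipfile.ZipFile("blobs.zip").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.bin", b"\x00\x01\x02")
    zf.writestr("b.bin", b"hello")


def read_member(archive, name):
    """Return the raw bytes of one member, looked up by file name.

    Only that member's data is read; the rest of the archive is not unpacked.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


print(read_member(buf, "b.bin"))  # b'hello'
```

The same lookup-by-name pattern applies to plain files on a filesystem, where the path itself is the index.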
