Hi Team,

I would like to implement a custom subclass of pyarrow.filesystem.FileSystem (or perhaps pyarrow.fs.FileSystem) so that I can take full advantage of what pyarrow offers for Parquet files: partitioning, filtering, etc.

The underlying storage is cloud-based and not S3-compatible. Our API only supports:
- CRUD on buckets
- CRUD on objects

There is currently no support for streaming or for working with any kind of file handle. I have already looked at how s3fs.cc is implemented, but I am not sure I can apply the same approach in my situation.
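Since the API only offers whole-object CRUD, one way to emulate the file handles pyarrow expects is to fetch an entire object on open (giving a seekable in-memory buffer) and to buffer writes locally, uploading the full object on close. The sketch below assumes a hypothetical client with `put_object`, `get_object`, `list_objects` and `delete_object` methods (`InMemoryObjectStore` is only a stand-in for the real API), and the helpers `open_input_file` and `UploadOnClose` are likewise hypothetical names, not part of pyarrow.

```python
import io
from typing import Dict, List, Tuple


class InMemoryObjectStore:
    """Hypothetical stand-in for a CRUD-only object API (no ranged reads, no streams)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = bytes(data)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

    def list_objects(self, bucket: str, prefix: str = "") -> List[str]:
        return sorted(k for (b, k) in self._objects if b == bucket and k.startswith(prefix))

    def delete_object(self, bucket: str, key: str) -> None:
        del self._objects[(bucket, key)]


def open_input_file(store: InMemoryObjectStore, bucket: str, key: str) -> io.BytesIO:
    """Emulate a seekable read handle by fetching the whole object in one GET."""
    return io.BytesIO(store.get_object(bucket, key))


class UploadOnClose(io.BytesIO):
    """Emulate a write handle: buffer all writes, then PUT the full object on close."""

    def __init__(self, store: InMemoryObjectStore, bucket: str, key: str) -> None:
        super().__init__()
        self._store, self._bucket, self._key = store, bucket, key

    def close(self) -> None:
        if not self.closed:
            # Read the buffer before closing; BytesIO.getvalue() fails afterwards.
            self._store.put_object(self._bucket, self._key, self.getvalue())
        super().close()
```

The returned `BytesIO` is a seekable file-like object, which `pyarrow.parquet.read_table` accepts. To plug this into the newer API, a `pyarrow.fs.FileSystemHandler` subclass passed to `pyarrow.fs.PyFileSystem` could, for example, return the downloaded bytes wrapped in `pyarrow.BufferReader` from its `open_input_file`. The trade-off is that every open transfers the whole object, so Parquet column projection reduces decoding work but not network traffic.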
Questions:
1. Which FileSystem class should I implement to take full advantage of what Arrow provides for Parquet files (pyarrow.filesystem.FileSystem vs. pyarrow.fs.FileSystem)?
2. Is there an example implementation of a cloud-based, non-S3-compatible filesystem?
3. Given our limited API, what would you recommend? Initially, I was thinking of downloading the entire Parquet file/directory to a local filesystem and providing a handle to it, but I was curious whether there is a better way to handle this.

Thank you in advance!
Jae
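The download-everything approach from question 3 can be sketched as mirroring every object under a key prefix into a local directory while keeping the key layout, so that hive-style partition directories (e.g. `year=2020/`) survive the copy. `DictStore`, `list_objects`, `get_object` and `download_prefix` are hypothetical names standing in for the real API; this is an illustration under those assumptions, not a recommended final design.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Dict, List, Optional, Tuple


class DictStore:
    """Hypothetical read-only stand-in for the CRUD object API."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]) -> None:
        self._objects = dict(objects)

    def list_objects(self, bucket: str, prefix: str = "") -> List[str]:
        return sorted(k for (b, k) in self._objects if b == bucket and k.startswith(prefix))

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def download_prefix(store: DictStore, bucket: str, prefix: str,
                    dest: Optional[str] = None) -> Path:
    """Mirror every object under `prefix` into a local directory, keeping key layout."""
    # Use a fresh temporary directory when no destination is given.
    root = Path(dest) if dest is not None else Path(tempfile.mkdtemp(prefix="parquet-mirror-"))
    for key in store.list_objects(bucket, prefix):
        rel = PurePosixPath(key[len(prefix):].lstrip("/"))
        # Skip "directory marker" keys such as the prefix itself or keys ending in "/".
        if not rel.parts or key.endswith("/"):
            continue
        # Refuse keys that would escape the local root directory.
        if ".." in rel.parts:
            raise ValueError(f"unsafe object key: {key!r}")
        target = root.joinpath(*rel.parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(store.get_object(bucket, key))
    return root
```

The local copy could then be opened with something like `pyarrow.dataset.dataset(str(root), format="parquet", partitioning="hive")` and filtered via `.to_table(filter=pyarrow.dataset.field("year") == 2020)`. Note that this copies the whole prefix regardless of the filter; a refinement would be to inspect the listed keys and download only the partitions the query needs.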
