[Python] Implementing own Filesystem Subclass in PyArrow v3.0.0

Jae Lee Wed, 09 Feb 2022 20:06:42 -0800

Hi Team,

I would like to implement a custom subclass of
pyarrow.filesystem.FileSystem (or perhaps pyarrow.fs.FileSystem) and was
hoping to leverage the full potential of what pyarrow provides with parquet
files - partitioning, filtering, etc. The underlying storage is cloud-based
and not S3 compatible. Our API only provides support for
- CRUD bucket
- CRUD objects
Currently, there is no support for streaming or working with any type of
file handle. I've already looked into how s3fs.cc was implemented but was
not sure I could apply it in my situation.


Questions:
1. What Filesystem class do I need to implement to take full advantage of
what arrow provides in terms of dealing with parquet files?
(pyarrow.filesystem.FileSystem vs pyarrow.fs.FileSystem)
2. Is there an example implementation of a cloud-based, non-S3-compatible
filesystem?
3. Given our limited API set, what would you recommend?

Initially, I was thinking of downloading the entire parquet file/directory to
a local file system and providing a handle, but was curious whether there
would be a better way to handle this.
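
A minimal sketch of that fallback, kept in memory rather than on local disk.
Everything here is an assumption for illustration: `InMemoryObjectStore`,
`put_object`, `get_object`, and `open_object` are hypothetical names standing
in for the real storage API, and the payload is placeholder bytes, not a real
parquet file.

```python
import io


class InMemoryObjectStore:
    """Hypothetical stand-in for the storage API: whole-object CRUD only."""

    def __init__(self):
        self._objects = {}

    def put_object(self, bucket, key, data):
        # Store the complete object; there is no partial or streaming write.
        self._objects[(bucket, key)] = bytes(data)

    def get_object(self, bucket, key):
        # Return the complete object; there is no ranged or streaming read.
        return self._objects[(bucket, key)]


def open_object(store, bucket, key):
    """Fetch the whole object and wrap it in a seekable in-memory handle."""
    return io.BytesIO(store.get_object(bucket, key))


store = InMemoryObjectStore()
# Placeholder bytes only; a real parquet file starts and ends with b"PAR1".
store.put_object("demo", "part-0.parquet", b"PAR1" + b"payload" + b"PAR1")

handle = open_object(store, "demo", "part-0.parquet")
# A parquet reader seeks to the end of the file first, so the handle
# must support random access even though the API does not.
handle.seek(-4, io.SEEK_END)
footer_magic = handle.read(4)
```

A handle like this is an ordinary Python file-like object, which
pyarrow.parquet.read_table accepts in place of a path. The trade-off is that
every object is fetched in full and has to fit in memory.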

Thank you in advance!
Jae
