The Buzz project is one example I know of that reads parquet files from S3
using the Rust implementation:

https://github.com/cloudfuse-io/buzz-rust/blob/13175a7c5cdd298415889da710a254a218be0a01/code/src/execution_plan/parquet.rs

The SerializedFileReader [1] from the Rust parquet crate, despite its
somewhat misleading name, doesn't have to read from files; instead, it reads
from something that implements the ChunkReader [2] trait.

I am not sure how well this matches what you are looking for.

Hope that helps,
Andrew

[1] https://docs.rs/parquet/3.0.0/parquet/file/serialized_reader/struct.SerializedFileReader.html
[2] https://docs.rs/parquet/3.0.0/parquet/file/reader/trait.ChunkReader.html

On Sat, Feb 13, 2021 at 10:17 AM Steve Kim <[email protected]> wrote:

> > Currently, parquet.rs only supports local disk files. Potentially, this
> > can be done using the rusoto crate that provides a s3 client. What would
> > be a good way to do this?
> > 1. create a remote parquet reader (potentially duplicate lots of code)
> > 2. create an interface to abstract away reading from local/remote files
> > (not sure about performance if the reader blocks on every operation)
>
> This is a great question.
>
> I think that approach (2) is superior, although it requires more work
> than approach (1) to design an interface that works well across
> multiple file stores that have different performance characteristics.
> To accommodate storage-specific performance optimizations, I expect
> that the common interface will have to be more elaborate than the
> current reader API.
>
> Is it possible for the Rust reader to use the c++ implementation
> (https://github.com/apache/arrow/tree/master/cpp/src/arrow/filesystem)?
> If this reuse of implementation is feasible, then we could focus
> efforts on improving the c++ implementation and get the benefits in
> Python, Rust, etc.
>
> In the Java ecosystem, the (non-Arrow, row-wise) Parquet reader uses
> the Hadoop FileSystem abstraction. This abstraction is complex, leaky,
> and not well specialized for read patterns that are typical for
> Parquet files. We can learn from these mistakes to create a superior
> reader interface in the Arrow/Parquet project.
>
> Steve
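
To make the ChunkReader point above more concrete, here is a rough sketch of
the idea using only the standard library. The `RangeSource` trait is a
simplified, hypothetical stand-in loosely modelled on ChunkReader, not the
parquet crate's actual definition, and `InMemoryObject` and `read_footer_len`
are made-up names for illustration. The in-memory buffer stands in for a
remote object; a real S3-backed source would presumably issue a ranged GET
inside `get_read` instead of slicing a vector.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical, simplified stand-in for a ChunkReader-like trait:
/// "give me a reader over `length` bytes starting at `start`".
pub trait RangeSource {
    type Reader: Read;

    /// Total size of the underlying object in bytes.
    fn len(&self) -> u64;

    /// Reader over the byte range `[start, start + length)`.
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::Reader>;
}

/// Stand-in for a remote object. Assumption: a real implementation would
/// fetch the requested range over the network rather than copy from memory.
pub struct InMemoryObject {
    bytes: Vec<u8>,
}

impl InMemoryObject {
    pub fn new(bytes: Vec<u8>) -> Self {
        InMemoryObject { bytes }
    }
}

impl RangeSource for InMemoryObject {
    type Reader = Cursor<Vec<u8>>;

    fn len(&self) -> u64 {
        self.bytes.len() as u64
    }

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::Reader> {
        // Reject ranges that overflow or run past the end of the object.
        let end = start
            .checked_add(length as u64)
            .filter(|&e| e <= self.len())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of object")
            })?;
        // Both bounds are <= bytes.len(), so the casts cannot truncate.
        Ok(Cursor::new(self.bytes[start as usize..end as usize].to_vec()))
    }
}

/// Reads the last 8 bytes of a parquet file: a 4-byte little-endian footer
/// length followed by the magic bytes "PAR1". Works for any RangeSource,
/// so the caller does not care whether the bytes are local or remote.
pub fn read_footer_len<S: RangeSource>(source: &S) -> io::Result<u32> {
    let total = source.len();
    if total < 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "too short for parquet"));
    }
    let mut tail = [0u8; 8];
    source.get_read(total - 8, 8)?.read_exact(&mut tail)?;
    if &tail[4..] != b"PAR1" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

The reader logic only ever asks for byte ranges, so each storage backend can
decide how to serve them (for example by coalescing or caching ranges). That
is roughly the shape of approach (2) in the quoted thread, although a
production interface would likely need to be more elaborate than this.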
