RE: Proposal: Generalize S3FileSystem

2021-06-01 Thread Matt Rudary
I've filed https://issues.apache.org/jira/browse/BEAM-12435 to track this improvement.

From: Matt Rudary
Sent: Monday, May 24, 2021 4:49 PM
To: dev@beam.apache.org
Subject: Re: Proposal: Generalize S3FileSystem

Thanks for the comments all. I forgot to subscribe to dev before I sent out

Re: Proposal: Generalize S3FileSystem

2021-05-24 Thread Matt Rudary
Thanks for the comments all. I forgot to subscribe to dev before I sent out the email, so this response isn't threaded properly.

My proposed design is to do the following (for both aws and aws2 packages):
1. Add a public class, S3FileSystemConfiguration, that mostly maps to the

Re: Proposal: Generalize S3FileSystem

2021-05-21 Thread Kenneth Knowles
Please follow URL intention if at all possible. Specifically the bits before the : should indicate how to parse the rest of the URL, not other information. Is this convention of sticking the host before the : already an established thing for s3-compatible endpoints? If the various S3-compatible

Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Charles Chen
Is it feasible to keep the endpoint information in the path? It seems pretty desirable to keep URIs "universal" so that it's possible to understand what is being pointed to without explicit service configuration, so maybe you can have a scheme like "s3+endpoint=api.example.com://my/bucket/path"?
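
For illustration only, here is a minimal Python sketch of how a URI of the form "s3+endpoint=api.example.com://my/bucket/path" could be split into an endpoint, a bucket, and a key. Nothing here is Beam code: the names S3Location and parse_s3_uri are hypothetical, and treating the first path segment as the bucket is an assumption borrowed from the usual s3://bucket/key layout. The string is split by hand because "=" is not a legal character in a URI scheme under RFC 3986, so a standard URL parser would not recognize the part before "://" as a scheme.

```python
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    """Hypothetical holder for the pieces of an endpoint-qualified S3 URI."""
    endpoint: Optional[str]  # None means "no endpoint given in the URI"
    bucket: str
    key: str


def parse_s3_uri(uri: str) -> S3Location:
    """Parse 's3://bucket/key' or 's3+endpoint=HOST://bucket/key'.

    Assumption: the first path segment is the bucket, as in plain s3:// URIs.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {uri!r}")

    # Split "s3+endpoint=HOST" into the base scheme and the optional suffix.
    base, _, option = scheme.partition("+")
    if base != "s3":
        raise ValueError(f"unsupported scheme in {uri!r}")

    endpoint = None
    if option:
        name, eq, value = option.partition("=")
        if name != "endpoint" or not eq or not value:
            raise ValueError(f"malformed endpoint option in {uri!r}")
        endpoint = value

    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return S3Location(endpoint, bucket, key)
```

A call such as parse_s3_uri("s3+endpoint=api.example.com://my/bucket/path") would yield the endpoint "api.example.com", the bucket "my", and the key "bucket/path" under the assumptions above, while a plain "s3://my/bucket/path" leaves the endpoint unset.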

Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Kenneth Knowles
$.02 Most important is community to maintain it. It cannot be a separate project or subproject (lots of ASF projects have this, so they share governance) without that. To add additional friction of separate release and dependency in build before you have community, it should be extremely stable

Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Stephan Hoyer
On Thu, May 20, 2021 at 10:12 AM Chad Dombrova wrote:
> Hi Brian,
> I think the main goal would be to make a python package that could be pip
> installed independently of apache_beam. That goal could be accomplished
> with option 3, thus preserving all of the benefits of a monorepo. If it

Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Chad Dombrova
Hi Brian, I think the main goal would be to make a python package that could be pip installed independently of apache_beam. That goal could be accomplished with option 3, thus preserving all of the benefits of a monorepo. If it gains enough popularity and contributors outside of the Beam

Re: Proposal: Generalize S3FileSystem

2021-05-20 Thread Brian Hulette
That's an interesting idea. What do you mean by its own project? A couple of possibilities:
- Spinning off a new ASF project
- A separate Beam-governed repository (e.g. apache/beam-filesystems)
- More clearly separate it in the current build system and release artifacts that allow it to be used

Re: Proposal: Generalize S3FileSystem

2021-05-19 Thread Chad Dombrova
This is a random idea, but the whole file IO system inside Beam would actually be awesome to extract into its own project. IIRC, it’s not particularly tied to Beam. I’m not saying this should be done now, but it’d be nice to keep it in mind as a future goal. -chad On Wed, May 19, 2021 at 10:23

Re: Proposal: Generalize S3FileSystem

2021-05-19 Thread Pablo Estrada
That would be great to add, Matt. Of course it's important to make this backwards compatible, but other than that, the addition would be very welcome.

On Wed, May 19, 2021 at 9:41 AM Matt Rudary wrote:
> Hi,
>
> This is a quick sketch of a proposal – I wanted to get a sense of whether

Proposal: Generalize S3FileSystem

2021-05-19 Thread Matt Rudary
Hi, This is a quick sketch of a proposal - I wanted to get a sense of whether there's general support for this idea before fleshing it out further, getting internal approvals, etc. I'm working with multiple storage systems that speak the S3 API. I would like to support FileIO operations for