I've filed https://issues.apache.org/jira/browse/BEAM-12435 to track this
improvement.
From: Matt Rudary
Sent: Monday, May 24, 2021 4:49 PM
To: dev@beam.apache.org
Subject: Re: Proposal: Generalize S3FileSystem
Thanks for the comments all. I forgot to subscribe to dev before I sent out the
email, so this response isn't threaded properly.
My proposed design is to do the following (for both aws and aws2 packages):
1. Add a public class, S3FileSystemConfiguration, that mostly maps to the
Please follow URL intention if at all possible. Specifically the bits
before the : should indicate how to parse the rest of the URL, not other
information. Is this convention of sticking the host before the : already
an established thing for s3-compatible endpoints?
If the various S3-compatible
Is it feasible to keep the endpoint information in the path? It seems
pretty desirable to keep URIs "universal" so that it's possible to
understand what is being pointed to without explicit service configuration,
so maybe you can have a scheme like
"s3+endpoint=api.example.com://my/bucket/path"?
$.02
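To make the suggestion above concrete, here is a minimal sketch of how a URI of the form "s3+endpoint=<host>://<bucket>/<key>" might be split into its parts. The function and type names (parse_s3_uri, S3Location) are hypothetical and are not part of Beam or any AWS SDK. Note that RFC 3986 does not allow "=" in a scheme, so a standard URI parser may not accept this form, which is why the sketch splits the string by hand.

```python
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    """Hypothetical container for the parts of an S3-style URI."""
    endpoint: Optional[str]  # None means "use the default endpoint"
    bucket: str
    key: str


def parse_s3_uri(uri: str) -> S3Location:
    """Split 's3://bucket/key' or 's3+endpoint=HOST://bucket/key'.

    Illustrative only: the 's3+endpoint=' syntax is the idea floated in
    the thread, not an established convention.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {uri!r}")

    # Everything before '+' names the protocol; the remainder is an option.
    base, _, option = scheme.partition("+")
    if base != "s3":
        raise ValueError(f"unsupported scheme {base!r}")

    endpoint = None
    if option:
        name, eq, value = option.partition("=")
        if name != "endpoint" or not eq or not value:
            raise ValueError(f"unsupported scheme option {option!r}")
        endpoint = value

    # The first path segment is the bucket; the rest is the object key.
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return S3Location(endpoint, bucket, key)
```

With this sketch, parse_s3_uri("s3+endpoint=api.example.com://my/bucket/path") yields the endpoint "api.example.com", the bucket "my", and the key "bucket/path", while a plain "s3://my/bucket/path" yields no endpoint.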
Most important is a community to maintain it. It cannot be a separate project
or subproject (lots of ASF projects have these, so they share governance)
without that.
To justify the additional friction of a separate release and a build dependency
before you have a community, it should be extremely stable
On Thu, May 20, 2021 at 10:12 AM Chad Dombrova wrote:
> Hi Brian,
> I think the main goal would be to make a python package that could be pip
> installed independently of apache_beam. That goal could be accomplished
> with option 3, thus preserving all of the benefits of a monorepo. If it
>
Hi Brian,
I think the main goal would be to make a python package that could be pip
installed independently of apache_beam. That goal could be accomplished
with option 3, thus preserving all of the benefits of a monorepo. If it
gains enough popularity and contributors outside of the Beam
That's an interesting idea. What do you mean by its own project? A couple
of possibilities:
- Spinning off a new ASF project
- A separate Beam-governed repository (e.g. apache/beam-filesystems)
- More clearly separate it in the current build system and release
artifacts that allow it to be used
This is a random idea, but the whole file IO system inside Beam would
actually be awesome to extract into its own project. IIRC, it’s not
particularly tied to Beam.
I’m not saying this should be done now, but it’d be nice to keep it in mind
as a future goal.
-chad
On Wed, May 19, 2021 at 10:23
That would be great to add, Matt. Of course it's important to make this
backwards compatible, but other than that, the addition would be very
welcome.
On Wed, May 19, 2021 at 9:41 AM Matt Rudary wrote:
> Hi,
>
> This is a quick sketch of a proposal – I wanted to get a sense of whether
Hi,
This is a quick sketch of a proposal - I wanted to get a sense of whether
there's general support for this idea before fleshing it out further, getting
internal approvals, etc.
I'm working with multiple storage systems that speak the S3 api. I would like
to support FileIO operations for