On Mar 12, 2018 at 23:05, "Chamikara Jayalath" wrote:
On Mon, Mar 12, 2018 at 2:36 PM Romain Manni-Bucau
wrote:
>
>
> On Mar 12, 2018 at 22:22, "Chamikara Jayalath" wrote:
>
>
>
> On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau
> wrote:
>
>>
>>
>> On Mar 12, 2018 at 18:56,
On Mar 12, 2018 at 22:22, "Chamikara Jayalath" wrote:
On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau
wrote:
>
>
> On Mar 12, 2018 at 18:56, "Chamikara Jayalath" wrote:
>
> Agree. We need file-system abstractions in all languages since (1) users
> may need to directly access file-systems from
I think a way to have transform-specific options could be useful,
regardless of this use case.
On Mar 12, 2018 at 18:56, "Chamikara Jayalath" wrote:
Agree. We need file-system abstractions in all languages since (1) users
may need to directly access file-systems from DoFns (2) common file-based
sources/sinks will probably be available in multiple languages even
Agreed, and since all languages will support options and strings (I didn't
check the latter, but I hope so ;)), a prefix is portable by design :).
Passing pipeline options directly works too, but it still requires a portable
way to read options and a way to type them loosely too without
Agree. We need file-system abstractions in all languages since (1) users
may need to directly access file-systems from DoFns (2) common file-based
sources/sinks will probably be available in multiple languages even
with portability API and cross language IO (these are usually the first
There is still a lot of work before we get to supporting cross language
transforms and hence get access to filesystems written in different
languages but how the options are passed through from one to the other will
need to be well understood and it would be best if the way a user defines
these
On Mar 9, 2018 at 21:35, "Lukasz Cwik" wrote:
The blocker is to get someone to follow through on the original design or
to get a new design (with feedback) and have it implemented.
If the PipelineOptionsFactory-related PRs are merged, I can do a PR/proposal
based on this thread.
The blocker is to get someone to follow through on the original design or
to get a new design (with feedback) and have it implemented.
Note that this impacts more than just Java as it also exists in Python and
Go as well.
On Fri, Mar 9, 2018 at 12:18 PM, Romain Manni-Bucau
Hmm, it doesn't solve the issue that Beam doesn't let you configure a
transform from its "config" (let's say the CLI).
So if I have a generic pipeline taking one file as input and another as
output, must I register 2 filesystems in all cases? If the pipeline is
dynamic, must I make it dynamic too?
I think when I wrote the S3 code, I couldn't see how to set storage class
per-bucket, so put it in a flag. It's easy to imagine a use case where
storage class differs per filespec, not only per bucket.
Jacob
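
To make the per-filespec idea concrete, here is a minimal sketch in plain Python (not the Beam API). It assumes a hypothetical rule table that maps path prefixes to storage classes and picks the most specific match. The function name, the rule table, and the bucket names are invented for illustration; the storage class strings are just example values.

```python
from typing import Mapping


def storage_class_for(path: str, rules: Mapping[str, str],
                      default: str = "STANDARD") -> str:
    """Return the storage class of the longest rule prefix that matches path.

    Hypothetical helper: a bucket-level rule and a more specific
    filespec-level rule can coexist, and the more specific one wins.
    """
    best_len = -1
    chosen = default
    for prefix, storage_class in rules.items():
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            chosen = storage_class
    return chosen


# Example rules (assumed values): one per bucket, one per sub-path.
rules = {
    "s3://logs/": "STANDARD_IA",
    "s3://logs/archive/": "GLACIER",
}
```

With these rules, a path under s3://logs/archive/ resolves to the more specific rule, any other path in the logs bucket resolves to the bucket-level rule, and a path in another bucket falls back to the default.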
On Fri, Mar 9, 2018 at 9:51 AM, Jacob Marble wrote:
> Yes, I
Yes, I agree with all of this.
Jacob
On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw wrote:
> On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov
> wrote:
>
>> I think it may have been an API design mistake to put the S3 region into
>> PipelineOptions.
On Fri, Mar 9, 2018 at 9:24 AM Lukasz Cwik wrote:
Note that TextIO/... internally use FileSystems (Java and Python).
Based upon the current design where FileSystems is a global concept
(decoupled from PTransforms), having PipelineOptions configure it is a good
and valid strategy.
Earlier work by Pei He and Daniel Halperin was towards having
File-based transforms are a little bit different because part of the
configuration is in the file transform (TextIO.read().foo(),
TextIO.write().bar()) and the other part is done in filesystem-specific
options.
In the example TextIO.from(“...”) does not have a way to do something
like
AWS may not be the only provider, and if you use a different endpoint the API requires a region.
CreateOptions are probably a better place if the pipeline needs to access multiple endpoints or regions, but I suspect the user application is likely to still end up with pipeline options of its own
On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov
wrote:
> I think it may have been an API design mistake to put the S3 region into
> PipelineOptions.
>
+1, IMHO it's generally a mistake to put any transform configuration into
PipelineOptions for exactly this reason.
>
I think it may have been an API design mistake to put the S3 region into
PipelineOptions. PipelineOptions are global per pipeline, whereas it's
totally reasonable to access S3 files in different regions even from the
code of a single DoFn running on a single element. The same applies to
The "hint" would probably be to use hints :) - indeed, this joke refers to the
hint thread.
Long story short, with hints you should be able to say "use that specialized
config here".
Now, personally, I'd like to see a way to specialize config per transform.
With a hint, an easy way is to use a prefix:
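
The original example is cut off in this archive. As a rough illustration only (not the author's code and not an existing Beam API), a prefix-based lookup could work like this: a transform asks for an option under its own prefix first and falls back to the global value. The function name, the "region" key, and the "output" prefix are all assumptions made for this sketch.

```python
from typing import Mapping, Optional


def resolve_option(options: Mapping[str, str], key: str,
                   prefix: Optional[str] = None) -> Optional[str]:
    """Look up key, preferring '<prefix>.<key>' over the global 'key'.

    Hypothetical helper: options is a flat, string-typed mapping, which is
    what makes the scheme easy to carry across languages.
    """
    if prefix is not None:
        scoped = f"{prefix}.{key}"
        if scoped in options:
            return options[scoped]
    return options.get(key)


# Assumed flat options, e.g. as parsed from command-line flags.
options = {
    "region": "us-east-1",         # global default
    "output.region": "eu-west-1",  # override for the transform named "output"
}

input_region = resolve_option(options, "region", prefix="input")
output_region = resolve_option(options, "region", prefix="output")
```

Here the "input" transform has no override and gets the global value, while the "output" transform gets its own. Since only strings and a naming convention are involved, nothing in the scheme is tied to one SDK language.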