Re: Configuring file-based transforms with different options

2018-03-12 Thread Romain Manni-Bucau
On 12 March 2018 at 23:05, "Chamikara Jayalath" wrote: On Mon, Mar 12, 2018 at 2:36 PM Romain Manni-Bucau wrote: > > > On 12 March 2018 at 22:22, "Chamikara Jayalath" wrote: > > > > On Mon, Mar 12, 2018 at 12:42 PM Romain

Re: Configuring file-based transforms with different options

2018-03-12 Thread Chamikara Jayalath
On Mon, Mar 12, 2018 at 2:36 PM Romain Manni-Bucau wrote: > > > On 12 March 2018 at 22:22, "Chamikara Jayalath" wrote: > > > > On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau > wrote: > >> >> >> On 12 March 2018 at 18:56,

Re: Configuring file-based transforms with different options

2018-03-12 Thread Romain Manni-Bucau
On 12 March 2018 at 22:22, "Chamikara Jayalath" wrote: On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau wrote: > > > On 12 March 2018 at 18:56, "Chamikara Jayalath" wrote: > > Agree. We need file-system abstractions in

Re: Configuring file-based transforms with different options

2018-03-12 Thread Chamikara Jayalath
On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau wrote: > > > On 12 March 2018 at 18:56, "Chamikara Jayalath" wrote: > > Agree. We need file-system abstractions in all languages since (1) users > may need to directly access file-systems from

Re: Configuring file-based transforms with different options

2018-03-12 Thread Reuven Lax
I think a way to have transform-specific options could be useful, regardless of this use case. On Mon, Mar 12, 2018 at 12:42 PM Romain Manni-Bucau wrote: > > > On 12 March 2018 at 18:56, "Chamikara Jayalath" wrote: > > Agree. We need file-system

Re: Configuring file-based transforms with different options

2018-03-12 Thread Romain Manni-Bucau
On 12 March 2018 at 18:56, "Chamikara Jayalath" wrote: Agree. We need file-system abstractions in all languages since (1) users may need to directly access file-systems from DoFns (2) common file-based sources/sinks will probably be available in multiple languages even

Re: Configuring file-based transforms with different options

2018-03-12 Thread Romain Manni-Bucau
Agree, and since all languages will support options and strings (I didn't check this last one, but I hope so ;)), a prefix is portable by design :). Passing pipeline options directly works too, but it still requires a portable way to read options and a way to loosely type them too without

Re: Configuring file-based transforms with different options

2018-03-12 Thread Chamikara Jayalath
Agree. We need file-system abstractions in all languages since (1) users may need to directly access file-systems from DoFns (2) common file-based sources/sinks will probably be available in multiple languages even with portability API and cross language IO (these are usually the first

Re: Configuring file-based transforms with different options

2018-03-12 Thread Lukasz Cwik
There is still a lot of work before we get to supporting cross-language transforms, and hence get access to filesystems written in different languages, but how the options are passed through from one to the other will need to be well understood, and it would be best if the way a user defines these

Re: Configuring file-based transforms with different options

2018-03-09 Thread Romain Manni-Bucau
On 9 March 2018 at 21:35, "Lukasz Cwik" wrote: The blocker is to get someone to follow through on the original design or to get a new design (with feedback) and have it implemented. If the PipelineOptionsFactory-related PRs are merged, I can do a PR/proposal based on this thread

Re: Configuring file-based transforms with different options

2018-03-09 Thread Lukasz Cwik
The blocker is to get someone to follow through on the original design or to get a new design (with feedback) and have it implemented. Note that this impacts more than just Java, as it also exists in Python and Go. On Fri, Mar 9, 2018 at 12:18 PM, Romain Manni-Bucau

Re: Configuring file-based transforms with different options

2018-03-09 Thread Romain Manni-Bucau
Hmm, it doesn't solve the issue that Beam doesn't let you configure a transform from its "config" (let's say the CLI). So if I have a generic pipeline taking one file as input and another as output, then I must register 2 filesystems in all cases? If the pipeline is dynamic, I must make it dynamic too?

Re: Configuring file-based transforms with different options

2018-03-09 Thread Jacob Marble
I think when I wrote the S3 code, I couldn't see how to set storage class per-bucket, so I put it in a flag. It's easy to imagine a use case where storage class differs per filespec, not only per bucket. Jacob On Fri, Mar 9, 2018 at 9:51 AM, Jacob Marble wrote: > Yes, I

Re: Configuring file-based transforms with different options

2018-03-09 Thread Jacob Marble
Yes, I agree with all of this. Jacob On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw wrote: > On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov > wrote: > >> I think it may have been an API design mistake to put the S3 region into >> PipelineOptions.

Re: Configuring file-based transforms with different options

2018-03-09 Thread Chamikara Jayalath
On Fri, Mar 9, 2018 at 9:24 AM Lukasz Cwik wrote: > Note that TextIO/... internally use FileSystems (Java and Python). > > Based upon the current design where FileSystems is a global concept > (decoupled from PTransforms), having PipelineOptions configure it is a good > and

Re: Configuring file-based transforms with different options

2018-03-09 Thread Lukasz Cwik
Note that TextIO/... internally use FileSystems (Java and Python). Based upon the current design where FileSystems is a global concept (decoupled from PTransforms), having PipelineOptions configure it is a good and valid strategy. Earlier work by Pei He and Daniel Halperin was towards having

Re: Configuring file-based transforms with different options

2018-03-09 Thread Ismaël Mejía
File-based transforms are a little bit different because part of the configuration is done in the file transform (TextIO.read().foo(), TextIO.write().bar()) and another part is done in specific filesystem options. In the example TextIO.from("...") does not have a way to do something like

Re: Configuring file-based transforms with different options

2018-03-09 Thread John MacMillan
AWS may not be the only provider, and if you use a different endpoint the API requires a region. CreateOptions are probably a better place if the pipeline needs to access multiple endpoints or regions, but I suspect the user application is likely to still end up with pipeline options of its own

Re: Configuring file-based transforms with different options

2018-03-08 Thread Robert Bradshaw
On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov wrote: > I think it may have been an API design mistake to put the S3 region into > PipelineOptions. > +1, IMHO it's generally a mistake to put any transform configuration into PipelineOptions for exactly this reason. >

Re: Configuring file-based transforms with different options

2018-03-08 Thread Eugene Kirpichov
I think it may have been an API design mistake to put the S3 region into PipelineOptions. PipelineOptions are global per pipeline, whereas it's totally reasonable to access S3 files in different regions even from the code of a single DoFn running on a single element. The same applies to

Re: Configuring file-based transforms with different options

2018-03-08 Thread Romain Manni-Bucau
The "hint" would probably be to use hints :) - indeed, this joke refers to the hint thread. Long story short, with hints you should be able to say "use that specialized config here". Now, personally, I'd like to see a way to specialize config per transform. With a hint, an easy way is to use a prefix:
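
The message is cut off at this point in the archive, so the author's own example is not shown. What follows is not that example, only a minimal sketch of what prefix-based specialization could look like, written in plain Python with no Beam API involved. The `scoped_options` helper, the `read1`/`write1` transform names and the option keys are all hypothetical.

```python
def scoped_options(options, prefix):
    """Return global options overlaid with the ones carrying `prefix`.

    Hypothetical helper, not part of Beam: a key such as "read1.s3Region"
    overrides the unprefixed "s3Region" for the transform named "read1".
    """
    marker = prefix + "."
    # Start from the unprefixed (global) options only.
    merged = {k: v for k, v in options.items() if "." not in k}
    # Overlay the options addressed to this transform, with the prefix stripped.
    for key, value in options.items():
        if key.startswith(marker):
            merged[key[len(marker):]] = value
    return merged


# Flat, string-keyed options as they might arrive from a command line.
# All keys and values here are made up for illustration.
cli_options = {
    "s3Region": "us-east-1",
    "read1.s3Region": "eu-west-1",
    "write1.s3StorageClass": "STANDARD_IA",
}

read_config = scoped_options(cli_options, "read1")
write_config = scoped_options(cli_options, "write1")
```

Because the scheme relies only on string keys, it would in principle carry over to any language that can read flat options, which is the portability argument made elsewhere in this thread.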