Re: [DISCUSSION] Add hint/option on PCollection

2018-01-31 Thread Jean-Baptiste Onofré
Hi Reuven, it's also what I did in JdbcIO for the statement or column mapper. That's fair enough. Regards JB On 01/31/2018 01:35 PM, Reuven Lax wrote: > > > On Wed, Jan 31, 2018 at 1:34 AM, Jean-Baptiste Onofré > wrote: > > Hi Ismaël, > > I agree that hint

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-31 Thread Reuven Lax
On Wed, Jan 31, 2018 at 1:34 AM, Jean-Baptiste Onofré wrote: > Hi Ismaël, > > I agree that hint should not change the output of PTransforms. > > However, let me illustrate why I think hint could be interesting: > > - I agree with what you are saying about the runners: they should be smart. > Howe

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-31 Thread Jean-Baptiste Onofré
Hi Ismaël, I agree that hint should not change the output of PTransforms. However, let me illustrate why I think hint could be interesting: - I agree with what you are saying about the runners: they should be smart. However, to be smart enough, the runner could use some statements provided by pi

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-31 Thread Romain Manni-Bucau
Can we avoid it anyway? Not having it makes the migration away from Beam very tempting since the runtime diff can be important in terms of perf. What about: 1. adding hints as @Experimental 2. see how it grows for some releases (like 6 months) 3. take a decision to keep that or drop it Whatever you

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-31 Thread Ismaël Mejía
This is a subject we have already discussed in the past. It was part of the discussion on ‘data locality’ for the runners on top of HDFS. At that time the argument for ‘hints’ was that a transform could send hints to the runners so they properly allocate the readers, improving its execution. This

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Hi, yeah, it sounds good to me. I will create the Jira to track this and start a PoC on the Composite. Thanks ! Regards JB On 01/30/2018 10:40 PM, Reuven Lax wrote: > Did we actually reach consensus here? :) > > On Tue, Jan 30, 2018 at 1:29 PM, Romain Manni-Bucau

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Reuven Lax
Did we actually reach consensus here? :) On Tue, Jan 30, 2018 at 1:29 PM, Romain Manni-Bucau wrote: > Not sure how it fits in terms of API yet but +1 for the high level view. > Makes perfect sense. > > Le 30 janv. 2018 21:41, "Jean-Baptiste Onofré" a écrit : > >> Hi Robert, >> >> Good point and

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Romain Manni-Bucau
Not sure how it fits in terms of API yet but +1 for the high level view. Makes perfect sense. Le 30 janv. 2018 21:41, "Jean-Baptiste Onofré" a écrit : > Hi Robert, > > Good point and idea for the Composite transform. It would apply nicely on > all transforms based on composite. > > I also agree

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Hi Robert, Good point and idea for the Composite transform. It would apply nicely on all transforms based on composite. I also agree that the hint is more on the transform than the PCollection itself. Thanks ! Regards JB On 30/01/2018 21:26, Robert Bradshaw wrote: Many hints make more sen

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Reuven Lax
My point was that hints are added on transforms by default, so you can simply add them to originalTransform. The AddHint transform was for the case where you want the hint on the PCollection itself; it provides a way to do so, while keeping the PCollection immutable. On Tue, Jan 30, 2018 at 11:59
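A minimal sketch of the idea in this message, assuming hints are plain key/value metadata. It does not use the Beam API: `HintedCollection` and `AddHints` are hypothetical stand-ins for a PCollection and the proposed AddHint transform. It only illustrates how a hint can be attached by producing a new collection, so the original stays immutable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a PCollection: immutable elements plus hint metadata.
final class HintedCollection<T> {
    private final List<T> elements;
    private final Map<String, String> hints;

    HintedCollection(List<T> elements, Map<String, String> hints) {
        // Defensive copies so later changes by the caller cannot leak in.
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
        this.hints = Collections.unmodifiableMap(new LinkedHashMap<>(hints));
    }

    List<T> elements() { return elements; }

    Map<String, String> hints() { return hints; }

    // Applying a transform never mutates this collection; it returns a new one.
    <R> HintedCollection<R> apply(Function<HintedCollection<T>, HintedCollection<R>> transform) {
        return transform.apply(this);
    }
}

// Hypothetical AddHints: an identity transform that only attaches metadata.
final class AddHints {
    private AddHints() {}

    static <T> Function<HintedCollection<T>, HintedCollection<T>> of(String key, String value) {
        return input -> {
            Map<String, String> merged = new LinkedHashMap<>(input.hints());
            merged.put(key, value);
            // Same elements, extended hints, new collection object.
            return new HintedCollection<>(input.elements(), merged);
        };
    }
}
```

With this shape, `pc.apply(AddHints.of("expected.size", "small"))` returns a hinted copy and leaves `pc` itself without hints.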

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Robert Bradshaw
Many hints make more sense for PTransforms (the computation itself) than for PCollections. In addition, when we want properties attached to PCollections themselves, it often makes sense to let these be provided by the producing PTransform (e.g. coders and schemas are often functions of the input

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Kenneth Knowles
It seems like most of these use cases are hints on a PTransform and not a PCollection, no? CPU, memory, expected parallelism, etc are. Then you could just have: pc.apply(WithHints(myTransform, )) For a PCollection hints that might make sense are bits like total size, element size, and throughp

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Yes, agree, it sounds like Create.of() but actually it is adding hints to the collection. So maybe AddHints.on(collection, hint1, ...) is clearer. Regards JB On 30/01/2018 21:08, Romain Manni-Bucau wrote: I think so too but `pc.apply(AddHints.of(hint1, hint2, hint3))` is a bit ambiguous for me (

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Romain Manni-Bucau
I think so too but `pc.apply(AddHints.of(hint1, hint2, hint3))` is a bit ambiguous for me (is it affecting the previous collection?) Maybe AddHints.on(collection, hint1, hint2, ...) is an acceptable compromise? Less fluent but not ambiguous (based on the same pattern as views). Romain Manni-Buca

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Reuven Lax
I think the hints would logically be metadata in the pcollection, just like coder and schema. On Jan 30, 2018 11:57 AM, "Jean-Baptiste Onofré" wrote: > Great idea for AddHints.of() ! > > What would be the resulting PCollection ? Just a PCollection of hints or > the pc elements + hints ? > > Rega

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Romain Manni-Bucau
Hmm, can work for pipeline hints but for transform hints we would need: p.apply(AddHint.of(.).wrap(originalTransform)) Would work for me too.

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Great idea for AddHints.of() ! What would be the resulting PCollection ? Just a PCollection of hints or the pc elements + hints ? Regards JB On 30/01/2018 20:52, Reuven Lax wrote: I think adding hints for runners is reasonable, though hints should always be assumed to be optional - they shou

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Reuven Lax
I think adding hints for runners is reasonable, though hints should always be assumed to be optional - they shouldn't change semantics of the program (otherwise you destroy the portability promise of Beam). However there are many types of hints that some runners might find useful (e.g. this step ne
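A small sketch of the "hints are optional" rule stated here, under the assumption that a hint is advisory key/value metadata attached to a step. None of these names come from Beam: `HintedStep`, `SketchRunner`, `IgnoringRunner`, `HintAwareRunner` and the `parallel` hint key are all hypothetical. The point is that a runner may use a hint to change how it executes, but the output must be the same as on a runner that ignores it.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pairing of a per-element transform with advisory hints.
final class HintedStep<I, O> {
    final Function<I, O> transform;
    final Map<String, String> hints;

    HintedStep(Function<I, O> transform, Map<String, String> hints) {
        this.transform = transform;
        this.hints = Map.copyOf(hints);
    }
}

// Hypothetical runner contract: execute one step over an input list.
interface SketchRunner {
    <I, O> List<O> run(List<I> input, HintedStep<I, O> step);
}

// A runner that does not know any hint and simply ignores them all.
final class IgnoringRunner implements SketchRunner {
    @Override
    public <I, O> List<O> run(List<I> input, HintedStep<I, O> step) {
        return input.stream().map(step.transform).collect(Collectors.toList());
    }
}

// A runner that honors the (assumed) "parallel" hint as an execution strategy only.
final class HintAwareRunner implements SketchRunner {
    @Override
    public <I, O> List<O> run(List<I> input, HintedStep<I, O> step) {
        boolean parallel = "true".equals(step.hints.get("parallel"));
        Stream<I> stream = parallel ? input.parallelStream() : input.stream();
        // Collecting an ordered stream to a list keeps encounter order,
        // so the result matches the sequential run.
        return stream.map(step.transform).collect(Collectors.toList());
    }
}
```

Running the same `HintedStep` on both runners gives equal results, which is the portability property the message asks hints to preserve.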

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Maybe I should have started the discussion on the user mailing list: it would be great to have user feedback on this, even if I got your points. Sometimes, I have the feeling that whatever we are proposing and discussing, it doesn't go anywhere. At some point, to attract more people, we have to

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Romain Manni-Bucau
2018-01-30 19:52 GMT+01:00 Kenneth Knowles : > I generally like having certain "escape hatches" that are well designed > and limited in scope, and anything that turns out to be important becomes > first-class. But this one I don't really like because the use cases belong > elsewhere. Of course, th

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Kenneth Knowles
I generally like having certain "escape hatches" that are well designed and limited in scope, and anything that turns out to be important becomes first-class. But this one I don't really like because the use cases belong elsewhere. Of course, they creep so you should assume they will be unbounded i

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Romain Manni-Bucau
Lukasz, the point is that you have the choice to either bring all specificities into the main API, which makes most of the API not usable or not implemented, or the opposite: not support anything. Introducing hints will allow some runners to have some features early - or just some very specific things

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Lukasz Cwik
There have been suggestions in the past for Dataflow 1.x to extend PipelineOptions to be usable per PTransform. So when you apply a PTransform you can also provide a set of options that apply to it and all subtransforms contained within. This is the closest suggestion to what you're describing that I

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Good point Luke: in that case, the hint will be ignored by the runner if the hint is not for it. The hint can be generic (not specific to a runner). It could be interesting for the schema support or IOs, not specific to a runner. What do you mean by gathering PTransforms/PCollections and wher

Re: [DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Lukasz Cwik
If the hint is required to run the person's pipeline well, how do you expect that the person will be able to migrate their pipeline to another runner? A lot of hints like "spark.persist" are really the user trying to tell us something about the PCollection, like it is very small. I would prefer if we

[DISCUSSION] Add hint/option on PCollection

2018-01-30 Thread Jean-Baptiste Onofré
Hi, As part of the discussion about schema, Romain mentioned hints. I think it's worth having an explanation of that, especially since it could be wider than schema. Today, to give information to the runner, we use PipelineOptions. The runner can use these options, and apply for all inner represe