Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
Filed https://issues.apache.org/jira/browse/BEAM-4372 (unassigned). On Mon, May 21, 2018 at 10:22 AM Raghu Angadi wrote: > > > On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw > wrote: > >> We should probably keep the warning and all the caveats until we

Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw wrote: > We should probably keep the warning and all the caveats until we introduce > the alternative (and migrate to it for the non-parallelism uses of > reshuffle). I was just proposing we do this via a separate transform

Re: What is the future of Reshuffle?

2018-05-21 Thread Ben Chambers
+1 to introducing alternative transforms even if they wrap Reshuffle. The benefit of making them distinct is that we can put appropriate Javadoc in place and runners can figure out what the user is intending and whether Reshuffle or some other implementation is appropriate. We can also see which

Re: What is the future of Reshuffle?

2018-05-21 Thread Robert Bradshaw
We should probably keep the warning and all the caveats until we introduce the alternative (and migrate to it for the non-parallelism uses of reshuffle). I was just proposing we do this via a separate transform that just calls Reshuffle until we have the new story fully fleshed out (I don't know

Re: What is the future of Reshuffle?

2018-05-20 Thread Raghu Angadi
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw wrote: > On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > >> [...] >> > I think it would be much more user friendly to un-deprecate it to add a >> warning for advanced users about non-portability of

Re: What is the future of Reshuffle?

2018-05-20 Thread Reuven Lax
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw wrote: > On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > >> On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw >> wrote: >> >>> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi

Re: What is the future of Reshuffle?

2018-05-19 Thread Robert Bradshaw
On Sat, May 19, 2018 at 6:27 PM Raghu Angadi wrote: > On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw > wrote: > >> On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: >> >>> True. I am still failing to see what is broken about

Re: What is the future of Reshuffle?

2018-05-19 Thread Raghu Angadi
On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw wrote: > On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: > >> True. I am still failing to see what is broken about Reshuffle that is >> also not broken with GroupByKey transform. If someone depends on

Re: What is the future of Reshuffle?

2018-05-19 Thread Robert Bradshaw
On Fri, May 18, 2018 at 6:29 PM Raghu Angadi wrote: > > On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw > wrote: > >> Ah, thanks, that makes sense. That implies to me Reshuffle is no more >>> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 5:34 PM Robert Bradshaw wrote: > Ah, thanks, that makes sense. That implies to me Reshuffle is no more >> broken than GBK itself. Maybe Reshuffle.viaRandomKey() could have a clear >> caveat. Reshuffle's JavaDoc could add a caveat too about

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 4:07 PM Kenneth Knowles wrote: > It isn't any particular logic in Reshuffle - it is, semantically, an > identity transform. It is the fact that other runners are perfectly able to > re-run transforms prior to a GBK. So, for example, randomly generated IDs

Re: What is the future of Reshuffle?

2018-05-18 Thread Kenneth Knowles
It isn't any particular logic in Reshuffle - it is, semantically, an identity transform. It is the fact that other runners are perfectly able to re-run transforms prior to a GBK. So, for example, randomly generated IDs will be re-generated. We tend to put in reshuffles in order to "commit" these

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
On Fri, May 18, 2018 at 12:22 PM Robert Bradshaw wrote: > [resending] > Agreed that keeping this deprecated without a clear replacement for so long > is not ideal. > > I would at least break this into two separate transforms, the > parallelism-breaking one (which seems OK)

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
[resending] Agreed that keeping this deprecated without a clear replacement for so long is not ideal. I would at least break this into two separate transforms, the parallelism-breaking one (which seems OK) and the stable input one (which may just call the parallelism-breaking one, but should be

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi wrote: > Thanks Kenn. > > On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: > >> The fact that its usage has grown probably indicates that we have a large >> number of transforms that can easily cause data

Re: What is the future of Reshuffle?

2018-05-18 Thread Robert Bradshaw
On Fri, May 18, 2018 at 11:46 AM Raghu Angadi rang...@google.com wrote: Thanks Kenn. On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles k...@google.com wrote: The fact that its usage has grown probably indicates that we have a large number of transforms that can easily cause data loss / duplication.

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
Thanks Kenn. On Fri, May 18, 2018 at 11:02 AM Kenneth Knowles wrote: > The fact that its usage has grown probably indicates that we have a large > number of transforms that can easily cause data loss / duplication. > Is this specific to Reshuffle or is it true for any

Re: What is the future of Reshuffle?

2018-05-18 Thread Kenneth Knowles
The fact that its usage has grown probably indicates that we have a large number of transforms that can easily cause data loss / duplication. Yes, it is deprecated because it is primarily used as a Dataflow-specific way to ensure stable input. My understanding is that the SparkRunner also

Re: What is the future of Reshuffle?

2018-05-18 Thread Eugene Kirpichov
Agreed that it should be undeprecated; many users are getting confused by this. I know that some people are working on a replacement for at least one of its use cases (RequiresStableInput), but the use case of breaking fusion is, as of yet, unaddressed, and there's not much to be gained by keeping

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
I am interested in more clarity on this as well. It has been deprecated for a long time without a replacement, and its usage has only grown, both within the Beam code base and in user applications. If we are certain that it will not be removed before there is a good replacement for it, can we

What is the future of Reshuffle?

2018-05-18 Thread Ismaël Mejía
I saw in a recent thread that the use of the Reshuffle transform was recommended to solve a user issue: https://lists.apache.org/thread.html/87ef575ac67948868648e0a8110be242f811bfff8fdaa7f9b758b933@%3Cdev.beam.apache.org%3E I can see why it may fix the reported issue. I am just curious about