I think this stuff is happening in SparkGroupAlsoByWindowViaWindowSet:
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L610
As far as I can tell, there is no infinite stream of pings involved.
On Fri, Apr 27, 2018 at 11:56 AM Kenneth Knowles wrote:
I'm still pretty shallow on this topic & this thread, so forgive me if I'm restating or missing things.
My understanding is that the Spark runner does support Beam's triggering semantics for unbounded aggregations, using the same support code from runners/core that all runners use. Relevant code in S...
Yeah, that's been the implied way of being continuous: you union with a receiver which produces an infinite number of batches (the "never-ending queue stream", though not actually a queueStream, since those have some limitations, hence our own implementation thereof).
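The union-with-a-never-ending-receiver idea can be pictured with a plain-Java sketch (all names here are hypothetical; in Spark this would be a DStream union with a custom Receiver, which is elided so the sketch is self-contained):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: the finite input unioned with a source that
// keeps supplying empty batches, so the stream never terminates.
public class NeverEndingUnion {
    // The finite side of the union (e.g. batches derived from a finite RDD).
    final Queue<List<String>> finiteBatches = new ArrayDeque<>();

    // Called once per micro-batch interval. Once the finite side is
    // exhausted, the "never-ending" side keeps producing empty batches,
    // which is what keeps the streaming job alive.
    List<String> nextBatch() {
        List<String> b = finiteBatches.poll();
        return (b != null) ? b : Collections.emptyList();
    }
}
```

The point of the sketch is only the scheduling shape: the job sees a batch every interval forever, whether or not the finite input has anything left.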
Could we do this behind the scenes by writing a Receiver that publishes
periodic pings?
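A rough sketch of what such a receiver could look like, in plain Java with the actual Spark Receiver API replaced by a local queue (in a real implementation this loop would live in a subclass of org.apache.spark.streaming.receiver.Receiver, with store() pushing into Spark instead):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a receiver that publishes periodic pings.
public class PingReceiver {
    private final Queue<Long> stored = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // Start emitting one ping (here: a timestamp) per period.
    void onStart(long periodMillis) {
        timer.scheduleAtFixedRate(
                () -> stored.add(System.currentTimeMillis()),
                0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void onStop() {
        timer.shutdownNow();
    }

    Queue<Long> pings() {
        return stored;
    }
}
```

The pings themselves carry no data; their only job is to force a new micro-batch each period.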
On Tue, Apr 24, 2018 at 10:09 PM Eugene Kirpichov wrote:
Kenn - I'm arguing that in Spark SDF-style computation cannot be expressed at all, and neither can Beam's timers.
Spark, unlike Flink, does not have a timer facility (only state), and as far as I can tell its programming model has no other primitive that can map a finite RDD into an infinite DStream...
I don't think I understand what the limitations of timers are that you are
referring to. FWIW I would say implementing other primitives like SDF is an
explicit non-goal for Beam state & timers.
I got lost at some point in this thread, but is it actually necessary that a bounded PCollection maps to...
Would like to revive this thread one more time.
At this point I'm pretty certain that Spark can't support this out of the
box and we're gonna have to make changes to Spark.
Holden, could you advise who would be some Spark experts (yourself included :)) who could advise what kind of Spark change...
(resurrecting thread as I'm back from leave)
I looked at this mode, and indeed as Reuven points out it seems that it
affects execution details, but doesn't offer any new APIs.
Holden - your suggestion of piggybacking an unbounded-per-element SDF on top of an infinite stream would work if 1) there...
I mean the new mode is very much in the Dataset not the DStream API
(although you can use the Dataset API with the old modes too).
On Sun, Mar 25, 2018 at 9:11 PM, Reuven Lax wrote:
But this new mode isn't a semantic change, right? It's moving away from
micro batches into something that looks a lot like what Flink does -
continuous processing with asynchronous snapshot boundaries.
That would certainly be good.
On Sun, Mar 25, 2018 at 9:01 PM, Thomas Weise wrote:
Hopefully the new "continuous processing mode" in Spark will enable SDF
implementation (and real streaming)?
Thanks,
Thomas
On Sat, Mar 24, 2018 at 3:22 PM, Holden Karau wrote:
On Sat, Mar 24, 2018 at 1:23 PM Eugene Kirpichov wrote:
On Fri, Mar 23, 2018, 11:17 PM Holden Karau wrote:
On Fri, Mar 23, 2018 at 7:00 PM Eugene Kirpichov wrote:
On Fri, Mar 23, 2018 at 6:49 PM Holden Karau wrote:
On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov wrote:
On Fri, Mar 23, 2018 at 6:12 PM Holden Karau wrote:
On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov wrote:
Reviving this thread. I think SDF is a pretty big risk for Spark runner
streaming. Holden, is it correct that Spark appears to have no way at all
to produce an infinite DStream from a finite RDD? Maybe we can somehow
dynamically create a new DStream for every initial restriction, said
DStream being...
How would timers be implemented? By outputting and reprocessing, the same
way you proposed for SDF?
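One way to picture the "outputting and reprocessing" idea for timers (a hypothetical sketch, not Beam's or Spark's actual implementation): each batch fires the timers that are due and re-emits the rest as data into the next batch, the same looping trick proposed for SDF residuals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: timers represented as data that is re-output
// each round until due.
public class ReprocessedTimers {
    static class Timer {
        final String tag;
        final long fireAt;
        Timer(String tag, long fireAt) { this.tag = tag; this.fireAt = fireAt; }
    }

    // One "batch": fire due timers into `fired`, return the rest so the
    // caller can feed them into the next batch.
    static List<Timer> processBatch(List<Timer> pending, long now, List<String> fired) {
        List<Timer> carryOver = new ArrayList<>();
        for (Timer t : pending) {
            if (t.fireAt <= now) {
                fired.add(t.tag);   // the timer callback would run here
            } else {
                carryOver.add(t);   // re-output for reprocessing
            }
        }
        return carryOver;
    }
}
```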
On Wed, Mar 14, 2018 at 7:25 PM Holden Karau wrote:
So the timers would have to be in our own code.
On Wed, Mar 14, 2018 at 5:18 PM Eugene Kirpichov wrote:
Does Spark have support for timers? (I know it has support for state)
On Wed, Mar 14, 2018 at 4:43 PM Reuven Lax wrote:
Could we alternatively use a state mapping function to keep track of the
computation so far instead of outputting V each time? (also the progress so
far is probably of a different type R rather than V).
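In Spark Streaming terms this maps onto mapWithState. A plain-Java sketch of the shape being described, with hypothetical types (T = input element, R = progress so far kept in state, V = final output emitted only when asked):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a state mapping function: fold each input T
// into per-key progress R instead of outputting a V for every element.
public class StateMapping {
    // R: running progress (here a count plus a partial sum).
    static class Progress {
        long count;
        long sum;
    }

    private final Map<String, Progress> stateByKey = new HashMap<>();

    // A mapWithState-style function would be called like this per element.
    void update(String key, long element) {
        Progress r = stateByKey.computeIfAbsent(key, k -> new Progress());
        r.count++;
        r.sum += element;
    }

    // V (here: the mean) is extracted from R, not emitted per element.
    double result(String key) {
        Progress r = stateByKey.get(key);
        return (double) r.sum / r.count;
    }
}
```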
On Wed, Mar 14, 2018 at 4:28 PM Holden Karau wrote:
So we had a quick chat about what it would take to add something like
SplittableDoFns to Spark. I'd done some sketchy thinking about this last
year but didn't get very far.
My back-of-the-envelope design was as follows:
For input type T
Output type V
Implement a mapper which outputs type (T, V)
...
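The design above is truncated, but the visible shape seems to be: a mapper that emits a pair of (residual input T, output V so far), with the residual T fed back into another round until the element is exhausted. A hypothetical plain-Java sketch of that loop (step size and names are invented for illustration):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (T, V) mapper idea: each round consumes
// part of an element's remaining work and re-emits the residual T.
public class SdfMapperSketch {
    // T is modeled as "units of work remaining"; V as output chunks.
    // Returns (residual T, or null if done; outputs produced this round).
    static Map.Entry<Integer, List<String>> processRound(int remaining) {
        List<String> outputs = new ArrayList<>();
        int step = Math.min(remaining, 2);   // checkpoint after 2 outputs
        for (int i = 0; i < step; i++) {
            outputs.add("out");
        }
        int residual = remaining - step;
        return new SimpleEntry<>(residual > 0 ? residual : null, outputs);
    }
}
```

Each round's residual would be what gets re-injected into the stream, which is exactly where the "finite RDD to infinite DStream" problem discussed above bites.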