Re: Rework of the window-join semantics

2015-04-24 Thread Aljoscha Krettek
There is a simple reason for that: They don't support joins. :D They support n-ary co-group, however. This is implemented using tagging and a group-by-key operation. So only elements in the same window can end up in the same co-grouped result. On Fri, Apr 24, 2015 at 3:51 PM, Matthias J. Sax wro

Re: Rework of the window-join semantics

2015-04-24 Thread Matthias J. Sax
Interesting read. Thanks for the pointer. Take home message (in my understanding): - they support wall-clock, attribute-ts, and count windows -> default is attribute-ts (and not wall-clock as in Flink) -> it is not specified, if a global order is applied to windows, but I doubt it, because o

Re: Rework of the window-join semantics

2015-04-24 Thread Aljoscha Krettek
Did anyone read these: https://cloud.google.com/dataflow/model/windowing, https://cloud.google.com/dataflow/model/triggers ? The semantics seem very straightforward and I'm sure the google guys spent some time thinking this through. :D On Mon, Apr 20, 2015 at 3:43 PM, Stephan Ewen wrote: > Perfe

Re: Rework of the window-join semantics

2015-04-20 Thread Stephan Ewen
Perfect! I am eager to see what you came up with! On Sat, Apr 18, 2015 at 2:00 PM, Gyula Fóra wrote: > Hey all, > > We have spent some time with Asterios, Paris and Jonas to finalize the > windowing semantics (both the current features and the window join), and I > think we made very have come u

Re: Rework of the window-join semantics

2015-04-18 Thread Gyula Fóra
Hey all, We have spent some time with Asterios, Paris and Jonas to finalize the windowing semantics (both the current features and the window join), and I think we made very have come up with a very clear picture. We will write down the proposed semantics and publish it to the wiki next week. Ch

Re: Rework of the window-join semantics

2015-04-16 Thread Asterios Katsifodimos
As far as I see in [1], Peter's/Gyula's suggestion is what Infosphere Streams does: symmetric hash join. >From [1]: "When a tuple is received on an input port, it is inserted into the window corresponding to the input port, which causes the window to trigger. As part of the trigger processing, the

Re: Rework of the window-join semantics

2015-04-09 Thread Matthias J. Sax
Hi Paris, thanks for the pointer to the Naiad paper. That is quite interesting. The paper I mentioned [1], does not describe the semantics in detail; it is more about the implementation for the stream-joins. However, it uses the same semantics (from my understanding) as proposed by Gyula. -Matth

Re: Rework of the window-join semantics

2015-04-08 Thread Matthias J. Sax
I started to work on an in-memory merge on a record-timestamp attribute for total ordered streams. But I got distracted by the Storm compatibility layer... I will continue to work on it, when I find some extra time ;) On 04/08/2015 03:18 PM, Márton Balassi wrote: > +1 for Stephan's suggestion. >

Re: Rework of the window-join semantics

2015-04-08 Thread Márton Balassi
+1 for Stephan's suggestion. If we would like to support event time and also sorting inside a window we should carefully consider where to actually put the timestamp of the records. If the timestamp is part of the record then it is more straight-forward, but in case of we assign the timestamps in

Re: Rework of the window-join semantics

2015-04-08 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Stephan, that sounds reasonable to me. Cheers, Bruno On 08.04.2015 15:06, Stephan Ewen wrote: > With the current network layer and the agenda we have for > windowing, we should be able to support widows on event time this > in the near future. In

Re: Rework of the window-join semantics

2015-04-08 Thread Stephan Ewen
With the current network layer and the agenda we have for windowing, we should be able to support widows on event time this in the near future. Inside the window, you can sort all records by time and have a full ordering. That is independent of the order of the stream. How about this as a first go

Re: Rework of the window-join semantics

2015-04-08 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Stephan, how much of CEP depends on fully ordered streams depends on the operators that you use in the pattern query. But in general, they need fully ordered events within a window or at least some strategies to deal with out-of-order events. If I

Re: Rework of the window-join semantics

2015-04-08 Thread Stephan Ewen
I agree, any ordering guarantees would need to be actively enabled. How much of CEP depends on fully ordered streams? There is a lot you can do with windows on event time, which are triggered by punctuations. This is like a "soft" variant of the ordered streams, where order relation occurs only w

Re: Rework of the window-join semantics

2015-04-08 Thread Matthias J. Sax
This reasoning makes absolutely sense. That's why I suggested, that the user should actively choose ordered data processing... About deadlocks: Those can be avoided, if the buffers are consumed continuously in an in-memory merge buffer (maybe with spilling to disk if necessary). Of course, latency

Re: Rework of the window-join semantics

2015-04-08 Thread Stephan Ewen
Here is the state in Flink and why we have chosen not to do global ordering at the moment: - Individual streams are FIFO, that means if the sender emits in order, the receiver receives in order. - When streams are merged (think shuffle / partition-by), then the streams are not merged, but buffe

Re: Rework of the window-join semantics

2015-04-08 Thread Bruno Cadonna
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Paris, what's the reason for not guaranteeing global ordering across partitions in the stream model? Is it the smaller overhead or are there any operations not computable in a distributed environment with global ordering? In any case, I agree with

Re: Rework of the window-join semantics

2015-04-07 Thread Paris Carbone
Hello Matthias, Sure, ordering guarantees are indeed a tricky thing, I recall having that discussion back in TU Berlin. Bear in mind thought that DataStream, our abstract data type, represents a *partitioned* unbounded sequence of events. There are no *global* ordering guarantees made whatsoeve

Re: Rework of the window-join semantics

2015-04-07 Thread Matthias J. Sax
Hi @all, please keep me in the loop for this work. I am highly interested and I want to help on it. My initial thoughts are as follows: 1) Currently, system timestamps are used and the suggested approach can be seen as state-of-the-art (there is actually a research paper using the exact same jo

Re: Rework of the window-join semantics

2015-04-07 Thread Gyula Fóra
Hey, I agree with Kostas, if we define the exact semantics how this works, this is not more ad-hoc than any other stateful operator with multiple inputs. (And I don't think any other system support something similar) We need to make some design choices that are similar to the issues we had for wi

Re: Rework of the window-join semantics

2015-04-07 Thread Kostas Tzoumas
Yes, we should write these semantics down. I volunteer to help. I don't think that this is very ad-hoc. The semantics are basically the following. Assuming an arriving element from the left side: (1) We find the right-side matches (2) We insert the left-side arrival into the left window (3) We rec

Re: Rework of the window-join semantics

2015-04-07 Thread Stephan Ewen
Is the approach of joining an element at a time from one input against a window on the other input not a bit arbitrary? This just joins whatever currently happens to be the window by the time the single element arrives - that is a bit non-predictable, right? As a more general point: The whole sem

Re: Rework of the window-join semantics

2015-04-03 Thread Gyula Fóra
I think it should be possible to make this compatible with the .window().every() calls. Maybe if there is some trigger set in "every" we would not join that stream 1 by 1 but every so many elements. The problem here is that the window and every in this case are very-very different than the normal w

Re: Rework of the window-join semantics

2015-04-03 Thread Márton Balassi
That would be really neat, the problem I see there, that we do not distinguish between dataStream.window() and dataStream.window().every() currently, they both return WindowedDataStreams and TriggerPolicies of the every call do not make much sense in this setting (in fact practically the trigger is

Re: Rework of the window-join semantics

2015-04-02 Thread Aljoscha Krettek
Or you could define it like this: stream_A = a.window(...) stream_B = b.window(...) stream_A.join(stream_B).where().equals().with() So a join would just be a join of two WindowedDataStreamS. This would neatly move the windowing stuff into one place. On Thu, Apr 2, 2015 at 9:54 PM, Márton Balass

Re: Rework of the window-join semantics

2015-04-02 Thread Márton Balassi
Big +1 for the proposal for Peter and Gyula. I'm really for bringing the windowing and window join API in sync. On Thu, Apr 2, 2015 at 6:32 PM, Gyula Fóra wrote: > Hey guys, > > As Aljoscha has highlighted earlier the current window join semantics in > the streaming api doesn't follow the change

Rework of the window-join semantics

2015-04-02 Thread Gyula Fóra
Hey guys, As Aljoscha has highlighted earlier the current window join semantics in the streaming api doesn't follow the changes in the windowing api. More precisely, we currently only support joins over time windows of equal size on both streams. The reason for this is that we now take a window of