Unexpected hop start & end timestamps after stream SQL join

2018-02-14 Thread Juho Autio
I'm joining a tumbling & hopping window in Flink 1.5-SNAPSHOT. The result is unexpected. Am I doing something wrong? Maybe this is just not a supported join type at all? Anyway, here goes: I first register these two tables: 1. new_ids: a tumbling window of seen ids within the last 10 seconds: SE

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-19 Thread Fabian Hueske
Hi Juho, sorry for the late response. I found time to look into this issue. I agree that the start and end timestamps of the HOP window should be 1 hour apart from each other. I tried to reproduce the issue, but was not able to do so. Can you maybe open a JIRA and provide a simple test case (coll
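
To make the expectation above concrete (every HOP window's end should be exactly one window size after its start), here is a minimal Python sketch of how a hopping window assigner can map one event timestamp to its windows. It is not taken from the thread or from Flink's source: the function name `hop_windows`, the 15-minute slide, and the sample timestamp are all assumptions chosen for illustration; only the 1-hour size comes from the discussion.

```python
def hop_windows(timestamp_ms, size_ms, slide_ms):
    """Return the (start, end) pairs of all hopping windows containing timestamp_ms.

    Hypothetical helper: windows are aligned to multiples of slide_ms
    (offset 0), and each window covers the half-open range [start, start + size_ms).
    """
    # Start of the latest window that still contains the timestamp.
    last_start = timestamp_ms - (timestamp_ms % slide_ms)
    windows = []
    start = last_start
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows


HOUR_MS = 3_600_000
SLIDE_MS = 900_000  # assumed 15-minute slide, not stated in the thread

windows = hop_windows(3_700_000, HOUR_MS, SLIDE_MS)
for start, end in windows:
    # The invariant discussed in the thread: end - start equals the window size.
    print(start, end, end - start == HOUR_MS)
```

Under these assumptions an event belongs to size / slide = 4 overlapping windows, and each of them is exactly one hour long. A result where start and end are not one hour apart would violate this invariant.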

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
Thanks for the hint! For some reason it isn't catching all distinct values (even though it's a much simpler way than what I initially tried and seems good in that sense). First of all, isn't this like a sliding window: "rowtime RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW"? My use cas

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Fabian Hueske
Hi Juho, a query with an OVER aggregation should emit exactly one row for each input row. Does your comment on "isn't catching all distinct values" mean that this is not the case? You can also combine tumbling windows and OVER aggregates by nesting queries, as shown below: SELECT s_aid1, s_ci

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
> a query with an OVER aggregation should emit exactly one row for each input row.
> Does your comment on "isn't catching all distinct values" mean that this is not the case?

Not really what I meant? The problem is that some ids are not received at all for some time windows. I did this as you sug

Re: Unexpected hop start & end timestamps after stream SQL join

2018-02-27 Thread Juho Autio
Actually looks like I found why the "count(*) AS occurrence" + filter "occurrence = 1" doesn't work. If there are multiple events with the same event time, they get handled together and share the value for count(*). I printed out some rows before the filter* and this is what I get: 4> {"s_aid1":"A
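
The effect described in this message can be reproduced outside Flink with a small Python simulation of a RANGE-based OVER count. This is only a sketch under assumptions: the `Event` type, the `range_over_count` helper, and the sample data are hypothetical, and the code models standard SQL RANGE semantics (rows with the same ordering value are peers and fall inside each other's frame) rather than Flink's actual operator.

```python
from collections import namedtuple

# Hypothetical event type: a key (e.g. an id combination) and an event time in ms.
Event = namedtuple("Event", ["key", "rowtime"])


def range_over_count(events, preceding_ms):
    """Simulate count(*) OVER (PARTITION BY key ORDER BY rowtime
    RANGE BETWEEN <preceding_ms> PRECEDING AND CURRENT ROW).

    With RANGE, the frame is defined by rowtime values, so all rows that
    share the current row's rowtime are counted as well.
    """
    result = []
    for current in events:
        count = sum(
            1
            for other in events
            if other.key == current.key
            and current.rowtime - preceding_ms <= other.rowtime <= current.rowtime
        )
        result.append((current, count))  # exactly one output row per input row
    return result


events = [
    Event("A", 1000),
    Event("A", 1000),  # same key and same event time as the previous row
    Event("B", 1000),
    Event("A", 5000),
]

result = range_over_count(events, preceding_ms=3_600_000)
# The "occurrence = 1" filter from the thread, applied to the simulated counts.
first_seen = [event for event, occurrence in result if occurrence == 1]
print(first_seen)
```

In this sample both "A" rows at time 1000 get a count of 2, so neither passes the `occurrence = 1` filter and key "A" is never reported as newly seen, while "B" is. Whether ROWS-based framing or a prior deduplication step is the better remedy depends on the actual query, which is not fully visible here.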

Re: Unexpected hop start & end timestamps after stream SQL join

2018-03-01 Thread Fabian Hueske
Hi Juho, I have to admit I lost track a bit of what you are trying to compute. I also don't understand the problem with the missing ids. The query that you shared in the last mail will return, for each record with a valid s_aid1, s_cid combination, how often the id combination has been seen so far