Hi Jark and Benchao,
There are three more odd things about computing PV/UV in Flink SQL.
As I described in the email above, I computed PV/UV in two ways, which I
list below:
For the day-grouping approach, the SQL is
> insert into pvuv_sink
> select a,v,MAX(DATE_FORMAT(ts, '-MM-dd HH:mm:00'))
Hi Jark and Benchao
I learned from your previous emails how to compute PV/UV in Flink SQL.
One way is to group by an MMdd key; the other is to use a day window.
Thank you all.
I have a question about the result output. For MMdd grouping, every
minute the database would get a record, and
@Benchao @Jark
Thank you very much. We have used Flink 1.9 for a while, and we will try 1.9 +
minibatch.
dixingxin...@163.com
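
As a rough illustration of why mini-batch may help with the fluctuating sums, here is a simplified Python model, not Flink's actual implementation. It assumes that the retract and accumulate messages produced by one input record land in the same bundle, so the outer aggregate applies only their net effect and emits once per bundle. The function name `two_level_uv_minibatch` and the `batch_size` parameter are hypothetical.

```python
from collections import defaultdict

def two_level_uv_minibatch(events, batch_size):
    """Model SUM(uv) over COUNT(DISTINCT cuid) with bundled outer updates.

    events: iterable of (dt, pvareaid, cuid) tuples.
    Returns the (op, dt, total) messages the sink would receive.
    """
    seen = defaultdict(set)     # inner state: (dt, pvareaid) -> distinct cuids
    totals = defaultdict(int)   # outer state: dt -> SUM(uv)
    pending = defaultdict(int)  # net delta per dt buffered in the current bundle
    emitted = {}                # dt -> last total sent to the sink
    sink = []

    def flush():
        # Apply the net delta of the bundle and emit at most one update per dt.
        for dt, delta in pending.items():
            totals[dt] += delta
            if emitted.get(dt) != totals[dt]:
                if dt in emitted:
                    sink.append(("-", dt, emitted[dt]))
                sink.append(("+", dt, totals[dt]))
                emitted[dt] = totals[dt]
        pending.clear()

    for i, (dt, area, cuid) in enumerate(events, start=1):
        key = (dt, area)
        old = len(seen[key])
        seen[key].add(cuid)
        new = len(seen[key])
        if new != old:
            pending[dt] -= old  # inner retract -[old] (no-op when old == 0)
            pending[dt] += new  # inner accumulate +[new]
        if i % batch_size == 0:
            flush()
    flush()  # emit whatever is left in the last partial bundle
    return sink

events = [
    ("20200417", "a", "u1"),
    ("20200417", "b", "u2"),
    ("20200417", "a", "u3"),
]
plus_values = [total for op, _, total in two_level_uv_minibatch(events, 1) if op == "+"]
```

In this model the sink no longer sees the sum drop, because a retract and its matching accumulate are folded together before the outer aggregate emits. If a pair were split across two bundles, an intermediate smaller value could still appear.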
Sender: Jark Wu
Send Time: 2020-04-18 21:38
Receiver: Benchao Li
cc: dixingxing85; user; user-zh
Subject: Re: Does Flink streaming SQL support two-level GROUP BY aggregation?
Hi,
I will use English because we are also sending to user@ ML.
This behavior is as expected, not a bug. Benchao gave a good explanation
about the reason. I will give some further explanation.
In Flink SQL, we will split an update operation (such as uv from 100 ->
101) into two separate
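Under the current design this should not count as a bug; it is by design.
The main issue is that there are two levels of aggregation: a retract from the first-level agg causes the second-level agg to recompute and emit a new result.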
dixingxing85 wrote on Sat, Apr 18, 2020 at 11:38 AM:
> Thanks a lot, Benchao.
> For this job I expect only one result per day, and that result should keep growing, for example:
> 20200417,86
> 20200417,90
> 20200417,130
> 20200417,131
>
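> It should not jump up and down, with the number dropping from a larger value to a smaller one. The consumers of this result would certainly not accept output like this: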
> 20200417,90
>
Thanks a lot, Benchao.
For this job I expect only one result per day, and that result should keep growing, for example:
20200417,86
20200417,90
20200417,130
20200417,131
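It should not jump up and down, with the number dropping from a larger value to a smaller one. The consumers of this result would certainly not accept output like this: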
20200417,90
20200417,86
20200417,130
20200417,86
20200417,131
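My question is whether the retract stream produced by the inner group by affects the sink. I logged these values at the sink.
If Flink supports this kind of two-level group by, then results that shrink like this should count as a bug, right?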
Hi,
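Yes, this is supported.
What you are seeing happens because group by produces retract results: it first sends -[old], then sends +[new].
With two levels, this becomes:
First level -[old], second level -[cur], +[old]
First level +[new], second level -[old], +[new]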
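
The two-level sequence above can be reproduced with a small simulation. This is a minimal Python sketch of the behavior described in this thread, not Flink code: the function name `two_level_uv` and the sample events are made up for illustration, and deletes of empty groups are not modeled.

```python
from collections import defaultdict

def two_level_uv(events):
    """Simulate SUM(uv) over COUNT(DISTINCT cuid) with retract messages.

    events: iterable of (dt, pvareaid, cuid) tuples.
    Returns the (op, dt, total) messages the sink would receive.
    """
    seen = defaultdict(set)  # inner state: (dt, pvareaid) -> distinct cuids
    totals = {}              # outer state: dt -> SUM(uv)
    sink = []

    def outer(op, dt, uv):
        # The outer aggregate handles ONE inner message at a time and
        # immediately emits -[previous total], +[updated total].
        if dt in totals:
            sink.append(("-", dt, totals[dt]))
        totals[dt] = totals.get(dt, 0) + (uv if op == "+" else -uv)
        sink.append(("+", dt, totals[dt]))

    for dt, area, cuid in events:
        key = (dt, area)
        old = len(seen[key])
        seen[key].add(cuid)
        new = len(seen[key])
        if new == old:
            continue             # duplicate cuid: uv unchanged, nothing emitted
        if old > 0:
            outer("-", dt, old)  # first level -[old]
        outer("+", dt, new)      # first level +[new]
    return sink

events = [
    ("20200417", "a", "u1"),
    ("20200417", "b", "u2"),
    ("20200417", "a", "u3"),  # updates group "a" from uv=1 to uv=2
]
plus_values = [total for op, _, total in two_level_uv(events) if op == "+"]
```

For these events the values written with `+` are 1, 2, 1, 3: the third event first retracts the old uv of group "a", so the sum briefly drops before the new uv is added back. That temporary dip is the "sometimes larger, sometimes smaller" pattern seen at the sink.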
dixingxin...@163.com wrote on Sat, Apr 18, 2020 at 2:11 AM:
>
> Hi all:
>
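> We have a streaming SQL job that produces incorrect results: the values the sink receives go up and down. *We would like to confirm whether this is a bug,
> or whether Flink does not yet support this kind of SQL*.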
>
Hi all:
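We have a streaming SQL job that produces incorrect results: the values the sink receives go up and down. We would like to confirm whether this is a bug, or whether Flink does not yet support this kind of SQL.
The scenario: first group by two dimensions, A and B, to compute UV, then group by A to sum the UV across dimension B. The corresponding SQL is as follows (A -> dt, B
-> pvareaid):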
SELECT dt, SUM(a.uv) AS uv
FROM (
SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
FROM streaming_log_event