Hi, guomuhua

The official documentation says this about local aggregation:
`The number of inputs accumulated by local aggregation every time is
based on mini-batch interval. It means local-global aggregation depends on
mini-batch optimization is enabled`
In other words, mini-batch aggregation has to be enabled first before local aggregation can be used, which makes three parameters in total.
configuration.setString("table.exec.mini-batch.all
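Putting the mini-batch parameters together with the local-global switch, a sketch of the full configuration (key names follow the Flink 1.12 documentation; the values are illustrative):

```sql
-- buffer inputs so aggregation state is touched once per batch, not per record
SET table.exec.mini-batch.enabled = true;
-- maximum time to buffer records before flushing a mini-batch
SET table.exec.mini-batch.allow-latency = '5 s';
-- maximum number of records buffered per mini-batch
SET table.exec.mini-batch.size = 5000;
-- with mini-batch on, switch aggregation to the local-global (two-phase) strategy
SET table.optimizer.agg-phase-strategy = TWO_PHASE;
```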
Hi, Jark

I understand the SQL in question to be an ordinary agg operation whose grouping
key just happens to be a time field. I am not sure how to interpret your remark
that `your job uses a window agg`.
Best,
Robin
> If it is not a window agg, Flink will split the key automatically once the parameter is enabled, right?
Yes.
> As for window agg not supporting automatic splitting, can that be found anywhere in the documentation?
The documentation does not mention it. Everything described in this doc [1] targets the optimization of unbounded aggs.
Best,
Jark
[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tuning/streaming_aggregation_optimization.html#split-distinct-aggregation
On Fri, 26
I see your job uses a window agg; window agg does not support automatic splitting yet. In 1.13, window aggs based on window TVFs will support this parameter, so stay tuned.
Best,
Jark
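For reference, a 1.13-style window TVF aggregation of the kind mentioned above might look like this (a sketch based on the 1.13 window TVF syntax; the table and columns are invented):

```sql
-- tumbling-window agg via a window TVF (Flink 1.13+); with window TVFs the
-- split-distinct parameter can apply to the COUNT(DISTINCT ...) as well
SELECT window_start, window_end, COUNT(DISTINCT user_id) AS uv
FROM TABLE(
  TUMBLE(TABLE clicks, DESCRIPTOR(click_time), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```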
On Wed, 24 Mar 2021 at 19:29, Robin Zhang
wrote:
Hi, guomuhua

With local aggregation enabled, you do not need to split the key yourself for a second-stage aggregation. I suggest reading the official documentation on this.
Best,
Robin
In SQL, if the local-global parameter is enabled (set table.optimizer.agg-phase-strategy=TWO_PHASE;),
or the Partial-Final parameters are enabled (set table.optimizer.distinct-agg.split.enabled=true;
set table.optimizer.distinct-agg.split.bucket-num=1024;),
do I still need to rewrite the SQL into the two-stage form myself?
For example, the original SQL:
SELECT day, count(distinct state ...
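For context, the manual two-stage rewrite being asked about, which the split-distinct optimization applies automatically, can be sketched like this (table and column names are invented; the bucket count mirrors the split.bucket-num setting):

```sql
-- stage 1: scatter the distinct key into buckets and count distinct per bucket;
-- stage 2: sum the partial distinct counts per day
SELECT day, SUM(cnt)
FROM (
  SELECT day, COUNT(DISTINCT user_id) AS cnt
  FROM t
  GROUP BY day, MOD(HASH_CODE(user_id), 1024)
)
GROUP BY day;
```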
Hi, 夜思流年梦

My understanding is that grouping by date meets your need: a record in the stream only counts toward the day it belongs to and does not affect other dates.
Once the per-day figures are all computed, summing them up gives the monthly number.
Best,
Robin
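A sketch of the per-day grouping suggested above (the table and column names follow the original query; the DATE_FORMAT call is an assumption about how to derive the day):

```sql
-- group by the calendar day of writeTime so each record only counts
-- toward its own day, regardless of when the job runs
SELECT DATE_FORMAT(writeTime, 'yyyy-MM-dd') AS dt, worker, COUNT(1) AS cnt
FROM XX
GROUP BY DATE_FORMAT(writeTime, 'yyyy-MM-dd'), worker;
```

Summing the daily rows over a month then gives the monthly figure.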
Here is my scenario:
computing each employee's performance per day (counting only the current day).
Right now I do it with Flink SQL: insert into select current_date, count(1), worker from XX
where writeTime >= current_date group by worker;
partitioning the data by day and sinking it into MySQL first.
But I found that the data landed in MySQL includes all the previous days as well. How do I count only today's data?
A further question: how can I compute the current day's numbers and at the same time cover the whole month's data?
At 2020-10-19 08:03:46, "..." wrote:
> ... user_id
> xiongyun...@163.com
> At 2020-10-17 16:24, 867127831 wrote:
> > A Flink SQL job computes DAU with a group by and count(distinct user_id),
> > with table.optimizer.distinct-agg.split.enabled=true enabled; the job sinks
> > results to MySQL.
> Sent: Tuesday, 29 September 2020, 17:32
> To: "user-zh" <user-zh@.apache>;
> Subject: Re: flink sql count question
Hi lemon,
The inner conditional if function can be replaced with CASE WHEN.
Best,
Robin
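A sketch of that rewrite (since COUNT ignores NULLs, a CASE WHEN without an ELSE branch gives the conditional count; the predicate follows lemon's example):

```sql
-- rows not matching the predicate yield NULL, which COUNT skips
SELECT COUNT(CASE WHEN name LIKE '南京%' THEN 1 END) FROM t;
```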
lemon wrote
> Hi all:
> I have a SQL job that needs a count containing an expression, so that only matching records are counted.
> In Hive I used to write count(if(name like '南京%', 1, null)), but in Flink
> SQL count cannot take null. Is there another way to achieve this?
> I am using flink 1.10.1 with the blink planner.
> At 2020-09-27 17:23:07, "zya" wrote:
> > Hello, the link does not display; could you paste it again?
> >
> > -- Original message --
> > From: "user-zh"
> > Sent: Sunday, 27 September 2020, 17:20
> > To: "user-zh"
> > Subject: Re: Re: flink sql count question
At 2020-09-27 17:07:39, "zya" wrote:

From: "user-zh"
Subject: Re: flink sql count

> Change the null into a 0, i.e. use sum(if(name like '...%', 1, 0)).
At 2020-09-27 16:53:56, "zya" wrote:
> In SQL, count cannot take null; is there another way to write the conditional count?
hi, the `select count(distinct a, b) from mytable` unit test you saw passing presumably only exercises the logical
plan; at present, multiple distinct fields are explicitly disallowed when the physical plan is generated.
Bests,
Godfrey
apache22 wrote on Wed, 26 Feb 2020 at 18:56:
> In my experience: count(distinct ...) only supports a single field, although distinct a, b is accepted syntactically.
> One workaround: count(distinct concat(a, b))
>
> apache22
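The workaround might be sketched as below; the separator is an added assumption to avoid collisions such as ('ab','c') vs ('a','bc'), and is not in the original mail:

```sql
-- emulate count(distinct a, b) through a single-field distinct
SELECT COUNT(DISTINCT CONCAT(a, '#', b)) FROM mytable;
```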
Hi,
Could you paste your complete SQL?
On Wed, 26 Feb 2020 at 18:21, 小旋锋 wrote:
> Hello everyone:
> In the official Flink documentation I saw that the built-in aggregate function count has this signature:
> count([all] Expression | distinct Expression1
> [, Expression2])
> So it should be able to apply distinct to multiple attributes, and I also saw several
> test cases in the source's unit tests, e.g. select count(distinct a, b) from mytable, which run and pass.
hi,
Thanks for the reply. I am using the default FsStateBackend rather than RocksDB,
with checkpointing off, so I really cannot see any state info on the dashboard.
I will research more details and see whether any alternative can be optimized.
At 2020-01-08 19:07:08, "Benchao Li" wrote:
hi sunfulin,
As Kurt pointed out, if you use the RocksDB state backend, slow disk IO may be
what bounds your job.
You can check the WindowOperator's latency metric to see how long it takes to
process an element.
Hope this helps.
sunfulin 于2020年1月8日周三 下午4:04写道:
Ah, I had checked resource usage and GC from the Flink dashboard. It seems the
reason is not a CPU or memory issue. Task heap memory usage is below 30%.
Could you kindly tell me how I can see more metrics to help target the
bottleneck?
Really appreciated that.
At 2020-01-08 15:59:17, "
hi, godfreyhe
As far as I can tell, I rewrote the running SQL from a single-level count distinct into a
two-level agg, just as the table.optimizer.distinct-agg.split.enabled parameter
would do. Correct me if I am doing it the wrong way. But the rewritten SQL does not
improve the overall throughput.
For n
Hi,
Could you try to find out what's the bottleneck of your current job? This
would lead to different optimizations, such as whether it is CPU bound, or
whether the local state is too big and the job is stuck on too many slow IOs.
Best,
Kurt
On Wed, Jan 8, 2020 at 3:53 PM 贺小令 wrote:
hi sunfulin,
you can try the blink planner (since 1.9+), which optimizes distinct
aggregation. You can also try enabling
*table.optimizer.distinct-agg.split.enabled* if the data is skewed.
best,
godfreyhe
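Enabling the split-distinct optimization mentioned above can be sketched as follows (key names follow the Flink documentation; the bucket number is an illustrative choice):

```sql
-- let the planner split COUNT(DISTINCT deviceID) into a two-level aggregation
SET table.optimizer.distinct-agg.split.enabled = true;
-- number of buckets used to scatter the hot distinct key
SET table.optimizer.distinct-agg.split.bucket-num = 1024;
```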
sunfulin 于2020年1月8日周三 下午3:39写道:
Hi, community,
I'm using Apache Flink SQL to build some of my realtime streaming apps. In one
scenario I'm trying to count(distinct deviceID) over a roughly 100 GB data set
in realtime and sink the aggregated results to an ElasticSearch index. I met a
severe performance issue when running my flink jo