Hi Naehee,
Thanks for reporting the issue. Yes, it is a bug in ParquetInputFormat.
Would you please create a Jira ticket and assign it to me? I will try to fix
it by the end of this weekend.
My Jira account name is Zhenqiu Huang. Thanks
Best Regards
Peter Huang
On Wed, Nov 4, 2020 at 11:57 PM
The reason it cannot be specified in submit mode is that Flink's submission model differs from Spark's. Flink needs to compile the
JobGraph on the client side, so the jars you load must already be available when the client runs,
whereas submit mode only puts the jar on the runtime classpath; Spark, by contrast, compiles the program on the server side.
PS: Flink 1.11 now supports Application mode, which is somewhat similar to Spark's submission model; under that mode you can try whether it can be specified in submit
mode.
On 2020-11-06 11:12, silence wrote:
Hi bradyMk,
Bulk-encoded formats can only roll on checkpoints; see the documentation [1].
Best,
Hailong Wang
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#bulk-encoded-formats
At 2020-11-06 10:47:33, "bradyMk" wrote:
>Hi, guoliang_wang1335
Hi si_tianqiang,
Would a custom UDF solve your problem?
For example, define the field received from Kafka as hbaseQuery, then have a custom UDF look up the data according to that query.
Best,
Hailong Wang
At 2020-11-06 10:41:53, "site" wrote:
>Looking at the examples on the official site, the values passed into the SQL are all fixed. My scenario is receiving query conditions from a Kafka message queue, then querying an HBase table mapped via Flink SQL and writing to a result table. I used the approach of joining the mapped message-queue table with the data table, but on reflection that approach is not ideal. Is there a good way to implement dynamic SQL query parameters?
Ok, thank you.
From: Chesnay Schepler
Sent: Thursday, November 5, 2020 3:15:28 PM
To: Alexey Trenikhun ; Flink User Mail List
Subject: Re: union stream vs multiple operators
I don't think the first option has any benefit.
On 11/5/2020 1:19 AM, Alexey
Hello,
I have a Job that's a series of Joins, GroupBys, and Aggs and it's
bottlenecked in one of the joins. The join's cardinality is ~300 million
rows on the left and ~200 million rows on the right all with unique keys.
I'm seeing this in the plan for that bottlenecked Join.
Great, thanks!
So just to confirm, configure # of task slots to # of core nodes x # of
vCPUs?
I'm not sure what you mean by "distribute them across both jobs (so that
the total adds up to 32)". Is it configurable how many task slots a job can
receive, so in this case I'd provide ~30/36 * 32 task
Hi Rex,
as a rule of thumb I recommend configuring your TMs with as many slots as
they have cores. So in your case your cluster would have 32 slots. Then
depending on the workload of your jobs you should distribute them across
both jobs (so that the total adds up to 32). A high number of
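As a concrete illustration of the rule of thumb above, the per-TaskManager slot count is set in flink-conf.yaml. The numbers below are illustrative (assuming 4-core TaskManagers), not taken from this thread:

```yaml
# flink-conf.yaml sketch: one slot per core on a 4-core TaskManager,
# so 8 such TaskManagers would give the 32 slots discussed above
taskmanager.numberOfTaskSlots: 4
```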
Hi Simone,
The problem is that the Java 1.8 compiler cannot do type inference when
chaining methods [1].
The solution would be
WatermarkStrategy wmStrategy =
    WatermarkStrategy
        .forMonotonousTimestamps()
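The inference problem described above can be reproduced without Flink at all. The sketch below uses a stand-in `Strategy` class (illustrative only, not the real `WatermarkStrategy` API) to show why javac 8 fails on the chained call and how an explicit type witness fixes it:

```java
import java.util.function.ToLongFunction;

public class TypeWitnessDemo {
    // Stand-in for a generic chained builder like WatermarkStrategy (hypothetical).
    static class Strategy<T> {
        static <T> Strategy<T> forMonotonousTimestamps() { return new Strategy<>(); }
        Strategy<T> withTimestampAssigner(ToLongFunction<T> assigner) { return this; }
    }

    public static void main(String[] args) {
        // Without a type witness, javac 8 infers T = Object for the first call
        // (the target type does not propagate through the chain), so this fails:
        //   Strategy<String> bad = Strategy.forMonotonousTimestamps()
        //           .withTimestampAssigner(e -> e.length());   // does not compile
        // An explicit type witness pins T before the chain continues:
        Strategy<String> ok = Strategy.<String>forMonotonousTimestamps()
                .withTimestampAssigner(e -> e.length());
        System.out.println(ok != null ? "ok" : "fail");
    }
}
```

The same witness pattern (`WatermarkStrategy.<MyEvent>forMonotonousTimestamps()`) is what resolves the chaining issue mentioned in the email above.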
Hi,
The Dockerfiles in the examples in the flink-statefun repo currently work
against images built from snapshot development branches.
Ververica has been hosting StateFun base images for released versions:
https://hub.docker.com/r/ververica/flink-statefun
You can change `FROM flink-statefun:*`
Hi,
I tried to build statefun-greeter-example docker image with "docker build
." but cannot pull the base statefun docker image due to access denied. Any
idea? Thanks.
$ docker login
Authenticating with existing credentials...
Login Succeeded
Hi,
I'm taking the timestamp from the event payload that I'm receiving from Kafka.
I'm struggling to get the time and I'm confused about how I should use the
".withTimestampAssigner()" function. I'm receiving an error on event.getTime()
telling me: "cannot resolve method "Get Time" in
Actually, in our document, we have provided a command[1] to create the
service account.
It is similar to your yaml file.
$ kubectl create clusterrolebinding flink-role-binding-default
--clusterrole=edit --serviceaccount=default:default
Unfortunately, we do not support mounting a PVC. We plan
The root cause of the failure should not be the missing ConfigMap; that warning appears because the ConfigMap
has not yet been created when the JobManager deployment is created, and it does not cause the failure.
You can follow [1] to direct the JobManager log to the console and then inspect it with kubectl logs; that way
you can investigate why the JobManager keeps crash-looping.
[1].
From: Shawn Huang
Sent: 2020-11-06 16:56
To: user-zh
Subject: Re: On the underlying behavior of the cluster.evenly-spread-out-slots option
Here is my understanding after reading the source code (1.11.2); it may not be fully accurate, for reference only.
The cluster.evenly-spread-out-slots option takes effect in two places:
1. the Scheduler component of the JobMaster
2. the SlotManager component of the ResourceManager
For the Scheduler in the JobMaster,
when it assigns
I think it's ok. I suggest also adding JobStatus to onJobExecuted() so you
can immediately know if the job finished successfully or if it failed or was
canceled.
Thanks for the help,
Flavio
On Fri, Nov 6, 2020 at 10:41 AM Kostas Kloudas wrote:
> Hi Flavio,
>
> Could this
Hi Flavio,
Could this https://issues.apache.org/jira/browse/FLINK-20020 help?
Cheers,
Kostas
On Thu, Nov 5, 2020 at 9:39 PM Flavio Pompermaier wrote:
>
> Hi everybody,
> I was trying to use the JobListener's onJobExecuted() in my job on Flink
> 1.11.0, but I can't understand if the job
A question about CDC; could someone explain whether this is a bug in the binlog parsing or in my own code?
In a MySQL table, the created-time and modified-time columns default to current_timestamp.
Scenario 1: inserting data
The created-time and modified-time fields are omitted on insert.
After ingestion via CDC and transfer into HBase, the timestamp converted to a string is 8 hours behind.
I have confirmed that the server running the program, the MySQL server, and the HBase server are all in the UTC+0800 time zone.
Scenario 2: restarting the service and re-reading the data
This time CDC takes the latest data and writes it to HBase; with the same processing, the database time is correct and the HBase time matches.
Hi, Satyam ~
What version of the Flink release did you use? I tested your first SQL
statements locally and they both work fine.
Your second SQL statement fails because we currently do not support
stream-stream joins on time attributes, since the join would break the
semantics of the time attribute
Hi Chesnay,
Thanks for the hint.
I have mounted my certs in both job and taskmanager volume mounts.
When the containers bootup I get the log that the ssl store is successfully
loaded.
Note: I use the same keystore setup to connect to secured Kafka Cluster and
this works.
How would you suggest
Here is my understanding after reading the source code (1.11.2); it may not be fully accurate, for reference only.
The cluster.evenly-spread-out-slots option takes effect in two places:
1. the Scheduler component of the JobMaster
2. the SlotManager component of the ResourceManager
For the Scheduler in the JobMaster,
it assigns slots to execution vertices one by one in topological order.
The Scheduler's strategy tends to place an execution vertex on the slot already assigned to its upstream nodes,
so when assigning to a particular execution
Hi, a question for everyone:
My scenario involves a keyBy operation, but I need a particular key to land in a specific physical partition.
I noticed that the KeySelector in keyBy only does logical partitioning; physically it still partitions by hash, with no way to direct a given key to a given partition.
I tried partitionCustom with a Partitioner and a KeySelector, but found I could not directly use aggregate functions like sum; testing showed that sum adds up the values of different keys that happen to share the same physical partition.