Hi,
I intend to use the File source for a production use case. I have a few
requirements that are currently not supported, such as deleting a file once
it's processed.
So I was wondering whether we can use this in production or whether I should
write my own implementation. Are there any recommendations around this?
In my opinion it looks similar. Were you able to tune Flink to make it work?
I'm stuck with it: I wanted to scale up hoping to reduce backpressure, but to
rescale I need to take a savepoint, which never completes (or at least takes
longer than 3 hours).
From:
Hi,
We have upgraded an application originally written for Flink 1.9.1 with
Scala 2.11 to Flink 1.11.2 with Scala 2.12.7 and we are seeing the following
error at runtime.
2021-03-16 20:37:08
java.lang.RuntimeException
at
Hi Xingbo,
I'd like to ask about some details of sql_query execution: when Flink 1.12 executes SQL statements, does it apply predicate pushdown?
From testing the related code:
1. In PyFlink 1.11 the connector definition supported a read.query parameter for fetching data, and execution was very fast; I suspect that part was delegated to the database.
2. PyFlink 1.12 removed the read.query parameter, and when several sources are joined the job is noticeably slow (PyFlink).
I have the following questions:
1. How do I set checkpointing and the RestartStrategy for a TableEnvironment? Is the only way to configure a
StreamTableEnvironment first and then create the TableEnv from it?
2. When using a catalog, how do I designate a column of an existing physical table as the event time attribute,
and how do I set the watermark? Do I still need a CREATE TABLE statement?
songzhongs...@kedacom.com
Hi Alexey,
I believe your exception messages were printed from Flink 1.12.2, not
Flink 1.12.1, judging by the line numbers of the method calls.
Could you share the exception message from Flink 1.12.1 when rescaling? Moreover, I
hope you can share more logs from the restore and rescale; I want to see
Hi Rex,
The prefix seek iterator has never been used by Flink when seeking. I suggest you
first read more details about this in the RocksDB wiki, as the prefix extractor
can impact performance.
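If you do want to experiment with a prefix extractor, a minimal sketch would look like the following. This is not something Flink configures by default; the interface and option names come from Flink's RocksDBOptionsFactory and the RocksDB Java API, and the 8-byte prefix length is an arbitrary assumption for illustration.

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Sketch: configure a fixed-length prefix extractor on RocksDB column families.
public class PrefixExtractorOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions,
                                     Collection<AutoCloseable> handlesToClose) {
        return currentOptions; // leave DB-level options untouched
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions,
                                                   Collection<AutoCloseable> handlesToClose) {
        // Assumes the first 8 bytes of the key form a meaningful prefix.
        return currentOptions.useFixedLengthPrefixExtractor(8);
    }
}
```

The factory could then be registered via RocksDBStateBackend#setRocksDBOptions. Measure before and after: as noted above, the extractor can hurt performance as well as help.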
Best
Yun Tang
From: Rex Fenley
Sent: Wednesday, March 17, 2021
Hello, thanks for your reply. After upgrading to v1.12 it now runs fine.
--
Sent from: http://apache-flink.147419.n8.nabble.com/
Neither. These are metrics on a ValueState, which is
updated at least once on every call to process. The metric is the number
of these ValueStates scoped to a key (I am using session
windows).
On Mon, Mar 15, 2021 at 11:29 PM Yun Tang wrote:
> Hi,
>
> Could you describe what you
Hello,
I'm wondering how, in the event of a poison pill record on Kafka, to
advance a partition's checkpointed offsets by 1 when using the TableAPI/SQL.
It is my understanding that when checkpointing is enabled Flink uses its
own checkpoint committed offsets and not the offsets committed to
Thanks for the input, I'll look more into that.
Does your answer then imply that joins and aggregations do not inherently always
use prefix seeks? I'd imagine that the join key for a join and the group-by key for
aggregations would always be used as prefix keys. Is this not the case?
Also, is there good information on
I've made a handful of tweaks to it to try to get them picked up as
expected (i.e. adding logging to every available overload for the
interceptors, etc.) using something similar to the following:
fun create(): InterceptingTaskMetricGroup {
val operatorGroup = object:
Hi all,
I'd like to measure how many events arrive within allowed lateness grouped
by particular feature of the event. We assume particular type of events
have way more late arrivals and would like to verify this. The natural
place to make the measurement would be our custom trigger within
Actually you'd have to further subclass the operatorMetricGroup such
that addGroup works as expected.
This is admittedly a bit of a drag :/
On 3/16/2021 4:35 PM, Chesnay Schepler wrote:
The test harness is fully independent of the MiniClusterResource; it
isn't actually running a job. That's
Hi Timo/Team,
Thanks for the reply.
Just take the example in the following pseudocode.
Suppose this is the current application logic:
firstInputStream = addSource(...)   // Kafka consumer C1
secondInputStream = addSource(...)  // Kafka consumer C2
outputStream = firstInputStream.keyBy(a ->
Thanks All,
I've added Python and PyFlink to the TM image, which fixed the problem. Now,
however, submitting a Python script to the cluster only succeeds sporadically;
sometimes it completes, but most of the time it just hangs. I'm not sure what
is causing this.
On Mon, Mar 15, 2021 at 9:47 PM Xingbo
The test harness is fully independent of the MiniClusterResource; it
isn't actually running a job. That's why your metrics never arrive at
the reporter.
You can either:
a) use the test harness with a custom MetricGroup implementation that
intercepts registered metrics, set in the
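For option (a), a minimal sketch of an intercepting group follows. UnregisteredMetricsGroup is Flink's real no-op base class, but the map-based capture and how you wire the group into the harness's mock environment are my assumptions and vary by Flink version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.SimpleCounter;
import org.apache.flink.metrics.groups.UnregisteredMetricsGroup;

// Sketch: capture registered counters so a test can assert on them directly,
// without going through a metrics reporter.
public class InterceptingMetricGroup extends UnregisteredMetricsGroup {

    public final Map<String, Counter> counters = new ConcurrentHashMap<>();

    @Override
    public Counter counter(String name) {
        // record every counter registered under this group, keyed by name
        return counters.computeIfAbsent(name, n -> new SimpleCounter());
    }
}
```

A test would then assert on interceptingGroup.counters.get("myCounter").getCount() instead of waiting for a reporter.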
In this case, I was using a harness to test the function. Honestly, I
couldn't care less about the unit test surrounding metrics; I'm much more
concerned with having something that will actually run and work as intended
within a job. The only real concern I have or problem that I want to
Are you actually running a job, or are you using a harness for testing
your function?
On 3/16/2021 3:24 PM, Rion Williams wrote:
Hi Chesnay,
Thanks for the prompt response and feedback, it's very much
appreciated. Please see the inline responses below to your questions:
*Was there
Hi Sebastian,
you can check out the logic yourself by looking into
https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/debezium/DebeziumJsonDeserializationSchema.java
and
Hi Chesnay,
Thanks for the prompt response and feedback, it's very much appreciated.
Please see the inline responses below to your questions:
*Was there anything in the logs (ideally on debug)?*
I didn't see anything within the logs that seemed to indicate anything out
of the ordinary. I'm
On Tue, Mar 16, 2021 at 02:32:54AM +, Alexey Trenikhun wrote:
> Hi Roman,
> I took thread dump:
> "Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=200 BLOCKED on
> java.lang.Object@5366a0e2 owned by "Legacy Source Thread - Source:
> digital-itx-eastus2 -> Filter (6/6)#0" Id=202
> at
>
Hi Timo,
An InfluxDB measurement is like a MySQL table, and "from table1" is a very
strange table name.
Maybe this is a bug.
Regards,
Tim
--
- Original Message -
From: "Timo Walther"
To: user@flink.apache.org
Sent: Tue, 16 Mar 2021 13:30:41 +0100
Subject: Re: Find many
Hi Satyam,
first of all, your initial join query can also work; you just need to
make sure that no time attribute is in the SELECT clause. As the
exception indicates, you need to cast all time attributes to TIMESTAMP.
The reason for this is a major design issue that is also explained
here
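For illustration, a sketch of the cast described above (the table and column names here are made up):

```sql
-- cast the time attribute so it becomes a plain TIMESTAMP in the output
SELECT o.id,
       CAST(o.order_time AS TIMESTAMP(3)) AS order_ts,
       c.name
FROM Orders o
JOIN Customers c ON o.customer_id = c.id;
```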
Hi Tim,
"from table1" might be the operator that reads "table1" also known as
the table scan operator. Could you share more of the metrics and their
values? Most of them should be explained in
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/metrics/#system-metrics
Regards,
Hi Jessy,
to be precise, the JobGraph is not used at runtime. It is translated
into an ExecutionGraph.
Nevertheless, such patterns are possible, but they require a bit of manual
implementation.
Option 1) You stop the job with a savepoint and restart the application
with slightly different
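Option 1 can be sketched with the Flink CLI (the job id, paths, and jar name are placeholders):

```
# stop the running job and take a savepoint
flink stop --savepointPath s3://bucket/savepoints <jobId>

# resubmit the modified application, restoring from that savepoint
flink run -s s3://bucket/savepoints/savepoint-<id> modified-job.jar
```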
Hi Team,
Is it possible to edit the job graph at runtime? Suppose I want to add
a new sink to the Flink application at runtime, depending on
specific parameters in the incoming events. Can I edit the job graph of a
running Flink application?
Thanks
Jessy
Hi,
You can think of it as using MapState to store the state.
op <520075...@qq.com> wrote on Tue, Mar 16, 2021 at 3:00 PM:
> Hi all, I'd like to ask which kind of state Flink SQL's count(distinct) uses by default.
--
Best,
Benchao Li
Hi,
your explanation makes sense, but I'm wondering what the implementation
would look like. This would mean bigger changes in a Flink fork, right?
Late data handling in SQL is a frequently asked question. Currently, we
don't have a good way of supporting it. Usually, we recommend using
Hi everyone!
We're constantly working to improve the Flink community experience and need
your help! Please take 2 min to share your thoughts with us via a short
survey [1].
Your feedback will help us understand where we stand on communication
between community members, what activities you prefer
Was there anything in the logs (ideally on debug)?
Have you debugged the execution and followed the counter() calls all the
way to the reporter?
Do you only see JobManager metrics, or is there somewhere also something
about the TaskManager?
I can see several issues with your code, but none
Hello,
Based on a data dictionary, I dynamically generate a class and then use a map function to map my JSON strings onto the dynamically generated class:
public static Class<?> getClazz(String className, String cls) throws Exception {
    // compile the class source with Janino and load the class by name
    SimpleCompiler compiler = new SimpleCompiler();
    compiler.cook(cls);
    return (Class<?>) compiler.getClassLoader().loadClass(className);
}
Hi Yang,
Thanks for the reply. Looking forward to 1.13 :)
Best wishes,
Chen-Che
On 2021/03/16 07:41:18, Yang Wang wrote:
> I think the pod template[1] is what you are looking for. It will be
> released in 1.13.
>
> [1].
>
I've solved this problem; defining it this way should work.
On 03/16/2021 15:11, superainbower wrote:
Could anyone advise: the temporal table example in the official docs is based on Debezium. Is there anything wrong with defining it like this based on Canal? The definition is as follows:
create table produce(
id string,
name string,
price decimal(10,4),
update_time timestamp(3) metadata from 'timestamp' virtual,
primary key(id) not enforced,
watermark for
I think the pod template[1] is what you are looking for. It will be
released in 1.13.
[1].
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
Best,
Yang
Chen-Che Huang 于2021年3月16日周二 下午1:26写道:
> Hi,
>
> We use the per-job
Why is nobody replying~~~
Hi all, when submitting a job remotely to Flink in standalone mode, the following exception is thrown:
Caused by: java.lang.RuntimeException: java.lang.InterruptedException
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:277)
at
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:981)
at
Could anyone advise: the temporal table example in the official docs is based on Debezium. Is there anything wrong with defining it like this based on Canal? The definition is as follows:
create table produce(
id string,
name string,
price decimal(10,4),
update_time timestamp(3) metadata from 'timestamp' virtual,
primary key(id) not enforced,
watermark for update_time as update_time
) with (
'connector' = 'kafka',
Also, restoring from the same savepoint without a change in parallelism works fine.
From: Alexey Trenikhun
Sent: Monday, March 15, 2021 9:51 PM
To: Yun Tang ; Tzu-Li (Gordon) Tai ;
user@flink.apache.org
Subject: Re: EOFException on attempt to scale up job with RocksDB
Hi all, which kind of state does Flink SQL's count(distinct) use by default?
Thanks Julian, that worked! I totally missed this in the documentation.
On Mon, Mar 15, 2021 at 4:06 PM Jaffe, Julian
wrote:
> You can use `env.java.opts.taskmanager` to specify java options for the
> task managers specifically. Be aware you may want to set `suspend=n` or be
> sure to attach
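For example, a sketch of a flink-conf.yaml entry for remote debugging (the port is arbitrary):

```
# attach a JDWP debug agent to task managers only; suspend=n avoids blocking startup
env.java.opts.taskmanager: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```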
Hi,
Two additional points:
1. The Table API currently supports Pandas UDAFs over sliding and tumbling windows [1]; support for session-window
UDAFs will come in 1.13. As for windows on DataStream: for the window types above you can switch to the Table API, and custom windows on DataStream will be supported in 1.13.
2. Regarding performance: if you don't use Python
UDFs, the job essentially runs Java code; Python is only involved on the client side to compile the JobGraph, so there is no Python
Hi Yik San,
Thanks for the investigation of PyFlink together with all these ML libs.
IMO, you could refer to the flink-ai-extended project that supports the
Tensorflow on Flink, PyTorch on Flink etc, whose repository url is
https://github.com/alibaba/flink-ai-extended. Flink AI Extended is a