【疑问】RocksDBStateBackend为什么使用单独封装的frocksdbjni,而不用RocksDB官方提供的RocksJava

2021-03-23 Thread zoltar9264
大家好, 在RocksDBStateBackend的pom中看到是使用了 frocksdbjni,看了下这个包是dataArtisans自己的。而RocksDBStateBackend是有提供Java sdk的,叫RocksJava。RocksDBStateBackend为什么不直接用 RocksJava呢? | | Feifan Wang | | zoltar9...@163.com | 签名由网易邮箱大师定制

Re: Kubernetes Application Cluster Not Working

2021-03-23 Thread Yang Wang
Are you sure that the JobManager akka address is binded to "flink-jobmanager"? You could set "jobmanager.rpc.address" to flink-jobmanager in the ConfigMap. Best, Yang Guowei Ma 于2021年3月24日周三 上午10:22写道: > Hi, M > Could you give the full stack? This might not be the root cause. > Best, > Guowei

Fail to cancel perJob for that deregisterApplication is not called

2021-03-23 Thread 刘建刚
I am using flink 1.10.0. My perJob can not be cancelled. From the log I find that webMonitorEndpoint.closeAsync() is completed but deregisterApplication is not called. The related code is as follows: public CompletableFuture deregisterApplicationAndClose( final ApplicationStatus

相同的作业配置 ,Flink1.12 版本的作业checkpoint耗时增加以及制作失败,Flink1.9的作业运行正常

2021-03-23 Thread Haihang Jing
【现象】相同配置的作业(checkpoint interval :3分钟,作业逻辑:regular join),flink1.9运行正常,flink1.12运行一段时间后,checkpoint制作耗时增大,最后checkpoint制作失败。 【分析】了解到flink1.10后对于checkpoint机制进行调整,接收端在barrier对齐时不会缓存单个barrier到达后的数据,意味着发送方必须在barrier对齐后等待credit feedback来传输数据,因此发送方会产生一定的冷启动,影响到延迟和网络吞吐量,因此调整checkpoint

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, Thanks for your patience on all these asks! Best, Yik San On Wed, Mar 24, 2021 at 10:32 AM Dian Fu wrote: > It’s a good advice. I have created ticket > https://issues.apache.org/jira/browse/FLINK-21938 to track this. > > 2021年3月24日 上午10:24,Yik San Chan 写道: > > Hi Dian, > > As you

Re: Flink 消费kafka ,写ORC文件

2021-03-23 Thread Robin Zhang
Hi,Jacob 官网有这么一段:`我们可以在格式构建器上调用 .withBucketAssigner(assigner) 来自定义 BucketAssigner ` 链接: https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/connectors/file_sink.html#%E6%A1%B6%E5%88%86%E9%85%8D

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Dian Fu
It’s a good advice. I have created ticket https://issues.apache.org/jira/browse/FLINK-21938 to track this. > 2021年3月24日 上午10:24,Yik San Chan 写道: > > Hi Dian, > > As you said, users can, but I got the impression that using ._func to access

Re: Pyflink tutorial output

2021-03-23 Thread Dian Fu
How did you check the output when submitting to the kubernetes session cluster? I ask this because the output should be written to the local directory “/tmp/output” on the TaskManagers where the jobs are running on. Regards, Dian > 2021年3月24日 上午2:40,Robert Cullen 写道: > > I’m running this

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, As you said, users can, but I got the impression that using ._func to access the original Python function is not recommended, therefore not documented. While in Flink, unit testing a Scala/Java UDF is clearly documented and encouraged. Do I misread something? Best, Yik San On Wed, Mar

Re: Pyflink tutorial output

2021-03-23 Thread Shuiqiang Chen
Hi Robert, Have you tried exploring the /tmp/output directory in the task manager pods on you kubernetes cluster? The StreamingFileSink will create the output directory on the host of task manager in which the sink tasks are executed. Best, Shuiqiang Robert Cullen 于2021年3月24日周三 上午2:48写道: >

Re: Kubernetes Application Cluster Not Working

2021-03-23 Thread Guowei Ma
Hi, M Could you give the full stack? This might not be the root cause. Best, Guowei On Wed, Mar 24, 2021 at 2:46 AM Claude M wrote: > Hello, > > I'm trying to setup Flink in Kubernetes using the Application Mode as > described here: >

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Dian Fu
As I replied in previous email, it doesn’t block users to write tests for PyFlink UDFs. Users could use ._func to access the original Python function if they want. Regards, Dian > 2021年3月23日 下午2:39,Yik San Chan 写道: > > Hi Dian, > > However users do want to unit test their UDFs, as supported

flink sql 写hive并行度设置问题

2021-03-23 Thread ggc
Hi, 请问: env.setParallelism(8); source = select * from table1, Table filterTable = source.filter(x-> x>10).limit(1); try (CloseableIterator rows = filterTable.execute().collect()) { while (rows.hasNext()) { Row r = rows.next(); String a = r.getField(1).toString();

flink sql 并行度问题

2021-03-23 Thread ggc
Hi, 请问: env.setParallelism(8); source = select * from table1, Table filterTable = source.filter(x-> x>10); try (CloseableIterator rows = filterTable.execute().collect()) { while (rows.hasNext()) { Row r = rows.next(); String a = r.getField(1).toString();

flink sql count distonct 优化

2021-03-23 Thread guomuhua
在SQL中,如果开启了 local-global 参数:set table.optimizer.agg-phase-strategy=TWO_PHASE; 或者开启了Partial-Final 参数:set table.optimizer.distinct-agg.split.enabled=true; set table.optimizer.distinct-agg.split.bucket-num=1024; 还需要对应的将SQL改写为两段式吗? 例如: 原SQL: SELECT day,

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
That said, is there a way to get a dump of all metrics exposed by TM. I was searching for it and I bet we could get it for ServieMonitor on k8s ( scrape ) but am missing a way to het a TM and dump all metrics that are pushed. Thanks and regards. On Tue, Mar 23, 2021 at 5:56 PM Vishal Santoshi

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
I guess there is a bigger issue here. We dropped the property to 500. We also realized that this failure happened on a TM that had one specific job running on it. What was good ( but surprising ) that the exception was the more protocol specific 413 ( as in the chunk is greater then some size

Kubernetes Application Cluster Not Working

2021-03-23 Thread Claude M
Hello, I'm trying to setup Flink in Kubernetes using the Application Mode as described here: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes The doc mentions that there needs to be a aervice exposing the JobManager’s REST and UI

Pyflink tutorial output

2021-03-23 Thread Robert Cullen
I’m running this script taken from the Flink website: tutorial.py python tutorial.py from pyflink.common.serialization import SimpleStringEncoder from pyflink.common.typeinfo import Types from pyflink.datastream import StreamExecutionEnvironment from pyflink.datastream.connectors import

Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread vishalovercome
Suppose i have a job with 3 operators with the following job graph: O1 => O2 // data stream partitioned by keyBy O1 => O3 // data stream partitioned by keyBy O2 => O3 // data stream partitioned by keyBy If operator O3 receives inputs from two operators and both inputs have the same type and

Re: Flink Streaming Counter

2021-03-23 Thread Vijayendra Yadav
Hi Pohl, Thanks for getting back to me so quickly. I am looking for a sample example where I can increment counters on each stage #1 thru #3 for DATASTREAM. Then probably I can print it using slf4j. Thanks, Vijay On Tue, Mar 23, 2021 at 6:35 AM Matthias Pohl wrote: > Hi Vijayendra, > thanks

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread Matthias Pohl
Hi Vishal, I'm not 100% sure what you're trying to do. But the partitioning by a key just relies on the key on the used parallelism. So, I guess, what you propose should work. You would have to rely on some join function, though, when merging two input operators into one again. I hope that was

Re: How to get operator uid from a sql

2021-03-23 Thread Matthias Pohl
Hi XU Qinghui, sorry for the late reply. Unfortunately, the operator ID does not mean to be accessible for Flink SQL through the API. You might have a chance to extract the Operator ID through the debug logs. StreamGraphHasherV2.generateDeterministicHash logs out the operator ID [1]: "[main] DEBUG

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
If we look at this code , the metrics are divided into chunks up-to a max size. and enqueued

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Dawid Wysakowicz
Hey, I would like to double check this with Jark and/or Timo. As far as DataStream is concerned the javadoc is correct. Moreover the pipeline.auto-watermak-interval and setAutoWatermarkInterval are effectively the same setting/option. However I am not sure if Table API interprets it in the same

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Matthias Pohl
Hi Aeden, sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this. Thanks, Matthias [1]

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Could you provide the full stacktrace of your error? That might help me to dig into the code. Matthias On Tue, Mar 23, 2021 at 2:33 PM Sandeep khanzode wrote: > Hi Matthias, > > Thanks. But yes, I am comparing map with that.map … the comment is > probably for the previous variable name. > > I

Re: Flink Streaming Counter

2021-03-23 Thread Matthias Pohl
Hi Vijayendra, thanks for reaching out to the Flink community. What do you mean by displaying it in your local IDE? Would it be ok to log the information out onto stdout? You might want to have a look at the docs about setting up a slf4j metrics report [1] if that's the case. Best, Matthias [1]

Re: QueryableStateClient getKVState

2021-03-23 Thread Sandeep khanzode
Hi Matthias, Thanks. But yes, I am comparing map with that.map … the comment is probably for the previous variable name. I can use String, Int, Enum, Long type keys in the Key that I send in the Query getKvState … but the moment I introduce a TreeMap, even though it contains a simple one

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Hi Sandeep, the equals method does not compare the this.map with that.map but that.dimensions. ...at least in your commented out code. Might this be the problem? Best, Matthias On Tue, Mar 23, 2021 at 5:28 AM Sandeep khanzode wrote: > Hi, > > I have a stream that exposes the state for

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
There was a similar discussion recently in this mailing list about distributing the work onto different TaskManagers. One finding Xintong shared there [1] was that the parameter cluster.evenly-spread-out-slots is used to evenly allocate slots among TaskManagers but not how the tasks are actually

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
Hi Vignesh, are you trying to achieve an even distribution of tasks for this one operator that has the parallelism set to 16? Or do you observe the described behavior also on a job level? I'm adding Chesnay to the thread as he might have more insights on this topic. Best, Matthias On Mon, Mar

Re: Checkpoint fail due to timeout

2021-03-23 Thread Piotr Nowojski
Hi Alexey, You should definitely investigate why the job is stuck. 1. First of all, is it completely stuck, or is something moving? - Use Flink metrics [1] (number bytes/records processed), and go through all of the operators/tasks to check this. 2. The stack traces like the one you quoted: >

Re: Flink 1.12.0 隔几个小时Checkpoint就会失败

2021-03-23 Thread Haihang Jing
你好,问题定位到了吗? 我也遇到了相同的问题,感觉和checkpoint interval有关 我有两个相同的作业(checkpoint interval 设置的是3分钟),一个运行在flink1.9,一个运行在flink1.12,1.9的作业稳定运行,1.12的运行5小时就会checkpoint 制作失败,抛异常 org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold. 当我把checkpoint

[DISCUSS] Feature freeze date for 1.13

2021-03-23 Thread Dawid Wysakowicz
Hi devs, users! 1. *Feature freeze date* We are approaching the end of March which we agreed would be the time for a Feature Freeze. From the knowledge I've gather so far it still seems to be a viable plan. I think it is a good time to agree on a particular date, when it should happen. We

With the checkpoint interval of the same size, the Flink 1.12 version of the job checkpoint time-consuming increase and production failure, the Flink1.9 job is running normally

2021-03-23 Thread Haihang Jing
【Appearance】For jobs with the same configuration (checkpoint interval: 3 minutes, job logic: regular join), flink1.9 runs normally. After flink1.12 runs for a period of time, the checkpoint creation time increases, and finally the checkpoint creation fails. 【Analysis】After learning flink1.10, the

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread dhanesh arole
Hi Matthias, Thanks for taking to help us with this. You are right there were lots of task cancellations, as this exception causes the job to get restarted, triggering cancellations. - Dhanesh Arole On Tue, Mar 23, 2021 at 9:27 AM Matthias Pohl wrote: > Hi Danesh, > thanks for reaching out

Re: Flink on Minikube

2021-03-23 Thread Arvid Heise
Hi Sandeep, please have a look at [1], you should add most Flink dependencies as provided - exceptions are connectors (or in general stuff that is not in flink/lib/ or flink/plugins). [1]

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread Matthias Pohl
Hi Danesh, thanks for reaching out to the Flink community. Checking the code, it looks like the OutputStream is added to a CloseableRegistry before writing to it [1]. My suspicion is - based on the exception cause - that the CloseableRegistry got triggered while restoring the state. I tried to

Re: Re: Re: [DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-23 Thread Guowei Ma
Hi, I discussed with Xingtong and Yingjie offline and we agreed that the name `taskmanager.memory.framework.off-heap.batch-shuffle.size` can better reflect the current memory usage. So we decided to use the name Till suggested. Thank you all for your valuable feedback. Best, Guowei On Mon, Mar

Re: flink-1.11.2版本,客户端如何设置dynamic properties

2021-03-23 Thread easonliu30624700
看FlinkYarnSessionCli代码: final Configuration configuration = applyCommandLineOptionsToConfiguration(cmd); final ClusterClientFactory yarnClusterClientFactory = clusterClientServiceLoader.getClusterClientFactory(configuration);

Re: OrcTableSource in flink 1.12

2021-03-23 Thread Timo Walther
Hi Nikola, for the ORC source it is fine to use `TableEnvironment#fromTableSource`. It is true that this method is deprecated, but as I said not all connectors have been ported to be supported in the SQL DDL via string properties. Therefore, `TableEnvironment#fromTableSource` is still

Re: Editing job graph at runtime

2021-03-23 Thread Arvid Heise
Hi, Option 2 is to implement your own Source/Sink. Currently, we have the old, discouraged interfaces along with the new interfaces. For source, you want to read [1]. There is a KafkaSource already in 1.12 that we consider Beta, you can replace it with the 1.13 after the 1.13 release (should be

Re: Checkpoint fail due to timeout

2021-03-23 Thread Roman Khachatryan
Unfortunately, the lock can't be changed as it's part of the public API (though it will be eliminated with the new source API in FLIP-27). Theoretically, the change you've made should improve checkpointing at the cost of throughput. Is it what you see? But the new stack traces seem strange to me

Re: Editing job graph at runtime

2021-03-23 Thread Jessy Ping
Hi Arvid, Thanks for the reply. I am currently exploring the flink features and we have certain use cases where new producers will be added the system dynamically and we don't want to restart the application frequently. It will be helpful if you explain the option 2 in detail ? Thanks &

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, However users do want to unit test their UDFs, as supported in https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#testing-user-defined-functions Even though the examples are for Flink, I believe PyFlink should ideally be no difference. What do you think?

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Dian Fu
Hi Yik San, This field isn't expected to be exposed to users and so I'm not convinced that we should add such an interface/method in Flink. Regards, Dian On Tue, Mar 23, 2021 at 2:04 PM Yik San Chan wrote: > Hi Dian, > > The ._func method seems to be internal only. Maybe we can add some >

Re: OOM issues with Python Objects

2021-03-23 Thread Dian Fu
Hi Kevin, Is it possible to provide a simple example to reproduce this issue? PS: It will use pickle to perform the serialization/deserialization if you don't specify the type info. Regards, Dian On Mon, Mar 22, 2021 at 10:55 PM Arvid Heise wrote: > Hi Kevin, > > yes I understood that, but

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, The ._func method seems to be internal only. Maybe we can add some public-facing method to make it more intuitive for use in unit test? What do you think? Best, Yik San On Tue, Mar 23, 2021 at 2:02 PM Yik San Chan wrote: > Hi Dian, > > Thanks! It solves my problem. > > Best, > Yik

Re: Failed to unit test PyFlink UDF

2021-03-23 Thread Yik San Chan
Hi Dian, Thanks! It solves my problem. Best, Yik San On Tue, Mar 23, 2021 at 1:29 PM Dian Fu wrote: > H Yik San, > > As the udf `add` is decorated with `@udf` decorator, it is no longer a > simple Python function if you reference `add`. If you execute > `print(type(add(1, 1)))`, you will see