Re: Dynamically allocating right-sized task resources

2019-08-05 Thread Yang Wang
Hi Chad, Just as Xintong said, fine-grained resource management has not been introduced to Flink yet. And I think it is the elegant solution for your scenario. Task managers with different resource specifications will be allocated and started by the Yarn/K8s resource manager according to your operator reso

Re: An ArrayIndexOutOfBoundsException after a few message with Flink 1.8.1

2019-08-05 Thread Yun Gao
Hi Nicolas: Are you using a custom partitioner? If so, you might need to check if the Partitioners#partition has returned a value that is greater than or equal to the parallelism of the downstream tasks. The expected return value should be in the interval [0, the parallelism of the downstr
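The bounds check described above can be sketched in plain Java. This is a hypothetical illustration of the contract, not Flink code: a partition index must land in [0, numPartitions), and `Math.floorMod` keeps the result non-negative even when a key's `hashCode()` is negative, which a plain `%` does not.

```java
// Hypothetical sketch of a partitioner that always satisfies the
// [0, numPartitions) contract. Not Flink's Partitioner interface.
public class SafePartitioner {
    public static int partition(Object key, int numPartitions) {
        // floorMod never returns a negative value for a positive divisor,
        // unlike %, which can when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // "polygenelubricants" is a known string with a negative hashCode,
        // which would break a naive "hashCode() % numPartitions".
        System.out.println(partition("polygenelubricants", 4)); // in [0, 4)
    }
}
```

A custom partitioner that returns an index outside this interval is exactly what produces the ArrayIndexOutOfBoundsException discussed in this thread.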

Re: Memory constrains running Flink on Kubernetes

2019-08-05 Thread Yun Tang
You are correct, the default value of write buffer size is 64 MB [1]. However, the java doc for this value is not correct [2]. Already created a PR to fix this. [1] https://github.com/facebook/rocksdb/blob/30edf1874c11762a6cacf4434112ce34d13100d3/include/rocksdb/options.h#L191 [2] https://gith
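For reference, the 64 MB default discussed above would show up in a RocksDB OPTIONS file roughly like this (the column family name here is a placeholder; 64 MB = 67108864 bytes):

```ini
# Hypothetical excerpt from a RocksDB OPTIONS-XX file
[CFOptions "default"]
write_buffer_size=67108864
max_write_buffer_number=2
```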

Re: Dynamically allocating right-sized task resources

2019-08-05 Thread Xintong Song
Hi Chad, If I understand correctly, the scenarios you talked about are running batch jobs, right? At the moment (Flink 1.8 and earlier), Flink does not differentiate the workloads of different tasks. It uses a slot-sharing approach[1] to balance workloads among workers. The general idea is to put

An ArrayIndexOutOfBoundsException after a few message with Flink 1.8.1

2019-08-05 Thread Nicolas Lalevée
Hi, I have got a weird error after a few messages. I first saw this error on a deployed Flink 1.7.1 cluster. Trying to figure it out, I am now trying with a local Flink 1.8.1, and I still get this ArrayIndexOutOfBoundsException. I don't have a precise scenario to reproduce it, but it is happening

Re: getting an exception

2019-08-05 Thread Gaël Renoux
Hi Avi and Victor, I just opened this ticket on JIRA: https://issues.apache.org/jira/browse/FLINK-13586 (I hadn't seen these e-mails). Backward compatibility is broken between 1.8.0 and 1.8.1 if you use the Kafka connectors. Can you upgrade your flink-connector-kafka dependency to 1.8.1? It won't de
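The upgrade suggested above amounts to aligning the connector version with the 1.8.1 runtime. A hypothetical Maven snippet (the Scala suffix `_2.11` vs `_2.12` depends on your build):

```xml
<!-- Illustrative only: match the Kafka connector to the Flink 1.8.1 runtime -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>1.8.1</version>
</dependency>
```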

Re: getting an exception

2019-08-05 Thread Wong Victor
Hi Avi: It seems you are submitting your job with an older Flink version (< 1.8); please check your flink-dist version. Regards, Victor From: Avi Levi Date: Monday, August 5, 2019 at 9:11 PM To: user Subject: getting an exception Hi, I'm using Flink 1.8.1. our code is mostly using Scala. Whe

StreamingFileSink not committing file to S3

2019-08-05 Thread Ravi Bhushan Ratnakar
Thanks for your quick response. I am using a custom implementation of BoundedOutOfOrdernessTimestampExtractor, tweaked to return an initial watermark that is not a negative value. One more observation: when the job's parallelism is around 120, it works well even with an idle stream and Flink U

Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-05 Thread Fabian Hueske
Hi Jungtaek, I would recommend implementing the logic in a ProcessFunction and avoiding Flink's windowing API. IMO, the windowing API is difficult to use because there are many pieces, like WindowAssigner, Window, Trigger, Evictor, and WindowFunction, that are orchestrated by Flink. This makes it very har
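The gap-based session logic a ProcessFunction would implement with timers can be sketched in plain Java. This is an illustration under simplifying assumptions (sorted input, batch-style processing), not Flink API: an event extends the current session, and a gap larger than `gapMs` closes it, which is where a ProcessFunction would fire a registered timer.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not Flink code) of time-gap session splitting.
public class SessionSplitter {
    public static List<List<Long>> sessionize(List<Long> sortedTimestamps, long gapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            // Gap exceeded: close the current session. In a ProcessFunction,
            // this is the point where an event-time timer would fire.
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            sessions.add(current); // flush the trailing session
        }
        return sessions;
    }
}
```

An explicit "end session" event, as discussed in this thread, would simply close `current` immediately instead of waiting for the gap.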

getting an exception

2019-08-05 Thread Avi Levi
Hi, I'm using Flink 1.8.1. Our code mostly uses Scala. When I try to submit my job (on my local machine) it crashes with the error below (BTW, in the IDE it runs perfectly). Any assistance would be appreciated. Thanks, Avi 2019-08-05 12:58:03.783 [Flink-DispatcherRestEndpoint-thread-3] ERROR or

Re: Memory constrains running Flink on Kubernetes

2019-08-05 Thread wvl
Btw, with regard to: > The default writer-buffer-number is 2 at most for each column family, and the default write-buffer-memory size is 4MB. This isn't what I see when looking at the OPTIONS-XX file in the rocksdb directories in state: [CFOptions "xx"] ttl=0 report_bg_io_stats=false

Re: some confuse for data exchange of parallel operator

2019-08-05 Thread Biao Liu
Hi Kylin, > Can this map record all data? Or this map only record data from one parallelism of upstream operator? Neither of your guesses is correct. It depends on the partitioner between the map operator and upstream operator. You could find more in this document [1]. 1. https://ci.apache.org/proj
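The effect of the partitioner can be sketched in plain Java. This is a simplification, not Flink's actual key-group scheme (which murmur-hashes keys into key groups): with hash partitioning, each parallel map instance only sees the keys routed to it, so a map held inside one instance contains a partial count, never the global one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative routing sketch (not Flink internals): words are hash-routed
// to one of `parallelism` subtasks, and each subtask counts only its share.
public class ParallelCountSketch {
    public static Map<Integer, Map<String, Integer>> countPerSubtask(
            List<String> words, int parallelism) {
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        for (String w : words) {
            // Routing decision: the same word always lands on the same subtask.
            int subtask = Math.floorMod(w.hashCode(), parallelism);
            counts.computeIfAbsent(subtask, k -> new HashMap<>())
                  .merge(w, 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the same key always hashes to the same subtask, each per-subtask map is consistent for its keys but blind to all others.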

Re: StreamingFileSink not committing file to S3

2019-08-05 Thread Theo Diefenthal
Hi Ravi, Please check out [1] and [2]. That is related to Kafka but probably applies to Kinesis as well. If one stream is empty, there is no way for Flink to know the watermark of that stream, and Flink can't advance the watermark. Downstream operators can thus not know if there
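The stalling effect described above can be shown with a small plain-Java sketch (illustrative names, not Flink internals): an operator's watermark is the minimum over its input channels, so one idle channel stuck at its initial `Long.MIN_VALUE` pins the combined watermark there no matter how far the active channels advance.

```java
// Sketch of min-combining watermarks across input channels.
public class WatermarkMin {
    public static long combined(long[] channelWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) {
            min = Math.min(min, w); // one lagging channel dominates the result
        }
        return min;
    }

    public static void main(String[] args) {
        long idle = Long.MIN_VALUE;        // channel that has seen no data
        long active = 1_565_000_000_000L;  // channel with recent events
        // The combined watermark stays stuck at Long.MIN_VALUE.
        System.out.println(combined(new long[]{active, idle}));
    }
}
```

This is why the StreamingFileSink never sees event time advance (and never commits files) while one input stream stays idle.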

some confuse for data exchange of parallel operator

2019-08-05 Thread tangkailin
Hello, I don't know how parallel operators exchange data in Flink. For example, I define a map in an operator with n (n > 1) parallelism for counting. Can this map record all data? Or this map only record data from one parallelism of upstream operator? Thanks, Kylin