Re: Poor performance with large keys using RocksDB and MapState

2020-10-01 Thread Yun Tang
Hi The option of 'setCacheIndexAndFilterBlocks' is used to ensure we could manage the memory usage of RocksDB, could you share logs or more descriptions why setCacheIndexAndFilterBlocks seems to make the hash index not work properly? I guess this might due to the index and filter is more

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-10-01 Thread Yang Wang
3. Make sense to me. And we could add a new HA solution "StatefulSet + PV + FileSystem" at any time if we need in the future. Since there are no more open questions, I will start the voting now. Thanks all for your comments and feedback. Feel feel to continue the discussion if you get other

Re: Flink 1.12 snapshot throws ClassNotFoundException

2020-10-01 Thread Lian Jiang
Thanks Till. Making the scala version consistent using 2.11 solved the ClassNotFoundException. On Tue, Sep 29, 2020 at 11:58 PM Till Rohrmann wrote: > Hi Lian, > > I suspect that it is caused by an incompatible Akka version. Flink uses > Akka 2.5.21 instead of 2.5.12. Moreover, you are mixing

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
Hey Till, Just a quick question on time characteristics -- this should work for IngestionTime as well, correct? Is there anything special I need to do to have the CsvTableSource/ toRetractStream call to carry through the assigned timestamps, or do I have to re-assign timestamps during the

RE: Blobserver dying mid-application

2020-10-01 Thread Hailu, Andreas
@Chesnay: I see. I actually had a separate thread with Robert Metzger ago regarding connection-related issues we’re plagued with at higher parallelisms, and his guidance lead us to look into our somaxconn config. We increased it from 128 to 1024 in early September. We use the same generic JAR

Re: Experiencing different watermarking behaviour in local/standalone modes when reading from kafka topic with idle partitions

2020-10-01 Thread David Anderson
If you were to use per-partition watermarking, which you can do by calling assignTimestampsAndWatermarks directly on the Flink Kafka consumer [1], then I believe the idle partition(s) would consistently hold back the overall watermark. With per-partition watermarking, each Kafka source task will

Re: Blobserver dying mid-application

2020-10-01 Thread Chesnay Schepler
All jobs running in a Flink session cluster talk to the same blob server. The time when tasks are submitted depends on the job; for streaming jobs all tasks are deployed when the job starts running; in case of batch jobs the submission can be staggered. I'm only aware of 2 cases where we

Re: Stateful Functions + ML model prediction

2020-10-01 Thread John Morrow
Hi Flink Users, I was watching Tzu-Li Tai's talk on stateful functions from Flink Forward (https://www.youtube.com/watch?v=tuSylBadNSo) which mentioned that Kafka & Kinesis are supported, and looking at https://repo.maven.apache.org/maven2/org/apache/flink/ I can see IO packages for those

Re: flinksql注册udtf使用ROW类型做为输出输出时出错

2020-10-01 Thread Xingbo Huang
Hello, 这个算是个易用性的问题,我之前有创建了JIRA[1]。你现在直接用[DataTypes.STRING(), DataTypes.STRING()]作resultType就是对的。关于input_types那个问题,实际上input_types在内部是通过上游的result_type匹配得出来的,所以你这里没对应也是对的,1.12版本将不再需要指定result_type了。 Best, Xingbo [1] https://issues.apache.org/jira/browse/FLINK-19138 chenxuying 于2020年9月30日周三

Experiencing different watermarking behaviour in local/standalone modes when reading from kafka topic with idle partitions

2020-10-01 Thread Salva Alcántara
I am considering this watermarker: ```scala class MyWatermarker(val maxTimeLag: Long = 0) extends AssignerWithPeriodicWatermarks[MyEvent] { var maxTs: Long = 0 override def extractTimestamp(e: MyEvent, previousElementTimestamp: Long): Long = { val timestamp = e.timestamp maxTs =

Re: Blobserver dying mid-application

2020-10-01 Thread Chesnay Schepler
It would also be good to know how many slots you have on each task executor. On 10/1/2020 11:21 AM, Till Rohrmann wrote: Hi Andreas, do the logs of the JM contain any information? Theoretically, each task submission to a `TaskExecutor` can trigger a new connection to the BlobServer. This

Re: Blobserver dying mid-application

2020-10-01 Thread Till Rohrmann
Hi Andreas, do the logs of the JM contain any information? Theoretically, each task submission to a `TaskExecutor` can trigger a new connection to the BlobServer. This depends a bit on how large your TaskInformation is and whether this information is being offloaded to the BlobServer. What you

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Till Rohrmann
Hi Austin, I believe that the problem is the processing time window. Unlike for event time where we send a MAX_WATERMARK at the end of the stream to trigger all remaining windows, this does not happen for processing time windows. Hence, if your stream ends and you still have an open processing

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-10-01 Thread Till Rohrmann
3. We could avoid force deletions from within Flink. If the user does it, then we don't give guarantees. I am fine with your current proposal. +1 for moving forward with it. Cheers, Till On Thu, Oct 1, 2020 at 2:32 AM Yang Wang wrote: > 2. Yes. This is exactly what I mean. Storing the HA