Re: Flink job failure during yarn node termination

2021-08-04 Thread Rainie Li
Thanks Till. We terminated one of the worker nodes. We enabled HA using Zookeeper. Sure, we will try upgrading the job to a newer version. Best regards Rainie On Tue, Aug 3, 2021 at 11:57 PM Till Rohrmann wrote: > Hi Rainie, > > It looks to me as if Yarn is causing this problem. Which Yarn node are
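
For reference, a minimal flink-conf.yaml sketch of the ZooKeeper HA setup being described (quorum hosts, bucket and cluster id are illustrative, not taken from the thread):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
    high-availability.storageDir: s3://my-bucket/flink/recovery
    high-availability.cluster-id: /my-flink-job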

Re: Bloom Filter - RocksDB - LinkageError Classloading

2021-08-04 Thread Yun Tang
Hi Sandeep, Did you include the RocksDB classes in the application jar package? You can unpack your jar package to check whether they exist. If so, since the RocksDB classes are already included in the flink-dist package, you don't need to include them in your jar package (maybe you explicitly add

Re: Checkpoints fail when trying to copy from Kafka to S3 since "some tasks have been finished"

2021-08-04 Thread Svend
Hi Robert, Thanks for the feed-back. You are correct, the behavior is indeed different: when I make the source bounded, the application eventually stops whereas without that setting it runs forever. In both cases neither checkpoints nor data is being written to the filesystem. I re-ran the e

Re: Bloom Filter - RocksDB - LinkageError Classloading

2021-08-04 Thread Stephan Ewen
@Yun Tang Does it make sense to add RocksDB to the "always parent-first options" to avoid this kind of error when users package apps incorrectly? My feeling is that these packaging errors occur very frequently. On Wed, Aug 4, 2021 at 10:41 AM Yun Tang wrote: > Hi Sandeep, > > Did you include
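
Until something like that is in place, a possible user-side workaround (a flink-conf.yaml sketch, assuming the clash is on classes under org.rocksdb) is to force those classes to load parent-first:

    classloader.parent-first-patterns.additional: org.rocksdb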

[ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Stephan Ewen
Hi all! *!!! If you are a big user of the Embedded RocksDB State Backend and have performance sensitive workloads, please read this !!!* I want to quickly raise some awareness for a RocksDB version upgrade we plan to do, and some possible impact on application performance. *We plan to upgrade R

Re: Topic assignment across Flink Kafka Consumer

2021-08-04 Thread Prasanna kumar
Robert, Flink version 1.12.2. Flink connector Kafka version 2.12. The partitions are assigned equally if we are reading from a single topic. Our use case is to read from multiple topics [topics via a regex pattern]; we use 6 topics and 1 partition per topic for this job. In this case, few of the k
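
For reference, a minimal Java sketch of the regex-based multi-topic subscription being described (broker address, group id and topic pattern are illustrative, not taken from the thread):

    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class MultiTopicSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "broker:9092");
            props.setProperty("group.id", "multi-topic-group");
            // optional: periodically discover newly created topics matching the pattern
            props.setProperty("flink.partition-discovery.interval-millis", "60000");

            // Six single-partition topics matched by one regex; how those partitions are
            // spread across parallel source subtasks is decided by the connector.
            FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
                    Pattern.compile("events-.*"), new SimpleStringSchema(), props);

            DataStream<String> stream = env.addSource(consumer);
            stream.print();
            env.execute("multi-topic-sketch");
        }
    }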

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread David Anderson
I am hearing quite often from users who are struggling to manage memory usage, and these are all users using RocksDB. While I don't know for certain that RocksDB is the cause in every case, from my perspective, getting the better memory stability of version 6.20 in place is critical. Regards, Davi

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Yuval Itzchakov
We are heavy users of RocksDB and have had several issues with memory management in Kubernetes, most of which actually went away when we upgraded from Flink 1.9 to 1.13. Do we know why there's such a huge performance regression? Can we improve this somehow with some flag tweaking? It would be great if

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Stephan Ewen
For the details of what causes this regression, I would add @Yun Tang to this discussion. On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov wrote: > We are heavy users of RocksDB and have had several issues with memory > managed in Kubernetes, most of them actually went away when we upgraded > fro

custom flink image error

2021-08-04 Thread Joshua Fan
Hi All I want to build a custom flink image to run on k8s, below is my Dockerfile content: > FROM apache/flink:1.13.1-scala_2.11 > ADD ./flink-s3-fs-hadoop-1.13.1.jar /opt/flink/lib > ADD ./flink-s3-fs-presto-1.13.1.jar /opt/flink/lib > I just put the s3 fs dependency to the {flink home}/lib, and

Re: Topic assignment across Flink Kafka Consumer

2021-08-04 Thread Prasanna kumar
Robert When we apply the rebalance method to the Kafka consumer, it assigns partitions of the various topics evenly. But my only concern is that the rebalance method might have a performance impact. Thanks, Prasanna. On Wed, Aug 4, 2021 at 5:55 PM Prasanna kumar wrote: > Robert, > > Flink ve
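
For clarity, a small sketch of where rebalance() sits: it is a DataStream redistribution step between operators (round-robin to downstream subtasks), not something that changes how Kafka partitions are assigned inside the source, and it does add a network shuffle, which is the performance cost being worried about. Continuing the consumer sketch above:

    // "stream" is the DataStream<String> read from Kafka in the earlier sketch
    stream
        .rebalance()                    // round-robin redistribution; adds a network shuffle
        .map(String::toUpperCase)       // placeholder downstream work
        .print();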

Re: Using event time with Python DataStreamAPI

2021-08-04 Thread Ignacio Taranto
That's what I thought Dian. The problem is that setting the watermark strategy like that didn't work either, the method on_event_time is never called. I did some reading of https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/event-time/generating_watermarks/#watermark

Re: Savepoint class refactor in 1.11 causing restore from 1.9 savepoint to fail

2021-08-04 Thread Robert Metzger
Hi Weston, Oh indeed, you are right! I quickly tried restoring a 1.9 savepoint on a 1.11 runtime and it worked. So in principle this seems to be supported. I'm including Timo into this thread, he has a lot of experience with the serializers. On Tue, Aug 3, 2021 at 6:59 PM Weston Woods wrote: >

Re: Topic assignment across Flink Kafka Consumer

2021-08-04 Thread Robert Metzger
Hi Prasanna, How are you checking the assignment of Kafka partitions to the consumers? The FlinkKafkaConsumer doesn't have a rebalance() method, this is a generic concept of the DataStream API. Is it possible that you are somehow partitioning your data in your Flink job, and this is causing the d

Re: Topic assignment across Flink Kafka Consumer

2021-08-04 Thread Prasanna kumar
Robert We are checking using the metric flink_taskmanager_job_task_operator_KafkaConsumer_assigned_partitions{jobname="SPECIFICJOBNAME"} This metric gives the number of partitions assigned to each task(kafka consumer operator). Prasanna. On Wed, Aug 4, 2021 at 8:59 PM Robert Metzger wrote: >

Re: custom flink image error

2021-08-04 Thread Robert Metzger
Hey Joshua, Can you first validate if the docker image you've built is valid by running it locally on your machine? I would recommend putting the s3 filesystem files into the plugins [1] directory to avoid classloading issues. Also, you don't need to build custom images if you want to use build-i
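
For reference, and as far as I recall (please verify against the image documentation), the official Flink Docker images can also load the bundled S3 filesystems via an environment variable, so a custom image may not even be needed:

    ENABLE_BUILT_IN_PLUGINS="flink-s3-fs-hadoop-1.13.1.jar;flink-s3-fs-presto-1.13.1.jar"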

Re: FW: Error using KafkaProducer EXACTLY_ONCE semantic + TaskManager Failure

2021-08-04 Thread Kevin Lam
Thanks Matthias. I just tried this backport (https://github.com/apache/flink/pull/16693) and got the following error, with the reproduction I described in https://lists.apache.org/thread.html/r528102e08d19d3ae446e5df75710009128c736735c0aaf310f95abeb%40%3Cuser.flink.apache.org%3E (ie. starting job

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Nico Kruber
That's actually also what I'm seeing most of the time and what I'd expect to improve with the newer RocksDB version. Hence, I'd also favour the upgrade even if there is a slight catch with respect to performance - we should, however, continue to investigate this together with the RocksDB communi

How to use RichAsyncFunction with Test harness in flink 1.13.1?

2021-08-04 Thread Debraj Manna
Hi, I am trying to use RichAsyncFunction with Flink's test harness. My code looks like below final MetadataEnrichment.AsyncFlowLookup fn = new MetadataEnrichment.AsyncFlowLookup(); final AsyncWaitOperatorFactory> operator = new AsyncWaitOperatorFactory<>(fn, 2000, 1, AsyncDataStream.OutputMode.ORD

Re: FW: Error using KafkaProducer EXACTLY_ONCE semantic + TaskManager Failure

2021-08-04 Thread Arvid Heise
Hi Kevin, Which Kafka client version are you using? (=What is effectively bundled into your application jar?) On Wed, Aug 4, 2021 at 5:56 PM Kevin Lam wrote: > Thanks Matthias. > > I just tried this backport (https://github.com/apache/flink/pull/16693) > and got the following error, with the re

Re: How to use RichAsyncFunction with Test harness in flink 1.13.1?

2021-08-04 Thread Arvid Heise
I would strongly recommend not using the harness for testing user functions. Instead I'd just create an ITCase like this: @ClassRule public static final MiniClusterWithClientResource MINI_CLUSTER = new MiniClusterWithClientResource( new MiniClusterResourceConfiguration.Bui
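
The code in the preview is cut off; below is a fuller sketch of the kind of ITCase being suggested (the async function and element types are illustrative stand-ins, not Arvid's actual snippet):

    import java.util.Collections;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.ClassRule;
    import org.junit.Test;

    public class AsyncLookupITCase {

        @ClassRule
        public static final MiniClusterWithClientResource MINI_CLUSTER =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void asyncLookupProducesOutput() throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Run the async function inside a real (mini) cluster instead of a harness.
            AsyncDataStream
                    .orderedWait(env.fromElements(1, 2, 3), new FakeLookup(), 2000, TimeUnit.MILLISECONDS, 1)
                    .print();

            env.execute();
        }

        /** Stand-in for the real RichAsyncFunction (e.g. AsyncFlowLookup) under test. */
        private static class FakeLookup extends RichAsyncFunction<Integer, String> {
            @Override
            public void asyncInvoke(Integer input, ResultFuture<String> resultFuture) {
                resultFuture.complete(Collections.singleton("looked-up-" + input));
            }
        }
    }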

Re: Flink job failure during yarn node termination

2021-08-04 Thread Rainie Li
Thanks for the context Nicolaus. We are using S3 instead of HDFS. Best regards Rainie On Wed, Aug 4, 2021 at 12:39 AM Nicolaus Weidner < nicolaus.weid...@ververica.com> wrote: > Hi Rainie, > > I found a similar issue on stackoverflow, though quite different > stacktrace: > https://stackoverflow.

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Yun Tang
Hi Yuval, Upgrading the RocksDB version has been a long story since Flink 1.10. When we first planned to introduce the write buffer manager to help control the memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from the current RocksDB-5.17. However, we found a performance regression in our micro b

Custom Source with the new Data Source API

2021-08-04 Thread Xinbin Huang
Hi team, I'm trying to develop a custom source using the new Data Source API but I'm having a hard time finding examples for it. Can you point me to some existing sources implemented with the new Data Source API? It would be ideal if the source is a pull-based unbounded source (e.g. Kafka). Thanks!

Re: Custom Source with the new Data Source API

2021-08-04 Thread Yuval Itzchakov
Hi Bin, Flink's Kafka source has been rewritten using the new Source API. You can find it here: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java On Wed, Aug 4, 2021 at 8:51 PM Xinbin Huang wro
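
And a short sketch of what using that new-style source looks like from the job side (broker, topic and group id are illustrative):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class NewKafkaSourceSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setTopics("input-topic")
                    .setGroupId("new-source-group")
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // The new API plugs in via fromSource instead of addSource.
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
               .print();
            env.execute("new-kafka-source-sketch");
        }
    }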

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Yuval Itzchakov
Hi Yun, Thank you for the elaborate explanation and even more so for the super hard work that you're doing digging into RocksDB and chasing after hundreds of commits in order to fix them so we can all benefit. I can say for myself that optimizing towards memory is more important ATM for us, and I'

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Piotr Nowojski
Thanks for the detailed explanation Yun Tang and clearly all of the effort you have put into it. Based on what was described here I would also vote for going forward with the upgrade. It's a pity that this regression wasn't caught in the RocksDB community. I would have two questions/ideas: 1. Can

Understanding the semantics of SourceContext.collect

2021-08-04 Thread Yuval Itzchakov
Hi, I have a question I want to clarify regarding the semantics of how events emitted from a source are processed downstream. I have a custom source which offloads data from our data warehouse. In my custom source, I have some state which keeps track of the latest timestamps that were read. When unloading,
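
For context, a rough sketch (names and helper calls are placeholders, not the poster's code) of a source of this shape: it emits records and keeps the latest read timestamp in checkpointed state. Note that ctx.collect() only hands the record to the output; it does not wait for downstream operators to process it, which is what the replies below confirm.

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class WarehouseSourceSketch extends RichSourceFunction<String> implements CheckpointedFunction {

        private volatile boolean running = true;
        private long latestTimestamp;                          // latest timestamp unloaded so far
        private transient ListState<Long> checkpointedTimestamp;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                // fetchBatchAfter() is a placeholder for the warehouse unload call
                for (String record : fetchBatchAfter(latestTimestamp)) {
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(record);                   // returns once emitted, not once
                        latestTimestamp = extractTimestamp(record); // processed downstream
                    }
                }
            }
        }

        @Override
        public void cancel() { running = false; }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            checkpointedTimestamp.clear();
            checkpointedTimestamp.add(latestTimestamp);
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            checkpointedTimestamp = context.getOperatorStateStore()
                    .getListState(new ListStateDescriptor<>("latest-ts", Long.class));
            for (Long ts : checkpointedTimestamp.get()) {
                latestTimestamp = ts;
            }
        }

        // --- placeholders, not part of any real API ---
        private Iterable<String> fetchBatchAfter(long ts) { return java.util.Collections.emptyList(); }
        private long extractTimestamp(String record) { return 0L; }
    }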

Re: Flink job failure during yarn node termination

2021-08-04 Thread Rainie Li
Hi Nicolaus, I double-checked our HDFS config again; it is set to 1 instead of 2. I will try the solution you provided. Thanks again. Best regards Rainie On Wed, Aug 4, 2021 at 10:40 AM Rainie Li wrote: > Thanks for the context Nicolaus. > We are using S3 instead of HDFS. > > Best regards > R

Reusing savepoints from a streaming job in a batch job

2021-08-04 Thread tobias.schulze
Hey folks, This is a crosspost of a Stack Overflow question (https://stackoverflow.com/questions/68631624/flink-job-cant-use-savepoint-in-a-batch-job) which didn’t get any replies yet, so please bear with me. Let me start in a generic fashion to see if I somehow missed some concepts: I have a st

Re: Checkpoints fail when trying to copy from Kafka to S3 since "some tasks have been finished"

2021-08-04 Thread Svend
Hi again, After a bit of experimentation (and actually reading the bug report I linked), I realized the issue was that the parallelism was higher than the number of Kafka partitions => reducing the parallelism enabled the checkpoints to work as expected. => since it seems unsupported, maybe K
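
In code terms (a hedged sketch; the variable is illustrative), the fix amounts to keeping the source parallelism at or below the partition count:

    // Keep the source parallelism at or below the number of Kafka partitions;
    // otherwise the spare source subtasks receive no splits, finish immediately,
    // and checkpoints can no longer complete (the error quoted in the subject).
    env.setParallelism(numberOfKafkaPartitions);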

Re: FW: Error using KafkaProducer EXACTLY_ONCE semantic + TaskManager Failure

2021-08-04 Thread Kevin Lam
Hi Arvid, I had 5.5.2 bundled into my application jar. I was able to get the https://github.com/apache/flink/pull/16693 working by ensuring that kafka-clients==2.4.1 was used just now. Thanks!! On Wed, Aug 4, 2021 at 1:04 PM Arvid Heise wrote: > Hi Kevin, > > Which Kafka client version are you

Is FlinkKafkaConsumer setStartFromLatest() method needed when we use auto.offset.reset=latest kafka properties

2021-08-04 Thread suman shil
In my Flink streaming application I have a Kafka data source. I am using the Kafka property auto.offset.reset=latest. I am wondering if I need to use FlinkKafkaConsumer.setStartFromLatest(). Are they similar? Can I use either of them? Following is the documentation from the Flink code. /** * Specifies th
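
They are related but not identical, as far as I understand it: auto.offset.reset=latest only applies when the consumer group has no committed offset, while setStartFromLatest() tells the Flink consumer to ignore committed group offsets and always start from the latest records (offsets restored from a checkpoint or savepoint still take precedence in both cases). A small sketch (topic, broker and group id are illustrative):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class StartPositionSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "broker:9092");
            props.setProperty("group.id", "my-group");
            // Consulted by the Kafka client only when no committed group offset exists:
            props.setProperty("auto.offset.reset", "latest");

            FlinkKafkaConsumer<String> consumer =
                    new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
            // Always start from the latest records, ignoring committed group offsets
            // (offsets restored from a checkpoint/savepoint still win):
            consumer.setStartFromLatest();

            env.addSource(consumer).print();
            env.execute("start-position-sketch");
        }
    }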

Avro SpecificRecordBase question

2021-08-04 Thread Kirill Kosenko
Hello, I read in this article https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html that it's possible to use SpecificRecordBase.class in the operators; quoting the article's "Avro Specific" section: Avro specific records will be automatically detected by checking that the given type’s type hierarchy c
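
As far as I know that is correct: a generated class extending SpecificRecordBase is picked up by Flink's type extraction and serialized with the Avro serializer, so it can be used directly as an operator type. A hedged sketch (User is a hypothetical Avro-generated class, not from the article):

    // "User" is a hypothetical class generated by the Avro compiler,
    // i.e. it extends org.apache.avro.specific.SpecificRecordBase.
    // Flink detects it and uses the Avro serializer, so it can be the element type:
    DataStream<User> users = env.fromElements(user1, user2);
    // If a hint is ever needed, flink-avro offers an explicit type info:
    // .returns(new org.apache.flink.formats.avro.typeutils.AvroTypeInfo<>(User.class))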

Re: Understanding the semantics of SourceContext.collect

2021-08-04 Thread Caizhi Weng
Hi! There is no such guarantee unless the whole DAG is a single node. Flink's runtime runs the same node (task) in the same thread, while different nodes (tasks) are executed in different threads, even on different machines. Yuval Itzchakov wrote on Thu, Aug 5, 2021 at 2:10 AM: > Hi, > > I have a question reg

Re: Understanding the semantics of SourceContext.collect

2021-08-04 Thread Caizhi Weng
Oh, and in batch jobs this is not guaranteed even if the whole DAG is a single node. For example, for a sort operator the records will be stored in memory or on disk, and only after all records have arrived will they be sorted and sent downstream. So the state in your ASO is still nee

Re: custom flink image error

2021-08-04 Thread Joshua Fan
Hi Robert, Tobias I have tried many ways to build and validate the image. 1. put the s3 dependency into the plugins subdirectory; the Dockerfile content is below: > FROM apache/flink:1.13.1-scala_2.11 > ADD ./flink-s3-fs-hadoop-1.13.1.jar > /opt/flink/plugins/s3-hadoop/flink-s3-fs-hadoop-1.13.1.jar > AD

Implement task local recovery on TaskManager restart for Signifyd

2021-08-04 Thread Colman OSullivan
Hello! At Signifyd we use machine learning to protect our customers from credit card fraud. Efficiently calculating feature values for our models based on historical data is one of the primary challenges we face, and we’re meeting it with Flink. We need our system to be highly available and quick
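
For anyone following along, the building block Flink already ships for this is local recovery, enabled in flink-conf.yaml (a sketch; the path is illustrative, and whether local state survives a full TaskManager restart is exactly the gap being discussed here):

    state.backend.local-recovery: true
    taskmanager.state.local.root-dirs: /mnt/local-state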

Re: custom flink image error

2021-08-04 Thread Joshua Fan
It seems I set a wrong high-availability.storageDir: s3://flink-test/recovery works, but s3:///flink-test/recovery does not; one / had to be removed. Joshua Fan wrote on Thu, Aug 5, 2021 at 10:43 AM: > Hi Robert, Tobias > > I have tried many ways to build and validate the image. > > 1.put the s3 dependency to plu