Flink 1.13.3, k8s HA - ResourceManager was revoked leadership

2021-12-10 Thread Alexey Trenikhun
Hello, I'm running Flink 1.13.3 with Kubernetes HA. The JM periodically restarts after some time. In the log below the job runs for ~8 minutes, then leadership is suddenly revoked, the job reaches a terminal state, and K8s restarts the failed JM: {"timestamp":"2021-12-11T04:51:53.697Z","message":"Agent Info (1/1) (47e67

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-10 Thread narasimha
Folks, what about the Ververica Platform? Is there any mitigation for it? On Fri, Dec 10, 2021 at 3:32 PM Chesnay Schepler wrote: > I would recommend modifying your log4j configurations to set > log4j2.formatMsgNoLookups to true. > > As far as I can tell this is equivalent to upgrading log4j

Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-10 Thread Timothy James
Hi, The Hadoop s3a library itself supports some properties we need, but the "FileSystem SQL Connector" (via FileSystemTableFactory) does not pass connector options for these to the "Hadoop/Presto S3 File Systems plugins" (via S3FileSystemFactory). Instead, only Job-global Flink config values are

Re: broadcast() without arguments

2021-12-10 Thread Alexey Trenikhun
Thank you Roman From: Roman Khachatryan Sent: Friday, December 10, 2021 1:48:08 AM To: Alexey Trenikhun Cc: Flink User Mail List Subject: Re: broadcast() without arguments Hello, The broadcast() without arguments can be used the same way as a regular data strea

Latency monitoring in Flink 1.14.0

2021-12-10 Thread Geldenhuys, Morgan Karl
Greetings all, I am attempting to set up latency monitoring for a Flink 1.14.0 job. According to the documentation, I have done the following: In my Kubernetes setup I have added the following to the kubernetes-sess

Confusion about rebalance bytes sent metric in Flink UI

2021-12-10 Thread tao xiao
Hi team, I have one operator that is connected to 9 downstream operators using rebalance. Each operator has a parallelism of 150 [1]. I assume each message in the upstream operator is sent to one of the parallel instances of each of the 9 receiving operators, so the total bytes sent should be roughly 9

Advise on Apache Log4j Zero Day (CVE-2021-44228)

2021-12-10 Thread Konstantin Knauf
Dear Flink Community, Yesterday, a new Zero Day for Apache Log4j was reported [1]. It is now tracked under CVE-2021-44228 [2]. Apache Flink bundles a version of Log4j that is affected by this vulnerability. We recommend that users follow the advisory [3] of the Apache Log4j Community. For Apache Fl

WindowOperator TestHarness

2021-12-10 Thread Lars Skjærven
Hello, We're trying to write a test for an implementation of *AggregateFunction* following an *EventTimeSessionWindows.withGap*. We gave it a try using *WindowOperator*(), which we hoped could be used as an argument to *KeyedOneInputStreamOperatorTestHarness*. We're a bit stuck, and we're hoping som

Re: FileSource with Parquet Format - parallelism level

2021-12-10 Thread Arvid Heise
Yes, Parquet files can be read in splits (=in parallel). Which enumerator is used is determined here [1]. [1] https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L170-L170 On Fri, Dec 10, 2021 at

Re: Regarding the size of Flink cluster

2021-12-10 Thread Timo Walther
Hi Jessy, let me try to answer some of your questions. > 16 Task Managers with 1 task slot and 1 CPU each Every additional task manager also involves management overhead. So I would suggest option 1. But in the end you need to perform some benchmarks yourself. I could also imagine that a mixt

Regarding the size of Flink cluster

2021-12-10 Thread Jessy Ping
Hi All, I have the following questions regarding the sizing of a Flink cluster doing stateful computation using the DataStream API. It would be great if the community could answer the questions below. Suppose we have a pipeline as follows, *Kafka real time events source1 & Kafka rules

Re: FileSource with Parquet Format - parallelism level

2021-12-10 Thread Krzysztof Chmielewski
Hi Roman, Thank you. I'm familiar with FLIP-27 and I was analyzing the new File Source. From there I saw that there are two FileEnumerators: one that allows file splitting and another that does not, BlockSplittingRecursiveEnumerator and NonSplittingRecursiveEnumerator. I was wondering if BlockS

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 Thread 刘建刚
Glad to see the suggestion. In our tests, we found that for small jobs the changed configs do not improve performance much, just as in your test. I have some suggestions: - The config can affect memory usage. Will the related memory configs be changed? - Can you share the TPC-DS resu

PyFlink accumulate streaming data

2021-12-10 Thread Королькевич Михаил
Hello Flink team! How do I properly accumulate streaming data into Avro files partitioned by the hour? In my current implementation, data from the data stream is converted to a table and saved to an Avro file, similar to this: t_env.execute_sql(""" CREATE TABLE mySink (

Re: broadcast() without arguments

2021-12-10 Thread Roman Khachatryan
Hello, The broadcast() without arguments can be used the same way as a regular data stream, i.e. regular transformations can be applied to it. The difference is that every element will be sent to all downstream subtasks and not just one. The difference with broadcast() with arguments is that the
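The delivery difference described in this reply can be illustrated with a small toy model. This is only a sketch in plain Python, not Flink code: the function names `broadcast` and `round_robin`, the list-based "inboxes", and the sample events are all hypothetical, and round-robin is used here merely as one example of a partitioning where each element reaches just one subtask.

```python
from itertools import cycle


def broadcast(elements, num_subtasks):
    """Toy model: every element is delivered to every downstream subtask."""
    inboxes = [[] for _ in range(num_subtasks)]
    for element in elements:
        for inbox in inboxes:
            inbox.append(element)
    return inboxes


def round_robin(elements, num_subtasks):
    """Toy model: each element is delivered to exactly one downstream subtask."""
    inboxes = [[] for _ in range(num_subtasks)]
    targets = cycle(range(num_subtasks))
    for element in elements:
        inboxes[next(targets)].append(element)
    return inboxes


# Hypothetical sample data: four events, three downstream subtasks.
events = ["a", "b", "c", "d"]
broadcast_result = broadcast(events, 3)
round_robin_result = round_robin(events, 3)
```

In this model the broadcast variant delivers `len(events) * num_subtasks` elements in total, while the round-robin variant delivers only `len(events)`.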

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-10 Thread Chesnay Schepler
I would recommend modifying your log4j configurations to set log4j2.formatMsgNoLookups to true. As far as I can tell this is equivalent to upgrading log4j, which just disabled this lookup by default. On 10/12/2021 10:21, Richard Deurwaarder wrote: Hello, There has been a log4j2 vuln

CVE-2021-44228 - Log4j2 vulnerability

2021-12-10 Thread Richard Deurwaarder
Hello, There has been a log4j2 vulnerability made public https://www.randori.com/blog/cve-2021-44228/ which is making some waves :) This post even explicitly mentions Apache Flink: https://securityonline.info/apache-log4j2-remote-code-execution-vulnerability-alert/ And fortunately, I saw this was

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 Thread Yun Gao
Hi Yingjie, Many thanks for drafting the FLIP and initiating the discussion! May I double-check one thing about taskmanager.network.sort-shuffle.min-parallelism: since other frameworks like Spark use sort-based shuffle in all cases, does our current circumstance still have dif

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-10 Thread Roman Khachatryan
Hi, Have you tried constructing a Hybrid source from a File source created with FileSource.forBulkFileFormat [1] and "gs://bucket" scheme [2] directly? [1] https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html#forBulkFileFormat-org.apach

Re: FileSource with Parquet Format - parallelism level

2021-12-10 Thread Roman Khachatryan
Hi, Yes, file source does support DoP > 1. And in general, a single file can be read in parallel after FLIP-27. However, parallel reading of a single Parquet file is currently not supported AFAIK. Maybe Arvid or Fabian could shed more light here. Regards, Roman On Thu, Dec 9, 2021 at 12:03 PM K

Re: stateSerializer(1.14.0) not compatible with previous stateSerializer(1.13.1)

2021-12-10 Thread Roman Khachatryan
Hi, Compatibility might depend on specific serializers, could you please share which serializers you use to access the state? Regards, Roman On Fri, Dec 10, 2021 at 3:41 AM 李诗君 wrote: > > I am trying to upgrade my flink cluster version from 1.13.1 to 1.14.0 , I did > like below steps: > > 1. s

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 Thread Yingjie Cao
Hi dev & users: I have created a FLIP [1] for it; feedback is highly appreciated. Best, Yingjie [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability Yingjie Cao wrote on Fri, Dec 3, 2021 at 17:02: > Hi dev & users, >