Re: Datadog reporter timeout & OOM issue

2021-01-26 Thread Juha Mynttinen
Hey, A few months back, I had a very similar problem with Datadog when I tried to do a proof of concept using it with Flink. I had quite a lot of user defined metrics. I got similar exceptions and the metrics didn't end up in Datadog. Without too much deeper analysis, I assumed Datadog was throttl

Performance issue associated with managed RocksDB memory

2020-06-24 Thread Juha Mynttinen
Hello there, In Flink 1.10 the configuration parameter state.backend.rocksdb.memory.managed defaults to true. This is great, since it makes it simpler to stay within the memory budget e.g. when running in a container environment. However, I've noticed performance issues when the switch is enabl

Re: Performance issue associated with managed RocksDB memory

2020-06-24 Thread Juha Mynttinen
Hey, Here's a simple test. It's basically the WordCount example from Flink, but using RocksDB as the state backend and having a stateful operator. The javadocs explain how to use it. /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See

Re: Performance issue associated with managed RocksDB memory

2020-06-25 Thread Juha Mynttinen
CTORY, "HEAP"); final StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(PARALLELISM, configuration); No changes in the performance (tried with parallelism 5 and without managed memory). Regards, Juha From: Yu Li Sent: Thursday, June 25, 2020 12:20

Re: Performance issue associated with managed RocksDB memory

2020-06-25 Thread Juha Mynttinen
Andrey, A small clarification. The tweaked WordCount I posted earlier doesn't illustrate the issue I originally explained, i.e. the one where there's a bigger operator and a smallest possible windows operator. Instead, the modified WordCount illustrates the degraded performance of a very simple Fl

Re: Performance issue associated with managed RocksDB memory

2020-08-24 Thread Juha Mynttinen
The issue can be reproduced by using certain combinations of RocksDBOptions.WRITE_BUFFER_RATIO (default 0.5) and the Flink job parallelism. Examples that break: * Parallelism 1 and WRITE_BUFFER_RATIO 0.1 * Parallelism 5 and the default WRITE_BUFFER_RATIO 0.5 Examples that work: * P
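The arithmetic behind the two breaking combinations can be sketched as follows. This is a rough illustration and not code from the thread or from Flink. It assumes 128 MB of managed memory split evenly across slots, a RocksDB default write buffer of 64 MiB with an arena block of one eighth of that, a write buffer manager capacity of 2/3 of (managed memory per slot x WRITE_BUFFER_RATIO), and a mutable limit of 7/8 of that capacity. The class and method names are hypothetical.

```java
public class ArenaBlockSanityCheck {
    // Assumption: RocksDB's default write buffer size is 64 MiB.
    static final long DEFAULT_WRITE_BUFFER_BYTES = 64L << 20;

    // Assumption: the default arena block size is 1/8 of the write buffer size.
    static long arenaBlockSize(long writeBufferBytes) {
        return writeBufferBytes / 8;
    }

    // Assumption: the write buffer manager gets 2/3 of (managed memory per slot * ratio).
    static long writeBufferManagerCapacity(long managedBytesPerSlot, double writeBufferRatio) {
        return (long) (2 * managedBytesPerSlot * writeBufferRatio / 3);
    }

    // Assumption: memtables are flushed once they exceed 7/8 of the manager capacity.
    static long mutableLimit(long capacityBytes) {
        return capacityBytes * 7 / 8;
    }

    // The configuration is considered sane when one arena block fits under the mutable limit.
    static boolean isSane(long totalManagedBytes, int parallelism, double writeBufferRatio) {
        long perSlot = totalManagedBytes / parallelism; // assumes one slot per parallel subtask
        long capacity = writeBufferManagerCapacity(perSlot, writeBufferRatio);
        return arenaBlockSize(DEFAULT_WRITE_BUFFER_BYTES) <= mutableLimit(capacity);
    }

    public static void main(String[] args) {
        long managed = 128L << 20; // assumed default managed memory size
        System.out.println("parallelism 1, ratio 0.1: " + isSane(managed, 1, 0.1));
        System.out.println("parallelism 5, ratio 0.5: " + isSane(managed, 5, 0.5));
        // Illustrative extra case, not taken from the thread.
        System.out.println("parallelism 1, ratio 0.5: " + isSane(managed, 1, 0.5));
    }
}
```

Under these assumptions both breaking combinations leave a mutable limit of roughly 7.5 MB, which is below the 8 MiB arena block, so a single arena allocation already exceeds the limit and the check fails.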

Re: Performance issue associated with managed RocksDB memory

2020-09-08 Thread Juha Mynttinen
ed to see if it was possible to check the sanity of the arena block size and just make the app crash if the arena block size is too high (or the mutable limit too low). I came up with this https://github.com/juha-mynttinen-king/flink/tree/arena_block_sanity_check. The code calculates the same pa

Re: Performance issue associated with managed RocksDB memory

2020-09-09 Thread Juha Mynttinen
k size decreasing example in the docs. Also, the default managed memory size is AFAIK 128MB right now. That could be increased. That would get rid of this issue in many cases. Regards, Juha From: Yun Tang Sent: Tuesday, September 8, 2020 8:05 PM To: Juha My

Re: Performance issue associated with managed RocksDB memory

2020-09-10 Thread Juha Mynttinen
Hey I've fixed the code (https://github.com/juha-mynttinen-king/flink/commits/arena_block_sanity_check) slightly. Now it WARNs if there is the memory configuration issue. Also, I think there was a bug in the way the check calculated the mutable memory, fixed that. Also, wrote some test

Re: Performance issue associated with managed RocksDB memory

2020-09-15 Thread Juha Mynttinen
Hey I created this one https://issues.apache.org/jira/browse/FLINK-19238. Regards, Juha From: Yun Tang Sent: Tuesday, September 15, 2020 8:06 AM To: Juha Mynttinen ; Stephan Ewen Cc: user@flink.apache.org Subject: Re: Performance issue associated with managed

Disable WAL in RocksDB recovery

2020-09-15 Thread Juha Mynttinen
Hello there, I'd like to bring to discussion a previously discussed topic - disabling WAL in RocksDB recovery. It's clear that WAL is not needed during the process, the reason being that the WAL is never read, so there's no need to write it. AFAIK the last thing that was done with WAL during r

Re: Disable WAL in RocksDB recovery

2020-09-21 Thread Juha Mynttinen
Good, I opened this JIRA for the issue https://issues.apache.org/jira/browse/FLINK-19303. The discussion can be moved there. Regards, Juha From: Yu Li Sent: Friday, September 18, 2020 3:58 PM To: Juha Mynttinen Cc: user@flink.apache.org Subject: Re: Disable

Building Flink on VirtualBox VM failing

2020-10-19 Thread Juha Mynttinen
Hey, I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152. The command I'm using: apache-maven-3.2.5/bin/mvn clean verify The output: [INFO] Flin

Re: Building Flink on VirtualBox VM failing

2020-10-19 Thread Juha Mynttinen
> > Regards, > Roman > > > On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen > wrote: > >> >> Hey, >> >> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in >> a virtual machine on Windows 10. I'm using OpenJDK 11.0.8.

Re: Building Flink on VirtualBox VM failing

2020-10-20 Thread Juha Mynttinen
gh parallelism). > So I think the best way to deal with this is to use VM with more memory. > > Regards, > Roman > > > On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen > wrote: > >> Hey, >> >> Good hint that /var/log/kern.log. This time I can see this

Re: Building Flink on VirtualBox VM failing

2020-10-21 Thread Juha Mynttinen
't crash (probably, because of overcommit). > Did you try this approach in your VM? > > Regards, > Roman > > > On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen > wrote: > >> Hey, >> >> > Currently, tests do not run in parallel >> >> I d

Re: Building Flink on VirtualBox VM failing

2020-10-21 Thread Juha Mynttinen
ory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0 Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB Regards, Juha El mié

mvn clean verify - testConfigurePythonExecution failing

2020-10-22 Thread Juha Mynttinen
Hello there, The PR https://github.com/apache/flink/pull/13322 lately added the test method testConfigurePythonExecution in org.apache.flink.client.cli.PythonProgramOptionsTest. "mvn clean verify" fails for me in testConfigurePythonExecution: ... INFO] Running org.apache.flink.client.cli.Pytho

Logging when building and testing Flink

2020-10-23 Thread Juha Mynttinen
Hey there, I noticed that when building and testing Flink itself, logging seems to be non-existent or very quiet. I had a look at the logging conf files (such as flink-tests/src/test/resources/log4j2-test.properties) and the pattern seems to be that the logging is turned off in tests. At least it

Re: Building Flink on VirtualBox VM failing

2020-10-23 Thread Juha Mynttinen
t 23 15:26:42 ubuntu kernel: [23021.406205] oom_reaper: reaped process 460994 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB It seems very odd to me that the process takes 13277940kB of virtual mem and 9763848kB of anon-rss. Or maybe I'm reading something wrong. r, Juha El mié., 21 oc

Re: Logging when building and testing Flink

2020-10-30 Thread Juha Mynttinen
st pass the log4j.configurationFile > explicitly: > > mvn '-Dlog4j.configurationFile=[path]/log4j2-on.properties' clean install > > Best, > > Dawid > > On 23/10/2020 09:48, Juha Mynttinen wrote: > > Hey there, > > > > I noticed that when building and testin

Re: Get current kafka offsets for kafka sources

2020-12-15 Thread Juha Mynttinen
Hey, Have a look at [1]. Basically, you won't see the "real-time" consumer group offsets stored in Kafka itself, but only the ones the Flink Kafka consumer stores there when checkpointing (assuming you have checkpointing enabled). The same information is available in Flink metrics [2], "committedO

REST API in an HA setup - must the leading JM be called?

2021-08-18 Thread Juha Mynttinen
I have questions related to REST API in the case of ZooKeeper HA and a standalone cluster. But I think the questions apply to other setups too such as YARN. Let's assume a standalone cluster with multiple JobManagers. The JobManagers elect the leader among themselves and register that to ZooKeeper

Re: REST API in an HA setup - must the leading JM be called?

2021-08-18 Thread Juha Mynttinen
). > > This does indeed imply that on JM failover all this information is lost. > > There are ideas to solve this, but no concrete timeline. See > https://issues.apache.org/jira/browse/FLINK-18312 > > On 18/08/2021 11:54, Juha Mynttinen wrote: > > I have questions relat

web.timeout usage in the code

2021-08-24 Thread Juha Mynttinen
Hey, I was looking at the web.timeout configuration option described as "Timeout for asynchronous operations by the web monitor in milliseconds". I'm interested in where and how it's used internally. I've understood it's the timeout when the Web UI calls something using Rpc (Akka). Correct? Unfor

Re: web.timeout usage in the code

2021-08-24 Thread Juha Mynttinen
ut. The AkkaInvocationHandler extracts the timeout based on > this annotation and uses it internally. > > On 24/08/2021 12:05, Juha Mynttinen wrote: > > Hey, > > I was looking at the web.timeout configuration option described as > "Timeout for asynchronous operations by t