Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-09 Thread Yang Wang
> The job sub directory will be cleaned up when the job finished/canceled/failed. Since we could submit multiple jobs into a Flink session, what i mean is when a job reached to the terminal state, the sub node(e.g. /flink/application_/running_job_registry/4d255397c7aeb5327adb567238c983c1) on th

Re: Flink alert after database lookUp

2020-09-09 Thread s_penakalap...@yahoo.com
Hi Timo, Thank you for the suggestions. I see now both Process function and CEP approach will not fit in. Now if I follow the third approach to stream the values from database() . Is it possible to stream data continuously? If I follow the bellow approach, both I see one time load only not cont

Re: Performance issue associated with managed RocksDB memory

2020-09-09 Thread Stephan Ewen
Thanks for driving, this, it is a great find and a nice proposal for a solution. I generally really like the idea of the block size sanity checker. I would also suggest to first go with logging a big fat WARNING rather than crashing the job. Crashing the job like this would be an unrecoverable fa

Re: Performance issue associated with managed RocksDB memory

2020-09-09 Thread Juha Mynttinen
Hey Yun, About the docs. I saw in the docs (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html) this: "An advanced option (expert mode) to reduce the number of MemTable flushes in setups with many states, is to tune RocksDB’s ColumnFamily options (arena bl

Re: Flink alert after database lookUp

2020-09-09 Thread Timo Walther
Flink's built-in JDBC connector will read the data only once. JDBC does not provide means to continously monitor a database table. It depends on the size of your database, if you parameter table is small it might be sufficient to write a simple Flink connector that periodically reads the table

Re: How to get Latency Tracking results?

2020-09-09 Thread David Anderson
Pankaj, The Flink web UI doesn't do any visualizations of histogram metrics, so the only way to access the latency metrics is either through the REST api or a metrics reporter. The REST endpoint you tried is the correct place to find these metrics in all recent versions of Flink, but somewhere ba

Re: Performance issue associated with managed RocksDB memory

2020-09-09 Thread Stephan Ewen
Hey Juha! I agree that we cannot reasonably expect from the majority of users to understand block sizes, area sizes, etc to get their application running. So the default should be "inform when there is a problem and suggest to use more memory." Block/arena size tuning is for the absolute expertes,

Slow Performance inquiry

2020-09-09 Thread Heidi Hazem Mohamed
Dear, I am writing a Flink program(Recommender system) needed a matrix as a state which is the rating matrix, While the matrix is very sparse, I implemented a sparse binary matrix to save the memory and save only the ones, not all the matrix and use it as a data type and save it in a value Stat

Re: Slow Performance inquiry

2020-09-09 Thread Timo Walther
Hi Hazem, I guess your performance is mostly driven by the serialization overhead in this case. How do you declare your state type? Flink comes with different serializers. Not all of them are extracted automatically when using reflective extraction methods: - Note that `Serializable` decla

Re: Slow Performance inquiry

2020-09-09 Thread Heidi Hazem Mohamed
Hi Walther, Many thanks for your answer, I declared the state type as below ValueStateDescriptor descriptor = new ValueStateDescriptor( "Rating Matrix", TypeInformation.of(new TypeHint() { } )); Is there a better way? Regards, Heidy

Re: Flink 1.8.3 GC issues

2020-09-09 Thread Piotr Nowojski
Hi Josson, Thanks for getting back. What are the JVM settings and in particular GC settings that you are using (G1GC?)? It could also be an issue that in 1.4 you were just slightly below the threshold of GC issues, while in 1.8, something is using a bit more memory, causing the GC issues to appea

Re: Watermark generation issues with File sources in Flink 1.11.1

2020-09-09 Thread Arti Pande
Hi Aljoscha, By "checkpoints do not work" what I mean is ever since Flink 1.9.2 till 1.11.1 when using File source the source operator (guessing split enumerator or metadata reader) finishes immediately after starting (and assigning the splits to split readers) hence when first checkpoint is trigg

Re: How to get Latency Tracking results?

2020-09-09 Thread Pankaj Chand
Hi David, Thanks for replying! Sorry, I forgot to mention I am using Flink Version: 1.11.1, Commit ID: 7eb514a. Is it possible that the default SocketWindowWordCount job is too simple to generate Latency metrics? Or that the latency metrics disappear from the output JSON when the data ingestion is

Re: PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-09 Thread Manas Kale
Hi Xingbo and Till, Thank you for your help! On Wed, Sep 2, 2020 at 9:38 PM Xingbo Huang wrote: > Hi Manas, > > As Till said, you need to check whether the execution environment used is > LocalStreamEnvironment. You need to get the class object corresponding to > the corresponding java object th

Re: How to get Latency Tracking results?

2020-09-09 Thread David Anderson
Pankaj, I just checked, and the latency metrics for SocketWindowWordCount show up just fine for me with Flink 1.11.1. For me, the latency metrics existed even before I provided any data on the socket for the job to process. This makes sense, as the latency tracking markers will propagate through t

[DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Aljoscha Krettek
Hi Devs, @Users: I'm cc'ing the user ML to see if there are any users that are relying on this feature. Please comment here if that is the case. I'd like to discuss the deprecation and eventual removal of UnionList Operator State, aka Operator State with Union Redistribution. If you don't kn

Re: Difficulties with Minio state storage

2020-09-09 Thread Arvid Heise
Hi Rex, you could also check the end to end tests that use minio in flink's repo. You definitely need to use an http endpoint. The setup [1] uses also another way to specify the s3.path.style.access (with dashes). I think we needed it especially for presto. It seems like the settings differ a bit

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Arvid Heise
+1 to getting rid of non-keyed state as is in general and for union state in particular. I had a hard time to wrap my head around the semantics of non-keyed state when designing the rescale of unaligned checkpoint. The only plausible use cases are legacy source and sinks. Both should also be rewor

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Seth Wiesman
Generally +1 The one use case I've seen of union state I've seen in production (outside of sources and sinks) is as a "poor mans" broadcast state. This was obviously before that feature was added which is now a few years ago so I don't know if those pipelines still exist. FWIW, if they do the stat

Re: Difficulties with Minio state storage

2020-09-09 Thread Rex Fenley
Thanks yall, Yangze, > I've tried to use MinIO as state backend and everything seems works well For clarity, I'm using RocksDB state backend with Minio as state storage. > s3.endpoint: http://localhost:9000 Also, I'm doing everything from docker-compose so localhost isn't going to work in my case.

Re: Watermark generation issues with File sources in Flink 1.11.1

2020-09-09 Thread David Anderson
Arti, The problem with watermarks and the File source operator will be fixed in 1.11.2 [1]. This bug was introduced in 1.10.0, and isn't related to the new WatermarkStrategy api. [1] https://issues.apache.org/jira/browse/FLINK-19109 David On Wed, Sep 9, 2020 at 2:52 PM Arti Pande wrote: > Hi

Re: Difficulties with Minio state storage

2020-09-09 Thread Rex Fenley
Good news! Eliminating bsEnv.setStateBackend( new RocksDBStateBackend( "s3://flink-jdbc-test_graph-minio_1/data/checkpoints:9000", true ) ) moving all configuration into FLINK_PROPERTIES and switching to http seemed to do the trick! Thanks for all the help! On Wed, Sep 9, 2020 at 9

Re: Slow Performance inquiry

2020-09-09 Thread David Anderson
Heidy, which state backend are you using? With RocksDB Flink will have to do ser/de on every access and update, but with the FsStateBackend, your sparse matrix will sit in memory, and only have to be serialized during checkpointing. David On Wed, Sep 9, 2020 at 2:41 PM Heidi Hazem Mohamed wrote:

Re: Use of slot sharing groups causing workflow to hang

2020-09-09 Thread Ken Krugler
Hi Til, > On Sep 3, 2020, at 12:31 AM, Till Rohrmann wrote: > > Hi Ken, > > I believe that we don't have a lot if not any explicit logging about the slot > sharing group in the code. You can, however, learn indirectly about it by > looking at the required number of AllocatedSlots in the SlotP

Re: Use of slot sharing groups causing workflow to hang

2020-09-09 Thread Yangze Guo
Hi, Ken >From the RM perspective, could you share the following logs: - "Request slot with profile {} for job {} with allocation id {}.". - "Requesting new slot [{}] and profile {} with allocation id {} from resource manager." This will help to figure out how many slots your job indeed requests. A

Re: Use of slot sharing groups causing workflow to hang

2020-09-09 Thread Xintong Song
Hi Ken, I've got a Flink MiniCluster with 12 slots. Even with only 6 pipelined > operators, each with a parallelism of 1, it still hangs while starting. > Could you double check that the minicluster has 12 slots when each or your operators has only 1 parallelism? I've looked into the codes. Curr