[SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
Hi guys, We want to get an accurate picture of how users actually use high-availability services in Flink, especially how you customize high-availability services via HighAvailabilityServicesFactory. Basically there are the standalone impl., the ZooKeeper impl., the embedded impl. used in MiniCluster, the YARN impl

Re: [SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
In addition, FLINK-13750[1] will also likely introduce a breaking change to high-availability services. So those of you who might be affected by the change are highly encouraged to share your cases :-) Best, tison. [1] https://issues.apache.org/jira/browse/FLINK-13750 Zili Chen wrote on Wed, Aug 21, 2019 at 3:32 PM: > Hi gu

KryoSerializer is used for List type instead of ListSerializer

2019-08-21 Thread spoganshev
The following code: val MAILBOX_SET_TYPE_INFO = object: TypeHint>() {}.typeInfo val env = StreamExecutionEnvironment.getExecutionEnvironment() println(MAILBOX_SET_TYPE_INFO.createSerializer(env.config)) prints: org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer@2c39e53 While there

Re: Flink SQL: How to tag a column as 'rowtime' from an Avro-based DataStream?

2019-08-21 Thread Niels Basjes
Hi, It has taken me quite a bit of time to figure this out. This is the solution I have now (works on my machine). Please tell me where I can improve this. It turns out that the schema you provide for registerDataStream only needs the 'top level' fields of the Avro data structure. With only the top

Can I use watermarks to have a global trigger of different ProcessFunctions?

2019-08-21 Thread Felipe Gutierrez
Hi, I am a little confused about watermarks in Flink. My application uses EventTime. My sources call ctx.collectWithTimestamp and ctx.emitWatermark. Then I have a CoProcessFunction which merges the two streams. I have state on this function and I want to clean this state every time

Re: What is the recommended way to run flink with high availability on AWS?

2019-08-21 Thread sri hari kali charan Tummala
Ok, no problem. On Wed, Aug 21, 2019 at 12:22 AM Pei HE wrote: > Thanks Kali for the information. However, it doesn't work for me, because > I need features in Flink 1.7.x or later and use managed Amazon MSK. > -- > Pei > > > > On Tue, Aug 20, 2019 at 7:17 PM sri hari kali charan Tummala < > kali

Re: Questions for platform to choose

2019-08-21 Thread Oytun Tez
Flink 💅💂 --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Wed, Aug 21, 2019 at 2:42 AM Eliza wrote: > Hello, > > We have all of spark, flink, storm, kafka installed. > For realtime streaming calculation, which one is the

RE: Recovery from job manager crash using checkpoints

2019-08-21 Thread min.tan
Thanks for the helpful reply. One more question: does this ZooKeeper or HA requirement apply to a savepoint? Can I bounce a single-jobmanager cluster and rerun my Flink job from its previous state with a savepoint directory? e.g. ./bin/flink run myJob.jar -s savepointDirectory Regards, Min

Re: Questions for platform to choose

2019-08-21 Thread Robert Metzger
Hey Eliza, This decision depends on many factors, such as the experience of your team, your use case, your deployment model, your workload, expected growth etc. Posting the same question to the user mailing list of all these systems won't magically answer the question, because there is no obje

Re: Recovery from job manager crash using checkpoints

2019-08-21 Thread Zili Chen
Hi Min, For your question, the answer is no. In the standalone case Flink uses an in-memory checkpoint store, which is able to restore the savepoint configured on the command line and recover state from it. Besides, stopping with a savepoint and resuming the job from that savepoint is the standard path to migrate j
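For readers finding this thread later, the stop-with-savepoint workflow mentioned above looks roughly like this on the Flink 1.8-era CLI (the savepoint path and job id below are placeholders, not values from this thread):

```
# Cancel the running job while taking a savepoint
./bin/flink cancel -s /path/to/savepoints <jobId>

# After bouncing the cluster, resume the job from that savepoint
./bin/flink run -s /path/to/savepoints/savepoint-xxxx myJob.jar
```

Note that the -s option of "flink run" must come before the jar, since everything after the jar is passed to the program as arguments.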

Multiple trigger events on keyed window

2019-08-21 Thread Eric Isling
Dear list members, I have a question regarding window firing and element accumulation for a sliding window on a DataStream (Flink 1.8.1-2.12). My DataStream is derived from a custom SourceFunction, which emits string sequences of WINDOW size, in a deterministic sequence. The aim is to create sli

Re: Multiple trigger events on keyed window

2019-08-21 Thread Eric Isling
I should add that the behaviour persists even when I force parallelism to 1. On Wed, Aug 21, 2019 at 5:19 PM Eric Isling wrote: > Dear list members, > > I have a question regarding window firing and element accumulation for a > sliding window on a DataStream (Flink 1.8.1-2.12). > > My DataStr

Apache Flink - How to get heap dump when a job is failing in EMR

2019-08-21 Thread M Singh
Hi: Is there any configuration to get a heap dump when a job fails in EMR? Thanks

Flink logback

2019-08-21 Thread Vishwas Siravara
Hi all, I modified the logback.xml provided by the Flink distribution, so now the logback.xml file looks like this: ${log.file} false %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread]

Re: Can I use watermarks to have a global trigger of different ProcessFunctions?

2019-08-21 Thread David Anderson
What watermarks do is advance the event-time clock. You can consider a Watermark(t) as an assertion about the completeness of the stream -- it marks a point in the stream and says that, at that point, the stream is (probably) now complete up to time t. The autoWatermarkInterval determines how of
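As a conceptual illustration of the periodic watermark generation described above, here is a plain Python sketch (not Flink code; the class and method names are made up for illustration). It tracks the maximum timestamp seen so far and, once per autoWatermarkInterval, emits a watermark a fixed out-of-orderness bound behind it:

```python
class BoundedOutOfOrdernessWatermarks:
    """Sketch of a periodic watermark generator (illustrative, not Flink's API)."""

    def __init__(self, max_out_of_orderness_ms):
        self.max_out_of_orderness = max_out_of_orderness_ms
        self.max_timestamp = float("-inf")

    def on_event(self, timestamp_ms):
        # Called for every record; only tracks the highest timestamp seen.
        self.max_timestamp = max(self.max_timestamp, timestamp_ms)

    def on_periodic_emit(self):
        # Called once per autoWatermarkInterval; asserts the stream is
        # (probably) complete up to the returned event time.
        return self.max_timestamp - self.max_out_of_orderness


gen = BoundedOutOfOrdernessWatermarks(max_out_of_orderness_ms=500)
for ts in [1000, 1400, 1200]:   # a slightly out-of-order stream
    gen.on_event(ts)
print(gen.on_periodic_emit())   # 900: the stream is assumed complete up to t=900
```

The out-of-order record (1200 after 1400) does not move the watermark backwards, which is the key property of the event-time clock.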

Re: Window Function that releases when downstream work is completed

2019-08-21 Thread David Anderson
I'm not sure I fully understand the scenario you envision. Are you saying you want to have some sort of window that batches (and deduplicates) up until a downstream map has finished processing the previous deduplicated batch, and then the window should emit the new batch? If that's what you want,

Help with combining multiple streams simultaneously.

2019-08-21 Thread Siddhartha Khaitan
Hello, Currently I have 2 streams and I enrich stream 1 with the second stream. To further enrich stream 1, we are planning to add 2 more streams, for a total of 4 streams. 1. Stream 1 read from Kafka 2. Stream 2 read from Kafka 3. Stream 3 will be read from Kafka - new 4. Stream 4 will

Externalized checkpoints

2019-08-21 Thread Vishwas Siravara
Hi peeps, I am externalizing checkpoints to S3 for my Flink job and I retain them on cancellation. However, when I look into my S3 bucket where the checkpoints are stored, there is only 1 checkpoint at any point in time. Is this the default behavior of Flink, where older checkpoints are deleted when

Re: Externalized checkpoints

2019-08-21 Thread Vishwas Siravara
I am also using exactly-once checkpointing mode; I have a Kafka source and sink, so both support transactions, which should allow for exactly-once processing. Is this the reason why there is only one checkpoint retained? Thanks, Vishwas On Wed, Aug 21, 2019 at 5:26 PM Vishwas Siravara wrote: > H

TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Aleksandar Mastilovic
Hi all, I’m experimenting with using my own implementation of HA services instead of ZooKeeper that would persist JobManager information on a Kubernetes volume instead of in ZooKeeper. I’ve set the high-availability option in flink-conf.yaml to the FQN of my factory class, and started the dock

Re: Questions for platform to choose

2019-08-21 Thread Eliza
Hi on 2019/8/21 22:46, Robert Metzger wrote: I would recommend you to do some research yourself (there is plenty of material online), and then try out the most promising systems yourself. That's right. thank you. regards.

Re: Apache Flink - How to get heap dump when a job is failing in EMR

2019-08-21 Thread Yang Wang
I think it depends on the root cause of your job failure. Maybe the following jvm options could help you to get the heap dump. 1. -XX:+HeapDumpOnOutOfMemoryError 2. -XX:+HeapDumpBeforeFullGC 3. -XX:+HeapDumpAfterFullGC 4. -XX:HeapDumpPath=/tmp/heap.dump.1 Use *env.java.opts* to set java opts for
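For reference, a minimal sketch of how such options could be wired into the Flink configuration (the dump path is illustrative; on EMR, flink-conf.yaml can be edited on the master node or set through EMR's configuration classifications):

```yaml
# flink-conf.yaml: pass heap-dump options to the Flink JVMs
env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.dump.1"
```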

Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zhu Zhu
Hi Aleksandar, The resource manager address is retrieved from the HA services. Would you check whether your customized HA services are returning the right LeaderRetrievalService, and whether the LeaderRetrievalService is really retrieving the right leader's address? Or is it possible that the stored

Re: Externalized checkpoints

2019-08-21 Thread Zhu Zhu
Hi Vishwas, You can configure "state.checkpoints.num-retained" to specify the max number of checkpoints to retain. By default it is 1. Thanks, Zhu Zhu Vishwas Siravara wrote on Thu, Aug 22, 2019 at 6:48 AM: > I am also using exactly once checkpointing mode, I have a kafka source and > sink so both support transactions
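For anyone searching later, the retention count is a one-line configuration change (the value 3 below is just an example):

```yaml
# flink-conf.yaml: retain the three most recent completed checkpoints
state.checkpoints.num-retained: 3
```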

Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zili Chen
Hi Aleksandar, based on your log: taskmanager_1 | 2019-08-22 00:05:03,713 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/jobmanager() . taskmanager_1 | 2019-08-22 00:05:04

Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zili Chen
Besides, would you like to participate in our survey thread[1] on the user list about "How do you use high-availability services in Flink?" It would help Flink improve its high-availability services. Best, tison. [1] https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d5

How do we debug on a local task manager

2019-08-21 Thread Raj, Smriti
Hello, I added this to the command line argument -Denv.java.opts.taskmanager="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=6005" and also tried adding the below to the flink-config.yaml env.java.opts.taskmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=600

Re: How do we debug on a local task manager

2019-08-21 Thread Caizhi Weng
Hi Raj, Have you restarted the cluster? You need to restart the cluster to apply changes in flink-config.yaml. You can also set suspend=y in the debug argument, so that task managers will pause and wait for IntelliJ to connect before going on. Raj, Smriti wrote on Thu, Aug 22, 2019 at 11:06 AM: > Hello
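Putting both suggestions together, the configuration entry would look like this (port 6005 is carried over from the original question; suspend=y makes the task manager JVM wait until a remote debugger attaches):

```yaml
# flink-conf.yaml: task manager JVM waits on port 6005 until a debugger attaches
env.java.opts.taskmanager: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=6005"
```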

Maximal watermark when two streams are connected

2019-08-21 Thread Sung Gon Yi
Hello, Normally, the watermark of a connected stream is set to the minimum of the watermarks of the two input streams. I wrote code to connect two streams, but due to a condition one of the streams does not have any messages. In this situation, the watermark is never advanced and processing is stuck.

Re: Maximal watermark when two streams are connected

2019-08-21 Thread Jark Wu
Hi Sung, The watermark will be advanced only when records come in, if you are using ".assignTimestampsAndWatermarks()". One way to solve this problem is to call ".assignTimestampsAndWatermarks()" before the condition, to make sure there are messages. Best, Jark On Thu, 22 Aug 2019 at 13:52, Su
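A minimal sketch of why the operator's event-time clock stalls when one input never produces a watermark (plain Python, not Flink code; all names are illustrative). A two-input operator tracks the latest watermark received on each input and its current event time is their minimum:

```python
NO_WATERMARK = float("-inf")  # an input that never emitted a watermark

class TwoInputWatermarkTracker:
    """Sketch of watermark handling in a two-input operator (illustrative)."""

    def __init__(self):
        self.watermarks = [NO_WATERMARK, NO_WATERMARK]

    def on_watermark(self, input_index, watermark):
        # Watermarks per input are monotonically non-decreasing.
        self.watermarks[input_index] = max(self.watermarks[input_index], watermark)
        # The operator's event-time clock is the minimum across inputs.
        return min(self.watermarks)


op = TwoInputWatermarkTracker()
op.on_watermark(0, 1000)
op.on_watermark(0, 2000)
# Input 1 never emits a watermark, so event time never advances:
print(op.on_watermark(0, 3000) == NO_WATERMARK)  # True
```

This is why event-time timers and windows downstream of the connect never fire until the silent input also starts advancing its watermark.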