Document an example pattern that makes sources and sinks pluggable in the production code for testing

2019-11-11 Thread Hung
Hi guys, I found that the testing docs mention "make sources and sinks pluggable in your production code and inject special test sources and test sinks in your tests". https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#testing-flink-jobs I think it would be useful to
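The pattern the docs describe can be sketched without any Flink dependency. Everything below (the `JobPipeline` name, the uppercase transformation) is a hypothetical illustration of the idea, not Flink API: the job accepts its source and sink as constructor arguments, so tests can inject in-memory stand-ins while production wires in, e.g., a Kafka source and a file sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: the job logic only sees Supplier/Consumer,
// so sources and sinks are pluggable at construction time.
class JobPipeline {
    private final Supplier<List<String>> source; // production: wraps Kafka; test: fixed list
    private final Consumer<List<String>> sink;   // production: writes files; test: collects

    JobPipeline(Supplier<List<String>> source, Consumer<List<String>> sink) {
        this.source = source;
        this.sink = sink;
    }

    // The transformation under test stays identical in both environments.
    void run() {
        List<String> out = new ArrayList<>();
        for (String record : source.get()) {
            out.add(record.toUpperCase()); // stand-in business logic
        }
        sink.accept(out);
    }
}
```

In a test, the source is a lambda returning canned records and the sink appends to a list that the assertions inspect; production code passes factories that build the real connectors.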

Re: Flink Standalone cluster - production settings

2019-02-21 Thread Hung
> Each job has 3 async operators with Executors with thread counts of 20, 20, 100
Flink handles parallelism for you. If you want higher parallelism for an operator, you can call setParallelism(), for example: flatMap(new Mapper1()).setParallelism(20) flatMap(new Mapper2()).setParallelism(20)

Re: Apache Flink Examples

2018-04-27 Thread Hung
In my case I usually check the tests written for each function I want to use. Take CountTrigger as an example: if I want to customize my own way of counting, I will have a look at the tests they write

Re: End-to-end exactly once from kafka source to S3 sink

2018-01-31 Thread Hung
"Flink will only commit the kafka offsets when the data has been saved to S3" -> no. You can check the BucketingSink code; that behavior would mean BucketingSink depends on Kafka, which is not reasonable. Flink stores checkpoints on the disk of each worker, not in Kafka. (KafkaStream, the other streaming API

Re: Advice or best practices on adding metadata to stream events

2018-01-31 Thread Hung
So there are three ways: 1. make your model a stream source; 2. let the master read the model once, distribute it via the constructor, and update it periodically; 3. let each worker read the model and update it periodically (as you mentioned). Option 3 would be problematic if you scale a lot and use many
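Option 2 can be sketched like this (`EnrichFunction` is a hypothetical name; in real Flink code the function would extend RichMapFunction and the model would need to be Serializable so it ships with the function):

```java
import java.util.Map;

// Hypothetical sketch of option 2: the master loads the model once and
// hands it to the worker function through the constructor; the framework
// then ships the (serialized) function, model included, to every worker.
class EnrichFunction {
    private final Map<String, String> model;

    EnrichFunction(Map<String, String> model) {
        this.model = model;
    }

    // Enrich an incoming key with the model's value for it.
    String map(String key) {
        return key + "=" + model.getOrDefault(key, "unknown");
    }
}
```

Periodic updates would then be a scheduled reload on the master plus re-distribution, which is where option 2 gets more involved.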

Re: [Window] Per key window

2018-01-31 Thread Hung
After you keyBy(), each of your windows has its own group of events. Or is what you want a global window? Best, Sendoh
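The keyBy() point can be illustrated outside Flink: grouping events by key gives every key its own bucket, which is what a keyed window then operates on (`KeyedGrouping` here is a hypothetical stand-in, not Flink API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal illustration of the keyBy idea: partitioning events by key
// yields a separate bucket per key, analogous to a keyed window holding
// only that key's elements.
class KeyedGrouping {
    static Map<String, List<Integer>> keyBy(List<Map.Entry<String, Integer>> events) {
        return events.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```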

Re: How does BucketingSink generate a SUCCESS file when a directory is finished

2018-01-31 Thread Hung
It depends on how you partition your files. In my case I write one file per hour, so I'm sure that file is ready after that hour has passed, in processing time. Here, ready means the file contains all the data for that hour. If the downstream runs in a batch way, you may want to ensure
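The file-per-hour readiness check can be sketched as follows (`HourlyPartitioner` and the path layout are hypothetical, loosely mirroring a BucketingSink-style directory per hour; "ready" is judged in processing time, as above):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derive the hourly partition path for an event and
// decide when a partition is complete.
class HourlyPartitioner {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH");

    static String partitionPath(String base, LocalDateTime eventTime) {
        return base + "/" + eventTime.format(FMT);
    }

    // Ready means processing time has moved past the partition's hour,
    // i.e. no more data for that hour will be written.
    static boolean isReady(LocalDateTime partitionHour, LocalDateTime now) {
        return now.truncatedTo(ChronoUnit.HOURS)
                  .isAfter(partitionHour.truncatedTo(ChronoUnit.HOURS));
    }
}
```

Once isReady() flips to true, the job could drop a _SUCCESS marker into that directory for the batch downstream to poll.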

Re: Scaling Flink

2018-01-24 Thread Hung
What about scaling up based on the number of task slots left? You can obtain this information from Flink's REST endpoint. Cheers, Sendoh
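The JobManager's REST API exposes this as JSON (the /overview response includes a slots-available field). A dependency-free sketch of acting on it, with the regex parsing purely for illustration (a real client would use an HTTP and JSON library):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract "slots-available" from the /overview JSON
// and use it as a scale-up signal.
class SlotMonitor {
    private static final Pattern SLOTS =
            Pattern.compile("\"slots-available\"\\s*:\\s*(\\d+)");

    static int slotsAvailable(String overviewJson) {
        Matcher m = SLOTS.matcher(overviewJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no slots-available field");
        }
        return Integer.parseInt(m.group(1));
    }

    // Scale up when no free slots are left for new tasks.
    static boolean shouldScaleUp(String overviewJson) {
        return slotsAvailable(overviewJson) == 0;
    }
}
```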

Is it possible to restart only the function that fails instead of entire job?

2016-07-01 Thread Chia-Hung Lin
After reading the documentation and configuring the restart strategy to test failure behavior, it seems to me that Flink restarts the whole job once any failure (e.g. a thrown exception) occurs. https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html My question: is it possible to configure