Re: Long blocking call in UserFunction triggers HA leader lost?

2020-11-12 Thread Maxim Parkachov
Hi Theo, We had a very similar problem with one of our Spark streaming jobs. The best solution was to create a custom source holding all external records in a cache, periodically reading the external data and comparing it to the cache. All changed records were then broadcast to the task managers. We tried to
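A minimal sketch of the approach described above, under assumptions not in the original thread: the custom source keeps the last known value per key in a local cache, polls the external system periodically (here every 60 s, via a hypothetical fetchExternalRecords() helper) and emits only new or changed records, which can then be broadcast to the task managers.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    /** Emits only records that changed since the previous poll of the external system. */
    public class ChangedRecordsSource implements SourceFunction<String> {

        private volatile boolean running = true;
        private final Map<String, String> cache = new HashMap<>(); // last known value per key

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                for (Map.Entry<String, String> e : fetchExternalRecords().entrySet()) {
                    String previous = cache.put(e.getKey(), e.getValue());
                    if (previous == null || !previous.equals(e.getValue())) {
                        ctx.collect(e.getValue()); // forward only new or changed records
                    }
                }
                Thread.sleep(60_000); // poll interval, an arbitrary choice for this sketch
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        // Hypothetical helper: reads the full external data set, keyed by record id.
        private Map<String, String> fetchExternalRecords() {
            return new HashMap<>();
        }
    }

The resulting stream would then typically be broadcast, e.g. env.addSource(new ChangedRecordsSource()).broadcast(stateDescriptor), and connected to the main stream.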

Re: Flink streaming job logging reserves space

2020-08-04 Thread Maxim Parkachov
> A good way is to write your logs to separate files that can roll > via log4j. If you want to access them in the Flink web UI, > upgrade to the 1.11 version. Then you will find a "Log List" tab under > the JobManager sidebar. > > > Best, > Yang > > Max

Re: Flink streaming job logging reserves space

2020-08-04 Thread Maxim Parkachov
to the > stdout/stderr, the two files will grow > over time. > > When you stop the Flink application, YARN will clean up all the jars and > logs, so you find that the disk space comes back. > > > Best, > Yang > > Maxim Parkachov wrote on Thu, Jul 30, 2020 at 10:00 PM: > >>

Flink streaming job logging reserves space

2020-07-30 Thread Maxim Parkachov
Hi everyone, I have a strange issue with Flink logging. I use a pretty much standard log4j config, which writes to standard output in order to see it in the Flink GUI. Deployment is on YARN in job mode. I can see the logs in the UI, no problem. On the servers, where the Flink YARN containers are running,

Re: Wait for cancellation event with CEP

2020-05-04 Thread Maxim Parkachov
Hi Till, thank you for the very detailed answer, now it is absolutely clear. Regards, Maxim. On Thu, Apr 30, 2020 at 7:19 PM Till Rohrmann wrote: > Hi Maxim, > > I think your problem should be solvable with the CEP library: > > So what we are doing here is to define a pattern forward followed by

Wait for cancellation event with CEP

2020-04-30 Thread Maxim Parkachov
Hi everyone, I need to implement the following functionality. There is a Kafka topic where "forward" events arrive, and in the same topic there are "cancel" events. For each "forward" event I need to wait 1 minute for a possible "cancel" event. I can uniquely match both events. If "cancel" event
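A rough sketch of the CEP approach referenced in Till's reply above, as a fragment with assumptions not in the thread: a hypothetical Event type with isForward(), isCancel() and getId(), and an existing DataStream<Event> named events. The pattern matches a "forward" followed by its "cancel" within one minute; full matches are the cancelled events, while timed-out partial matches (forwards that never received a cancel) come out of the side output and can be processed normally.

    import java.util.List;
    import java.util.Map;

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.PatternTimeoutFunction;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    // A "forward" event followed by its matching "cancel" within one minute.
    Pattern<Event, Event> forwardThenCancel = Pattern.<Event>begin("forward")
            .where(new SimpleCondition<Event>() {
                public boolean filter(Event e) { return e.isForward(); }
            })
            .followedBy("cancel")
            .where(new SimpleCondition<Event>() {
                public boolean filter(Event e) { return e.isCancel(); }
            })
            .within(Time.minutes(1));

    OutputTag<Event> notCancelled = new OutputTag<Event>("not-cancelled") {};

    PatternStream<Event> patterns = CEP.pattern(events.keyBy(Event::getId), forwardThenCancel);

    // Full matches are cancelled events; timed-out partial matches are forwards
    // for which no cancel arrived within one minute.
    SingleOutputStreamOperator<Event> cancelled = patterns.select(
            notCancelled,
            new PatternTimeoutFunction<Event, Event>() {
                public Event timeout(Map<String, List<Event>> match, long timeoutTimestamp) {
                    return match.get("forward").get(0);
                }
            },
            new PatternSelectFunction<Event, Event>() {
                public Event select(Map<String, List<Event>> match) {
                    return match.get("cancel").get(0);
                }
            });

    DataStream<Event> toProcess = cancelled.getSideOutput(notCancelled);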

Re: New kafka producer on each checkpoint

2020-04-13 Thread Maxim Parkachov
f1cb7c47747b7/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L871 > > Best > Yun Tang > -- > *From:* Maxim Parkachov > *Sent:* Monday, April 6, 2020 23:16 > *To:* user@

New kafka producer on each checkpoint

2020-04-06 Thread Maxim Parkachov
Hi everyone, I'm trying to test exactly-once functionality with my job under production load. The job reads from Kafka, uses the Kafka timestamp as event time, aggregates every minute and outputs to another Kafka topic. I use a checkpoint interval of 10 seconds. Everything seems to be working fine,
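Seeing a new internal Kafka producer per checkpoint is, as far as I understand, expected with transactional exactly-once semantics, since each checkpoint commits the current transaction and opens a new one. Below is a hedged sketch of how the sink side of such a job is typically wired up; the broker address, topic name, serialization and the aggregatedStream variable are placeholders, not taken from the thread.

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
    import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
    import org.apache.kafka.clients.producer.ProducerRecord;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE); // 10 s interval, as in the thread

    Properties producerProps = new Properties();
    producerProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
    // The Flink producer's default transaction timeout (1 h) exceeds the broker's
    // default transaction.max.timeout.ms (15 min), so one of the two must be adjusted.
    producerProps.setProperty("transaction.timeout.ms", "900000");

    FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
            "output-topic",
            (KafkaSerializationSchema<String>) (element, timestamp) ->
                    new ProducerRecord<>("output-topic", element.getBytes(StandardCharsets.UTF_8)),
            producerProps,
            FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

    // aggregatedStream stands in for the per-minute aggregation result of the job.
    aggregatedStream.addSink(producer);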

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
getClassLoader().getResource("job.properties") > > > Best, > Yang > > Maxim Parkachov wrote on Mon, Feb 17, 2020 at 6:47 PM: > >> Hi Yang, >> >> thanks, this explains why the classpath behavior changed, but now I struggle >> to >> understand h
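A small sketch of the resource lookup mentioned above, assuming the file was shipped alongside the jars so that its parent directory ends up on the classpath; MyJob is a placeholder class name.

    import java.io.InputStream;
    import java.util.Properties;

    // Because Flink 1.10 puts the parent directory (not the file itself) on the
    // classpath, a shipped non-jar file is looked up as a classpath resource by name.
    Properties jobProps = new Properties();
    try (InputStream in = MyJob.class.getClassLoader().getResourceAsStream("job.properties")) {
        if (in == null) {
            throw new IllegalStateException("job.properties not found on the classpath");
        }
        jobProps.load(in);
    }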

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
Yang Wang wrote: > Hi Maxim Parkachov, > > The user files have also been shipped to the JobManager and TaskManager. > However, they > are not directly added to the classpath. Instead, the parent directory is > added to the > classpath. This change is to make resource classloadi

[Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-14 Thread Maxim Parkachov
Hi everyone, I'm trying to run my job on Flink 1.10 in YARN per-job mode. In previous versions, all files in the lib/ folder were automatically included in the classpath. Now, with 1.10, I see that only *.jar files are included in the classpath, but not "other" files. Is this a deliberate change or

Re: Flink 1.10 on MapR secure cluster with high availability

2020-02-07 Thread Maxim Parkachov
wrote: > No, since a) HA will never use classes from the user-jar and b) zookeeper > is relocated to a different package (to avoid conflicts) and hence any > replacement has to follow the same relocation convention. > > On 05/02/2020 15:38, Maxim Parkachov wrote: > > Hi Chesnay,

Re: Flink 1.10 on MapR secure cluster with high availability

2020-02-05 Thread Maxim Parkachov
> > On 05/02/2020 15:12, Maxim Parkachov wrote: > Hi everyone, > > I have already written about an issue with Flink 1.9 on a secure MapR cluster > and high availability. The issue was resolved with a custom-compiled Flink > with the vendor MapR repositories enabled. The history co

Flink 1.10 on MapR secure cluster with high availability

2020-02-05 Thread Maxim Parkachov
Hi everyone, I have already written about an issue with Flink 1.9 on a secure MapR cluster and high availability. The issue was resolved with a custom-compiled Flink with the vendor MapR repositories enabled. The history can be found at https://www.mail-archive.com/user@flink.apache.org/msg28235.html

Re: Initialization of broadcast state before processing main stream

2019-11-14 Thread Maxim Parkachov
Hi Vasily, unfortunately, this is a known issue with Flink; you can read the discussion under https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API . At the moment I have seen 3 solutions for this issue: 1. You buffer the fact stream in local state before broadcast
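A rough sketch of option 1 above (buffering the fact stream in keyed state until the broadcast side has delivered something), using String types and hypothetical enrich()/ruleKey() helpers in place of the real logic. Note that with this simple variant, buffered facts for a key are only flushed when the next fact for that key arrives.

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class BufferUntilBroadcastReady
            extends KeyedBroadcastProcessFunction<String, String, String, String> {

        static final MapStateDescriptor<String, String> RULES = new MapStateDescriptor<>(
                "rules", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        private transient ListState<String> buffered;

        @Override
        public void open(Configuration parameters) {
            buffered = getRuntimeContext().getListState(
                    new ListStateDescriptor<>("buffered-facts", BasicTypeInfo.STRING_TYPE_INFO));
        }

        @Override
        public void processElement(String fact, ReadOnlyContext ctx, Collector<String> out) throws Exception {
            ReadOnlyBroadcastState<String, String> rules = ctx.getBroadcastState(RULES);
            if (!rules.immutableEntries().iterator().hasNext()) {
                buffered.add(fact); // broadcast side not initialised yet: park the fact
                return;
            }
            Iterable<String> earlier = buffered.get(); // flush anything buffered for this key
            if (earlier != null) {
                for (String bufferedFact : earlier) {
                    out.collect(enrich(bufferedFact, rules));
                }
                buffered.clear();
            }
            out.collect(enrich(fact, rules));
        }

        @Override
        public void processBroadcastElement(String rule, Context ctx, Collector<String> out) throws Exception {
            ctx.getBroadcastState(RULES).put(ruleKey(rule), rule);
        }

        // Hypothetical helpers standing in for the real enrichment logic.
        private String enrich(String fact, ReadOnlyBroadcastState<String, String> rules) { return fact; }

        private String ruleKey(String rule) { return rule; }
    }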

Re: Flink 1.9, MapR secure cluster, high availability

2019-09-16 Thread Maxim Parkachov
Hi Stephan, sorry for the late answer, I didn't have access to the cluster. Here is the log and stacktrace. Hope this helps, Maxim. - 2019-09-16 18:00:31,804 INFO

Re: Flink 1.9, MapR secure cluster, high availability

2019-08-30 Thread Maxim Parkachov
ZK? > > Best, > Stephan > > > On Tue, Aug 27, 2019 at 11:03 AM Maxim Parkachov > wrote: > >> Hi everyone, >> >> I'm testing release 1.9 on MapR secure cluster. I took flink binaries >> from download page and trying to start Yarn session cluster. All Map

Flink 1.9, MapR secure cluster, high availability

2019-08-27 Thread Maxim Parkachov
Hi everyone, I'm testing release 1.9 on a secure MapR cluster. I took the Flink binaries from the download page and am trying to start a YARN session cluster. All MapR-specific libraries and configs are added according to the documentation. When I start yarn-session without high availability, it uses zookeeper

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Maxim Parkachov
Hi Vasily, as far as I know, by default the console consumer reads uncommitted. Try setting isolation.level to read_committed in the console-consumer properties. Hope this helps, Maxim. On Fri, Aug 2, 2019 at 12:42 PM Vasily Melnik < vasily.mel...@glowbyteconsulting.com> wrote: > Hi, Eduardo. > Maybe
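The same setting applies when the verification is done from a Flink job rather than the console consumer. A small sketch, with placeholder broker address, group id and topic name:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Properties consumerProps = new Properties();
    consumerProps.setProperty("bootstrap.servers", "broker:9092");
    consumerProps.setProperty("group.id", "verification");
    // Only read records from committed transactions; with the default
    // read_uncommitted, records of open or aborted transactions are also visible.
    consumerProps.setProperty("isolation.level", "read_committed");

    FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("output-topic", new SimpleStringSchema(), consumerProps);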

Re: Providing external files to flink classpath

2019-07-18 Thread Maxim Parkachov
Hi Vishwas, it took me some time to find this out as well. If you have your properties file under lib, the following will work: val kafkaPropertiesInputStream = getClass.getClassLoader.getResourceAsStream("lib/config/kafka.properties") Hope this helps, Maxim. On Wed, Jul 17, 2019 at 7:23 PM Vishwas

Re: yarn-session vs cluster per job for streaming jobs

2019-07-18 Thread Maxim Parkachov
link/flink-docs-release-1.8/ops/jobmanager_high_availability.html#yarn-cluster-high-availability > . > > Best, > Haibo > > At 2019-07-17 23:53:15, "Maxim Parkachov" wrote: > > Hi, > > I'm looking for advice on how to run flink streaming jobs on Yarn cluste

yarn-session vs cluster per job for streaming jobs

2019-07-17 Thread Maxim Parkachov
Hi, I'm looking for advice on how to run Flink streaming jobs on a YARN cluster in a production environment. I tried both approaches with HA mode in a testing environment, namely yarn session + multiple jobs vs cluster per job; both seem to work for my cases, with a slight preference for yarn session

Re: Automatic deployment of new version of streaming stateful job

2019-07-17 Thread Maxim Parkachov
: https://github.com/ing-bank/flink-deployer. The deployer will > allow you to deploy or upgrade your jobs. All you need to do is integrate > it into your CI/CD. > > Kind regards > > Marc > On 16 Jul 2019, 02:46 +0200, Maxim Parkachov , > wrote: > > Hi, > &g

Automatic deployment of new version of streaming stateful job

2019-07-15 Thread Maxim Parkachov
Hi, I'm trying to bring my first stateful streaming Flink job to production and have trouble understanding how to integrate it with a CI/CD pipeline. I can cancel the job with a savepoint, but in order to start the new version of the application I need to specify the savepoint path manually? So, basically my

Re: How autoscaling works on Kinesis Data Analytics for Java ?

2019-04-22 Thread Maxim Parkachov
snapshot and restarts the streaming job, so no magic here, but nicely automated. Regards, Maxim. On Tue, Jan 29, 2019 at 5:23 AM Maxim Parkachov wrote: > Hi, > > I had the impression that in order to change parallelism, one needs to stop > the Flink streaming job and re-start it with

How autoscaling works on Kinesis Data Analytics for Java ?

2019-01-28 Thread Maxim Parkachov
Hi, I had the impression that in order to change parallelism, one needs to stop the Flink streaming job and restart it with new settings. According to https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-scaling.html auto-scaling works out of the box. Could someone with experience of running Flink

Re: Ship compiled code with broadcast stream ?

2018-10-09 Thread Maxim Parkachov
Hi, This is certainly possible. What you can do is use a > BroadcastProcessFunction where you receive the rule code on the broadcast > side. > Yes, this part works, no problem. > You probably cannot send newly compiled objects this way but what you can > do is either send a reference to some

Ship compiled code with broadcast stream ?

2018-10-09 Thread Maxim Parkachov
Hi everyone, I have a job with an event stream and a control stream delivering rules for event transformation. The rules are broadcast and used in a flatMap-like CoProcessFunction. The rules are defined in a custom JSON format. The number of rules and their complexity rise significantly with every new feature. What I
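A hedged sketch of the direction suggested in the reply above: instead of shipping compiled objects, broadcast the rule source (here a (ruleId, sourceText) tuple) and compile it locally on every task manager, for example with a script engine or an embedded compiler; compileRule() is a placeholder, not a real API.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    /** Broadcasts rule source text and compiles it locally on every task manager. */
    public class DynamicRuleFunction
            extends BroadcastProcessFunction<String, Tuple2<String, String>, String> {

        static final MapStateDescriptor<String, String> RULE_SOURCES = new MapStateDescriptor<>(
                "rule-sources", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        // Compiled rules live only in operator memory; the broadcast state keeps the
        // portable source text, so after a restore the rules would need to be
        // re-compiled from the restored state (omitted here for brevity).
        private transient Map<String, Function<String, String>> compiledRules;

        @Override
        public void open(Configuration parameters) {
            compiledRules = new HashMap<>();
        }

        @Override
        public void processBroadcastElement(Tuple2<String, String> rule, Context ctx,
                                            Collector<String> out) throws Exception {
            ctx.getBroadcastState(RULE_SOURCES).put(rule.f0, rule.f1);
            compiledRules.put(rule.f0, compileRule(rule.f1));
        }

        @Override
        public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) {
            for (Function<String, String> rule : compiledRules.values()) {
                out.collect(rule.apply(event));
            }
        }

        // Hypothetical: turn rule source text into executable code, e.g. with a
        // script engine or an embedded compiler such as Janino.
        private Function<String, String> compileRule(String sourceText) {
            return input -> input; // placeholder transformation
        }
    }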

Multiple Async IO

2018-04-03 Thread Maxim Parkachov
Hi everyone, I'm writing a streaming job which needs to query Cassandra multiple times for each event, around 20. I would like to use Async I/O for that but am not sure which option to choose: 1. Implement one AsyncFunction with 20 queries inside 2. Implement 20 AsyncFunctions, each with 1 query
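Whichever of the two layouts is chosen, each lookup ends up shaped roughly like the sketch below. The Cassandra client wrapper, its queryAsync() call and the host name are placeholders rather than a real driver API; several futures inside one asyncInvoke could be combined with CompletableFuture.allOf if option 1 is preferred.

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class CassandraLookup extends RichAsyncFunction<String, String> {

        private transient CassandraClient client; // hypothetical async client wrapper

        @Override
        public void open(Configuration parameters) {
            client = CassandraClient.connect("cassandra-host"); // hypothetical connection setup
        }

        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            // Hypothetical: queryAsync returns a CompletableFuture<String>.
            CompletableFuture<String> lookup = client.queryAsync(key);
            lookup.whenComplete((value, error) -> {
                if (error != null) {
                    resultFuture.completeExceptionally(error);
                } else {
                    resultFuture.complete(Collections.singleton(value));
                }
            });
        }
    }

It would be wired in with something like AsyncDataStream.unorderedWait(events, new CassandraLookup(), 30, TimeUnit.SECONDS, 100) for unordered results, a 30 s timeout and at most 100 requests in flight.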

Re: Forcing consuming one stream completely prior to another starting

2018-01-20 Thread Maxim Parkachov
Hi Ron, I’m joining two streams - one is a “decoration” stream that we have in a > compacted Kafka topic, produced using a view on a MySQL table AND using > Kafka Connect; the other is the “event data” we want to decorate, coming in > over time via Kafka. These streams are keyed the same way -

Fwd: Initialise side input state

2017-11-03 Thread Maxim Parkachov
im. > > Best, > Xingcan > > > On Fri, Nov 3, 2017 at 5:54 AM, Maxim Parkachov <lazy.gop...@gmail.com> > wrote: > >> Hi Flink users, >> >> I'm struggling with some basic concept and would appreciate some help. I >> have 2 Input streams,

Initialise side input state

2017-11-02 Thread Maxim Parkachov
Hi Flink users, I'm struggling with a basic concept and would appreciate some help. I have 2 input streams: one is a fast event stream and one is a slowly changing dimension. They have the same key, and I use a CoProcessFunction to store the slow data in state and enrich the fast data from this state.
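For reference, a minimal shape of such a CoProcessFunction, with String types standing in for the real event and dimension records. The null check in processElement1 is exactly where the initialization problem discussed in this thread shows up: the fast stream can arrive before the dimension has been loaded into state.

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.util.Collector;

    /** Keyed enrichment: input 1 is the fast event stream, input 2 the slow dimension updates. */
    public class EnrichWithDimension extends CoProcessFunction<String, String, String> {

        private transient ValueState<String> dimension;

        @Override
        public void open(Configuration parameters) {
            dimension = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("dimension", String.class));
        }

        @Override
        public void processElement1(String event, Context ctx, Collector<String> out) throws Exception {
            String dim = dimension.value();
            // If the dimension has not arrived yet, dim is null; this is the gap the
            // thread is about (buffer, drop, or emit unenriched until the side input is loaded).
            out.collect(event + "|" + (dim != null ? dim : "UNKNOWN"));
        }

        @Override
        public void processElement2(String update, Context ctx, Collector<String> out) throws Exception {
            dimension.update(update); // remember the latest dimension value for this key
        }
    }

Wired up as something like fastStream.keyBy(key).connect(dimensionStream.keyBy(key)).process(new EnrichWithDimension()).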