Re: Flink CLI does not return after submitting yarn job in detached mode

2018-08-15 Thread vino yang
Hi Marvin777, You are wrong. It uses the Flink on YARN single job mode and should use the "-yd" parameter. Hi Madhav, I seem to have found the problem, the source code of your log is here.[1] It is based on a judgment method "isUsingInteractiveMode". The source code for this method is here[2],

How to compare two window ?

2018-08-15 Thread 苗元君
Hi, Flink guys, U really to a quick release, it's fantastic ! I'v got a situation , window 1 is time driven, slice is 1min, trigger is 1 count window 2 is count driven, slice is 3 count, trigger is 1count 1. Then element is out of window1 and just right into window2. For example if there is o

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell
Thank you Xingcan. Regarding that Either, I still see the need to do TypeCasting/CaseClass matching. Could you please help give a look? val transformed = dog     .union(cat)     .connect(transformer)     .keyBy(r => r.name, r2 => r2.name)     .process(new Transfo

Re: How to submit flink job on yarn by java code

2018-08-15 Thread Rong Rong
I dont think your exception / code was attached. In general, this is largely depending on how your setup is. Are you trying to setup a long-running YARN session cluster or are you trying to directly use YARN cluster submit? [1]. We have an open-sourced project [2] with similar requirement submitti

Re: Flink CLI does not return after submitting yarn job in detached mode

2018-08-15 Thread Marvin777
Hi, Madhav, > ./flink-1.4.2/bin/flink run -m yarn-cluster *-yd* -yn 2 -yqu "default" > -ytm 2048 myjar.jar Modified to, ./flink-1.4.2/bin/flink run -m yarn-cluster -*d* -yn 2 -yqu "default" -ytm 2048 myjar.jar [image: image.png] madhav Kelkar 于2018年8月16日周四 上午5:01写道: > Hi there, > >

OneInputStreamOperatorTestHarness.snapshot doesn't include timers?

2018-08-15 Thread Ken Krugler
Hi all, It looks to me like the OperatorSubtaskState returned from OneInputStreamOperatorTestHarness.snapshot fails to include any timers that had been registered via registerProcessingTimeTimer but had not yet fired when the snapshot was saved. Is this a known limitation of OneInputStreamOper

Flink CLI does not return after submitting yarn job in detached mode

2018-08-15 Thread madhav Kelkar
Hi there, I am trying to run a single flink job on YARN in detached mode. as per the docs for flink 1.4.2, I am using -yd to do that. The problem I am having is the flink bash script doesn't terminate execution and return until I press control + c. In detached mode, I would expect the flink C

Re: docker, error NoResourceAvailableException..

2018-08-15 Thread Esteban Serrano
You can also instead of defining 2 services (taskmanager and taskmanager1), set the scale parameter on taskmanager to the number of desired slots. Something like this: taskmanager: image: "${FLINK_DOCKER_IMAGE:-flink:1.5.2}" scale: 2 expose: - "6121" - "6122" - "8081" On Wed, A

Re: docker, error NoResourceAvailableException..

2018-08-15 Thread shyla deshpande
Thanks Dominik, I will try that. On Wed, Aug 15, 2018 at 3:10 AM, Dominik Wosiński wrote: > Hey, > The problem is that your command does start Job Manager container, but it > does not start the Task Manager . That is why you have 0 slots. Currently, > the default *numberOfTaskSlots* is set to th

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Xingcan Cui
Hi Averell, With the CoProcessFunction, you could get access to the time-related services which may be useful when maintaining the elements in Stream_C and you could get rid of type casting with the Either class. Best, Xingcan > On Aug 15, 2018, at 3:27 PM, Averell wrote: > > Thank you Vino

Scala 2.12 Support

2018-08-15 Thread Aaron Levin
Hello! I'm wondering if there is anywhere I can see Flink's roadmap for Scala 2.12 support. The last email I can find on the list for this was back in January, and the FLINK-7811[0], the ticket asking for Scala 2.12 support, hasn't been updated in a few months. Recently Spark fixed the ClosureCle

Re: watermark does not progress

2018-08-15 Thread Hequn Cheng
Hi John, I guess the source data of local are different from the cluster. And as Fabian said, it is probably that some partitions don't carry data. As a choice, you can set job parallelism to 1 and check the result. Best, Hequn On Wed, Aug 15, 2018 at 5:22 PM, John O wrote: > I did some more t

Re: Stream collector serialization performance

2018-08-15 Thread Timo Walther
Hi Mingliang, first of all the POJO serializer is not very performant. Tuple or Row are better. If you want to improve the performance of a collect() between operators, you could also enable object reuse. You can read more about this here [1] (section "Issue 2: Object Reuse"), but make sure y

Re: 1.5.1

2018-08-15 Thread Gary Yao
Hi Juho, the main thread of the RPC endpoint should never be blocked. Blocking on that thread is considered an implementation error. Unfortunately, without logs it is difficult to tell what the exact problem is. If you are able to reproduce heartbeat timeouts on your test staging environment, can

ODP: docker, error NoResourceAvailableException..

2018-08-15 Thread Dominik Wosiński
Hey, The problem is that your command does start Job Manager container, but it does not start the Task Manager . That is why you have 0 slots. Currently, the default numberOfTaskSlots is set to the number of CPUs avaialbe on the machine. So, You generally can to do 2 things: 1) Start Job Mana

RE: watermark does not progress

2018-08-15 Thread John O
I did some more testing. Below is a pseudo version of by setup. kafkaconsumer-> assignTimestampsAndWatermarks(BoundedOutOfOrdernessTimestampExtractor)-> process(print1 ctx.timerService().currentWatermark()) -> keyBy(_.someProp) -> process(print2 ctx.timerService().currentWatermark()) -> I am man

Re: Limit on number of files to read for Dataset

2018-08-15 Thread Fabian Hueske
Hi Darshan, This looks like a file system configuration issue to me. Flink supports different file systems for S3 and there are also a few tuning knobs. Did you have a look at the docs for file system configuration [1]? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.

Re: watermark does not progress

2018-08-15 Thread Fabian Hueske
Hi John, Watermarks cannot make progress if you have stream partitions that do not carry any data. What kind of source are you using? Best, Fabian 2018-08-15 4:25 GMT+02:00 vino yang : > Hi Johe, > > In local mode, it should also work. > When you debug, you can set a breakpoint in the getCurren

Re: 1.5.1

2018-08-15 Thread Juho Autio
Gary, I found another mail thread about similar issue: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Testing-on-Flink-1-5-tp19565p19647.html Specifically I found this: > we are observing Akka.ask.timeout error for few of our jobs (JM's logs[2]), we tried to increase this pa

Re: 1.5.1

2018-08-15 Thread Juho Autio
Vishal, from which version did you upgrade to 1.5.1? Maybe from 1.5.0 (release)? Knowing that might help narrowing down the source of this. On Wed, Aug 15, 2018 at 11:38 AM Juho Autio wrote: > Thanks Gary.. > > What could be blocking the RPC threads? Slow checkpointing? > > In production we're s

Re: 1.5.1

2018-08-15 Thread Juho Autio
Thanks Gary.. What could be blocking the RPC threads? Slow checkpointing? In production we're still using a self-built Flink package 1.5-SNAPSHOT, flink commit 8395508b0401353ed07375e22882e7581d46ac0e, and the jobs are stable. Now with 1.5.2 the same jobs are failing due to heartbeat timeouts ev

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread vino yang
Hi Averell, What I mean is that if you store stream_c data in an RDBMS, you can access the RDBMS directly in the CoFlatMapFunction instead of using the Table API. This is somewhat similar to stream and dimension table joins. Of course, the premise of adopting this option is that the amount of data

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell
Thank you Vino & Xingcan. @Vino: could you help explain more details on using DBMS? Would that be with using TableAPI, or you meant directly reading DBMS data inside the ProcessFunction? @Xingcan: I don't know what are the benefits of using CoProcess over RichCoFlatMap in this case. Regarding usin

[ANNOUNCE] Weekly community update #33

2018-08-15 Thread Till Rohrmann
Dear community, this is the weekly community update thread #33. Please post any news and updates you want to share with the community to this thread. # Flink 1.6.0 has been released The community released Flink 1.6.0 [1]. Thanks to everyone who made this release possible. # Improving tutorials

Stream collector serialization performance

2018-08-15 Thread 祁明良
Hi all, I’m currently using the keyed process function, I see there’s serialization happening when I collect the object / update the object to rocksdb. For me the performance of serialization seems to be the bottleneck. By default, POJO serializer is used, and the timecost of collect / update to