Re: Flink Exception - AmazonS3Exception and ExecutionGraph - Error in failover strategy

2018-12-04 Thread Flink Developer
When this happens, it appears that one of the workers fails but the rest of the workers continue to run. How can I configure the app to recover completely from the last successful checkpoint when this happens? ‐‐‐ Original Message ‐‐‐ On Monday, December
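For context, a minimal sketch of the kind of job-level configuration that lets Flink restart the whole job from its last successful checkpoint after a worker failure, assuming the DataStream API; the checkpoint interval and retry values are placeholders, not recommendations:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointRecoverySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60s; on failure, Flink restores all operators
        // from the latest completed checkpoint.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Without a restart strategy the job simply fails instead of recovering;
        // this retries up to 10 times, waiting 30s between attempts.
        env.setRestartStrategy(
            RestartStrategies.fixedDelayRestart(10, Time.seconds(30)));

        // ... build the pipeline and call env.execute(...) as usual ...
    }
}
```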

TTL state migration

2018-12-04 Thread Ning Shi
I have a job using TTL map state and RocksDB state backend. If I lengthen the TTL on the map state and resume the job from savepoint (or checkpoint assuming I don't change the state backend), will new values added to that map have the new TTL or will the old TTL in the savepoint override my
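For reference, a hedged sketch of how a TTL is attached to map state through StateTtlConfig (the state TTL API added in Flink 1.6); the state name, types, and visibility settings are placeholders, and this only illustrates the configuration, not the answer to the migration question:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;

public class TtlMapStateSketch {
    // Builds a descriptor for keyed map state whose entries expire after the given TTL.
    static MapStateDescriptor<String, Long> buildDescriptor(Time ttl) {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(ttl)
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();

        MapStateDescriptor<String, Long> descriptor =
            new MapStateDescriptor<>("ttl-map", Types.STRING, Types.LONG);
        // The TTL is configured on the descriptor; for map state it applies per entry.
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```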

Re:kafka connector[specificStartOffset cannot auto commit offset to zookeeper]

2018-12-04 Thread 孙森
Hi all: I specify the exact offsets the consumer should start from for each partition, but the Kafka consumer cannot periodically commit the offsets to Zookeeper. I have disabled checkpointing only if the job is stopped. This is my code: val properties = new Properties()
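A hedged sketch of the setup being described, assuming the 0.8 connector (since offsets are committed to ZooKeeper); topic name, offsets, and connection properties are placeholders. With checkpointing disabled, periodic committing is driven by the Kafka client's own auto-commit settings rather than by Flink:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

public class SpecificOffsetsSketch {
    static FlinkKafkaConsumer08<String> buildConsumer() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("zookeeper.connect", "zk:2181");     // placeholder
        props.setProperty("group.id", "my-group");             // placeholder
        // With checkpointing disabled, committing offsets to ZooKeeper relies
        // on the Kafka 0.8 auto-commit properties.
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.commit.interval.ms", "5000");

        FlinkKafkaConsumer08<String> consumer =
            new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props);

        // Exact start positions per partition; this only controls where
        // consumption begins, not how offsets are committed afterwards.
        Map<KafkaTopicPartition, Long> offsets = new HashMap<>();
        offsets.put(new KafkaTopicPartition("my-topic", 0), 23L);
        offsets.put(new KafkaTopicPartition("my-topic", 1), 31L);
        consumer.setStartFromSpecificOffsets(offsets);
        return consumer;
    }
}
```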

How to distribute subtasks evenly across taskmanagers?

2018-12-04 Thread Sunny Yun
Why does Flink do resource management only by slots, not by TaskManagers and slots? If one Flink cluster is used to submit multiple jobs, how do I make the JobManager distribute subtasks evenly across all TaskManagers? Currently the JobManager treats the slots globally, and some jobs' operators are assigned only

Spring Boot and Apache Flink.

2018-12-04 Thread Durga Durga
Folks, we've been having a tough time building a Spring Boot app (jar) to get our Flink jobs running in our Flink cluster. The Spring Application Context always ends up null when the Flink job runs - has anyone had luck with this? Any help would be appreciated. Thanks a lot
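A hedged sketch of one common workaround: Flink serializes user functions and runs them on remote TaskManagers, so a context built in main() is not available there at runtime. One option is to rebuild the context lazily inside open() of a rich function. All class and bean names below are hypothetical and exist only to keep the sketch self-contained:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;

public class SpringAwareMapper extends RichMapFunction<String, String> {

    // Hypothetical Spring configuration and bean, declared as nested classes
    // only so the example compiles on its own.
    public static class GreetingService {
        String enrich(String value) { return "hello " + value; }
    }

    @org.springframework.context.annotation.Configuration // fully qualified to avoid clashing with Flink's Configuration
    public static class JobSpringConfig {
        @Bean
        public GreetingService greetingService() { return new GreetingService(); }
    }

    private transient AnnotationConfigApplicationContext context;

    @Override
    public void open(Configuration parameters) {
        // Runs once per parallel instance, on the TaskManager that executes it,
        // so the context is built where it is actually used.
        context = new AnnotationConfigApplicationContext(JobSpringConfig.class);
    }

    @Override
    public String map(String value) {
        return context.getBean(GreetingService.class).enrich(value);
    }

    @Override
    public void close() {
        if (context != null) {
            context.close();
        }
    }
}
```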

Flink 1.7 job cluster (restore from checkpoint error)

2018-12-04 Thread Hao Sun
I am using 1.7 and a job cluster on k8s. Here is how I start my job: docker-entrypoint.sh job-cluster -j com.zendesk.fraud_prevention.examples.ConnectedStreams --allowNonRestoredState *It seems like --allowNonRestoredState is not honored* === Logs ===

Re: Query big mssql Data Source [Batch]

2018-12-04 Thread Flavio Pompermaier
You can pass a ParametersProvider to the JDBC input format in order to parallelize the fetch. Of course you don't want to kill the mysql server with too many requests in parallel, so you'll probably want to put a limit on the parallelism of the input format. On Tue, 4 Dec 2018, 17:31 miki haiat HI , > I
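A hedged sketch of what that could look like with the flink-jdbc batch API; the driver class, URL, credentials, key range, and chunk size are placeholders, and NumericBetweenParametersProvider is just one concrete ParameterValuesProvider that splits a numeric key range into parallel BETWEEN queries:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.io.jdbc.split.NumericBetweenParametersProvider;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class ParallelJdbcReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        RowTypeInfo rowType = new RowTypeInfo(
            BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        // Splits the key range 1..80_000_000 into chunks of 100_000 rows; each
        // split binds its own (?, ?) pair into the BETWEEN clause.
        NumericBetweenParametersProvider params =
            new NumericBetweenParametersProvider(100_000, 1, 80_000_000);

        JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("com.microsoft.sqlserver.jdbc.SQLServerDriver") // placeholder driver
            .setDBUrl("jdbc:sqlserver://host:1433;databaseName=db")        // placeholder URL
            .setUsername("user")
            .setPassword("secret")
            .setQuery("SELECT id, name FROM big_table WHERE id BETWEEN ? AND ?")
            .setRowTypeInfo(rowType)
            .setParametersProvider(params)
            .finish();

        // Cap the parallelism of the source so the database is not overwhelmed.
        DataSet<Row> rows = env.createInput(inputFormat).setParallelism(4);
        rows.writeAsText("/tmp/out");
        env.execute("parallel jdbc read");
    }
}
```

How far to cap the source parallelism is a judgment call based on how many concurrent queries the database can tolerate, which is the point made above.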

Re: Over-requesting Containers on YARN

2018-12-04 Thread Austin Cawley-Edwards
Perhaps related to this, one of my Tasks does not seem to be restoring or checkpointing correctly. It hangs during the checkpoint process, causes a timeout, and then says "Checkpoint Coordinator is suspended." I have increased the "slot.idle.timeout" as was recommended here

Error deploying task manager after failure in Yarn

2018-12-04 Thread Anil
I'm using Flink 1.4.2 and running Flink on YARN. The job runs with a parallelism of 2. Each task manager is allocated 1 core. When the container memory exceeds the allocated memory, YARN kills the container as expected. {"debug_level":"INFO","debug_timestamp":"2018-12-04

Re: Over-requesting Containers on YARN

2018-12-04 Thread Austin Cawley-Edwards
We are seeing this OutOfMemoryError in the container logs. How can we increase the memory to take full advantage of the cluster? Or do we just have to scale more aggressively? Best, Austin java.lang.OutOfMemoryError: GC overhead limit exceeded at

Over-requesting Containers on YARN

2018-12-04 Thread Austin Cawley-Edwards
Hi all, We have a Flink 1.6 streaming application running on Amazon EMR, with a YARN session configured with 20GB for the Task Manager, 2GB for the Job Manager, and 4 slots (number of vCPUs), in detached mode. Each Core Node has 4 vCores, 32 GB mem, 32 GB disk, and each Task Node has 4 vCores, 8

Query big mssql Data Source [Batch]

2018-12-04 Thread miki haiat
Hi, I want to query an SQL table that contains ~80m rows. There are a few ways to do that and I wonder what the best way is. 1. Using JDBCInputFormat -> convert to a DataSet and output it without doing any logic in the DataSet, passing the full query in the

Using port ranges to connect with the Flink Client

2018-12-04 Thread Gyula Fóra
Hi! We have been running Flink on Yarn for quite some time, and historically we specified port ranges so that the client can access the cluster: yarn.application-master.port: 100-200 Now we have updated to Flink 1.7 and are trying to migrate away from the legacy execution mode, but we run into a problem that

Re: If you are an expert in flink sql, then I really need your help...

2018-12-04 Thread Timo Walther
Unfortunately, setting the parallelism per SQL operator is not supported right now. We are currently thinking about a way of having fine-grained control over the properties of SQL operators, but this is in an early design phase and might take a while. On 04.12.18 at 13:05, clay wrote:

AW: number of files in checkpoint directory grows endlessly

2018-12-04 Thread Bernd.Winterstein
All calls to createColumnFamily were replaced with createColumnFamilyWithTtl: private Tuple2> tryRegisterKvStateInformation( StateDescriptor stateDesc, TypeSerializer namespaceSerializer, @Nullable StateSnapshotTransformer snapshotTransformer)
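For readers unfamiliar with the RocksDB side of this, a hedged sketch of the API difference using the stock RocksJava TtlDB class rather than Flink's internal backend code (which is what the message above actually patched); the path, column family name, and TTL value are placeholders:

```java
import java.nio.charset.StandardCharsets;

import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.Options;
import org.rocksdb.RocksDBException;
import org.rocksdb.TtlDB;

public class RocksDbTtlColumnFamilySketch {
    public static void main(String[] args) throws RocksDBException {
        try (Options options = new Options().setCreateIfMissing(true);
             // Open a TTL-enabled RocksDB instance; 0 means no TTL for the
             // default column family.
             TtlDB db = TtlDB.open(options, "/tmp/rocksdb-ttl", 0, false)) {

            // Instead of plain createColumnFamily(...), register the column
            // family with a TTL in seconds; expired entries are dropped lazily
            // during compaction, not at the exact expiry time.
            ColumnFamilyHandle handle = db.createColumnFamilyWithTtl(
                new ColumnFamilyDescriptor("my-state".getBytes(StandardCharsets.UTF_8)),
                3600);

            db.put(handle, "key".getBytes(StandardCharsets.UTF_8),
                   "value".getBytes(StandardCharsets.UTF_8));
            handle.close();
        }
    }
}
```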

Re: If you are an expert in flink sql, then I really need your help...

2018-12-04 Thread clay4444
Hi Timo: First, thank you very much, I have solved the problems. Regarding the problem of too-large state, I set the global parallelism to 7 for the program, which solved my problem very well; checkpointing is very fast. But I would like to ask if there is a way to set the parallelism for each
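A minimal sketch of the global-parallelism approach described here, assuming the Flink 1.6/1.7 Table API; the sample data, table name, and query are placeholders. Per-operator parallelism remains possible only on the DataStream parts of the job, not on individual SQL operators:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class GlobalParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Every SQL operator inherits this single, job-wide parallelism.
        env.setParallelism(7);

        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        DataStream<Tuple2<String, Integer>> input =
            env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2));
        tEnv.registerDataStream("events", input, "name, cnt");

        Table result = tEnv.sqlQuery("SELECT name, SUM(cnt) FROM events GROUP BY name");

        // Parallelism can still be tuned on the DataStream parts of the job,
        // e.g. on the sink after converting the Table back into a stream.
        DataStream<Tuple2<Boolean, Row>> out = tEnv.toRetractStream(result, Row.class);
        out.print().setParallelism(1);

        env.execute("sql job with global parallelism");
    }
}
```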

Re: long lived standalone job session cluster in kubernetes

2018-12-04 Thread Till Rohrmann
Hi Derek, what I would recommend is to trigger the cancel-with-savepoint command [1]. This will create a savepoint and terminate the job execution. Next you simply need to respawn the job cluster, providing it with the savepoint to resume from. [1]

AW: number of files in checkpoint directory grows endlessly

2018-12-04 Thread Bernd.Winterstein
Sorry for the late answer, I haven't been in the office. The logs show no problems. The files that remain in the shared subfolder are almost all 1121 bytes, except the files from the latest checkpoint (30 files across all operators). For each historic checkpoint, six files remain (parallelism is 6)

Re: long lived standalone job session cluster in kubernetes

2018-12-04 Thread Andrey Zagrebin
Hi Derek, I think your automation steps look good. Recreating deployments should not take long, and as you mention, this way you can avoid unpredictable old/new version collisions. Best, Andrey > On 4 Dec 2018, at 10:22, Dawid Wysakowicz wrote: > > Hi Derek, > > I am not an expert in

Re: High Job BackPressure

2018-12-04 Thread sayat
I forgot to mention that the job was recently moved from a cluster with SSD disks to one with SATA and SSD disks. On the old hardware, everything worked fine. The Flink version is 1.6.2. There was a FLASH-optimized setting for RocksDB; I've changed it to SPINNING_DISK_OPTIMIZED and it didn't have any effect. Old
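For reference, a hedged sketch of switching the RocksDB predefined options profile programmatically; the checkpoint URI is a placeholder and incremental checkpointing is just an example setting:

```java
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbSpinningDiskSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder checkpoint location; "true" enables incremental checkpoints.
        RocksDBStateBackend backend =
            new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

        // Switch from the FLASH_SSD_OPTIMIZED profile to the one tuned for
        // spinning disks; this only changes RocksDB option defaults, it does
        // not move or rewrite existing state.
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

        env.setStateBackend(backend);
        // ... build the pipeline and call env.execute(...) ...
    }
}
```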

Re: long lived standalone job session cluster in kubernetes

2018-12-04 Thread Dawid Wysakowicz
Hi Derek, I am not an expert in Kubernetes, so I will cc Till, who should be able to help you more. As for automating a similar process, I would recommend having a look at the dA platform [1], which is built on top of Kubernetes. Best, Dawid [1] https://data-artisans.com/platform-overview On

Re: not able to join data coming from kafka

2018-12-04 Thread Dawid Wysakowicz
Hi Rakesh, could you explain a little bit what the actual problem is? What do you expect as the output and what actually happens? It is hard to guess what problem you're facing. Best, Dawid On 03/12/2018 12:19, Rakesh Kumar wrote: > > Hello Team, > > > public class FlinkJoinDataStream {

Re: If you are an expert in flink sql, then I really need your help...

2018-12-04 Thread Timo Walther
Hi, yes, this was an unintended behavior that got fixed in Flink 1.7. See https://issues.apache.org/jira/browse/FLINK-10474 Regards, Timo On 04.12.18 at 05:21, clay wrote: I have found out that the checkpoint is not triggered. Regarding the IN operation in Flink SQL, this SQL will trigger

Re: CKAN inputFormat (batch)

2018-12-04 Thread Dawid Wysakowicz
Hi Flavio, Thank you for the example. It is definitely gonna be helpful for some people! Best, Dawid On 04/12/2018 09:05, Flavio Pompermaier wrote: > Yesterday it was working...alternatively you can look at  > https://github.com/ckan/ckan-instances/blob/gh-pages/config/instances.json > > The

Re: CKAN inputFormat (batch)

2018-12-04 Thread Flavio Pompermaier
Yesterday it was working... Alternatively, you can look at https://github.com/ckan/ckan-instances/blob/gh-pages/config/instances.json The purpose of this code is to share with the community an example of a useful input format that could be interesting for people working with open data (indeed there