Re: After job cancel, leftover ZK state prevents job manager startup

2018-12-11 Thread Stefan Richter
Hi, Thanks for reporting the problem, I think the exception trace looks indeed very similar to traces in the discussion for FLINK-10184. I will pull in Till who worked on the fix to hear his opinion. Maybe the current fix only made the problem less likely to appear but is not complete, yet? Be

Re: runtime.resourcemanager

2018-12-11 Thread Piotr Nowojski
Hey, Is that whole Task Manager log? Have you checked memory issues both on Task Managers and the Job Manager? Like out of memory/long GC pauses as I suggested in the first email? After you rule memory issues, you could capture couple of thread dumps (`kill -3 JVM_PID` or `jstack JVM_PID`) an

Re: After job cancel, leftover ZK state prevents job manager startup

2018-12-11 Thread Till Rohrmann
Hi Micah, the problem looks indeed similar to FLINK-10255. Could you tell me your exact cluster setup (HA with stand by JobManagers?). Moreover, the logs of all JobManagers on DEBUG level would be helpful for further debugging. Cheers, Till On Tue, Dec 11, 2018 at 10:09 AM Stefan Richter wrote:

sql program throw exception when new kafka with csv format

2018-12-11 Thread Marvin777
Register kafka message source with csv format, the error message is as follows: Exception in thread "main" org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath. Re

Re: Very slow checkpoints occasionally occur

2018-12-11 Thread Stefan Richter
Hi, Looking at the numbers, it seems to me that checkpoint execution (the times of the sync and async part) are always reasonable fast once they are executed on the task, but there are changes in the alignment time and the time from triggering a checkpoint to executing a checkpoint. As you are

Re: Error while reading from hadoop sequence file

2018-12-11 Thread Stefan Richter
Hi, I am a bit confused by the explanation, the exception that you mentioned, is it happening in the first code snippet ( with the TypeInformation.of(…)) or the second one? From looking into the code, I would expect the exception can only happen in the second snippet (without TypeInformation) b

CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello, Every time I deploy a flink job the code cache increases, which is expected. However, when I stop and start the job or it restarts the code cache continuous to increase. Screenshot_2018-12-11_at_11.png

Re: CodeCache is full - Issues with job deployments

2018-12-11 Thread Stefan Richter
Hi, in general, Flink uses user-code class loader for job specific code and the lifecycle of the class loader should end with the job. This usually means that job related code could be removed after the job is finished. However, objects of a class that was loaded by the user-code class loader s

Re: sql program throw exception when new kafka with csv format

2018-12-11 Thread Hequn Cheng
Hi Marvin, I had taken a look at the Flink code. It seems we can't use CSV format for Kafka. You can use JSON instead. As the exception shows, Flink can't find a suitable DeserializationSchemaFactory. Currently, only JSON and Avro support DeserializationSchemaFactory. Best, Hequn On Tue, Dec 11,

Re: sql program throw exception when new kafka with csv format

2018-12-11 Thread Timo Walther
Hi Marvin, the CSV format is not supported for Kafka so far. Only formats that have the tag `DeserializationSchema` in the docs are supported. Right now you have to implement you own DeserializationSchemaFactory or use JSON or Avro. You can follow [1] to get informed once the CSV format is

Re: CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello Stefan, Thank you for the reply. I've taken a heap dump from a development cluster using jmap and analysed it. To do the tests we restarted the cluster and then left a job running for a few minutes. After that, we restarted the job a couple of times and stopped it. After leaving the cluster

How many times Flink initialize an operator?

2018-12-11 Thread Hao Sun
I am using Flink 1.7 on K8S. This might does not matter :D. I think Flink only initialize the MapFunction once per taskManager right? Because Flink will serialize the execution graph and distribute it to taskManagers. Or it creates a new MapFunction for every element? stream.map(new MapFunction[I

Re: How many times Flink initialize an operator?

2018-12-11 Thread Hao Sun
Ok, thanks for the clarification. Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103 On Tue, Dec 11, 2018 at 2:38 PM Ken Krugler wrote: > It’s based the parallelism of that operator, not the number of > TaskManagers. > > E.g. you can have an operator with a parallelism of one, and you

Re: How many times Flink initialize an operator?

2018-12-11 Thread Ken Krugler
It’s based the parallelism of that operator, not the number of TaskManagers. E.g. you can have an operator with a parallelism of one, and your cluster has 10 TaskManagers, and you’ll only get a single instance of the operator. — Ken > On Dec 11, 2018, at 2:01 PM, Hao Sun wrote: > > I am usin

Re: Cannot configure akka.ask.timeout

2018-12-11 Thread Alex Vinnik
Hi there, Run into the same problem running a batch job with Flink 1.6.1/1.6.2 . akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#202546747]] after [1 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage". akka

Singleton in a taskmanager

2018-12-11 Thread burgesschen
Hi Guys, I am running into a problem. I have 2 jobs running on the same taskmanager. Each Job creates a singleton of the same class, say MySingleton class. Are they actually sharing the same singleton? Hope my question is clear. Best, Burgess Chen -- Sent from: http://apache-flink-user-mail

Re: Cannot configure akka.ask.timeout

2018-12-11 Thread qi luo
Hi Alex and Lukas, This error is controlled by another RPC timeout (which is hard coded and not affected by “akka.ask.timeout”). Could you open an JIRA issue so I can propose a fix on that? Cheers, Qi > On Dec 12, 2018, at 7:07 AM, Alex Vinnik wrote: > > Hi there, > > Run into the same prob

Re: Singleton in a taskmanager

2018-12-11 Thread bupt_ljy
Hi Chen, They will not be sharing the same singleton. Firstly, the class is referenced by its classloader. And the classloader is bound to task. Therefore, different job’s slots have different classloaders, which means the different task’s class's references are different. Please correct me

Re: Error while reading from hadoop sequence file

2018-12-11 Thread Akshay Mendole
Hi Stefen, You are correct. I logged the error messages incorrectly in my previous mail. When I execute this code snippet DataSource> input = env.createInput(HadoopInputs.readSequenceFile(Text.class, Text.class, ravenDataDir)); I got this error The type returned by the input for