Hi,

Have you checked the task managers' logs?
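
If it helps, here is a rough sketch of how such a check could be scripted. It scans log files for a few of the failure signatures from the list of common reasons quoted below. The search patterns, the `*taskexecutor*` file name pattern, and the function names are all assumptions for illustration, not anything Flink prescribes, so adjust them to your setup. Note that a task manager killed via SIGKILL cannot log anything itself, so a log that simply ends abruptly is a hint on its own, and GC pauses usually only show up if GC logging is enabled.

```python
import re
from pathlib import Path

# Assumed signatures: these patterns are illustrative guesses at what the
# common failure causes tend to look like in JVM logs, not an official list.
SIGNATURES = {
    "out of memory": re.compile(r"OutOfMemoryError"),
    "native crash": re.compile(r"SIGSEGV|A fatal error has been detected"),
    "error or exception": re.compile(r"\bERROR\b|Exception"),
}


def scan_lines(lines):
    """Return (line_number, label, text) for every line matching a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip()))
                break  # report only the first matching label per line
    return hits


def scan_log_dir(log_dir, name_pattern="*taskexecutor*"):
    """Scan files in log_dir whose names match name_pattern.

    The default name_pattern is an assumption about how the task manager
    log files are named; pass a different glob if yours differ.
    """
    results = {}
    for path in sorted(Path(log_dir).glob(name_pattern)):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            hits = scan_lines(handle)
        if hits:
            results[path.name] = hits
    return results
```

Calling `scan_log_dir` on the log directory of your Flink installation returns a dictionary mapping each matching file name to its suspicious lines. An empty result does not rule out a failure; it only means none of the assumed patterns matched.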

Piotrek

> On 8 Dec 2018, at 12:23, Alieh <sae...@informatik.uni-leipzig.de> wrote:
> 
> Hello Piotrek,
> 
> thank you for your answer. I installed Flink on a local cluster and used 
> the GUI to monitor the task managers. It seems the program does not 
> start at all. The whole time, only the job manager is struggling... For very 
> small toy examples, after a long time (during which I see the job manager 
> logs I mentioned before), the job starts and then executes in 2 
> seconds.
> 
> Best,
> 
> Alieh
> 
> 
> On 12/07/2018 10:43 AM, Piotr Nowojski wrote:
>> Hi,
>> 
>> Please investigate the logs/standard output/error from the task manager that 
>> has failed (the logs that you showed are from the job manager). There is 
>> probably some obvious error/exception explaining why it has failed. The most 
>> common reasons:
>> - out of memory
>> - long GC pause
>> - seg fault or other error from some native library
>> - task manager killed via, for example, SIGKILL
>> 
>> Piotrek
>> 
>>> On 6 Dec 2018, at 17:34, Alieh <sae...@informatik.uni-leipzig.de> wrote:
>>> 
>>> Hello all,
>>> 
>>> I have an algorithm x() which contains several joins and uses Gelly 
>>> ConnectedComponents three times. The problem is that if I call x() inside a 
>>> script more than three times, I receive the messages listed below in the 
>>> log and the program somehow stops. It happens even if I run it with a 
>>> toy example of a graph with fewer than 10 vertices. Do you have any clue 
>>> what the problem is?
>>> 
>>> Cheers,
>>> 
>>> Alieh
>>> 
>>> 
>>> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
>>> Trigger heartbeat request.
>>> 129149 [flink-akka.actor.default-dispatcher-20] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
>>> Trigger heartbeat request.
>>> 129150 [flink-akka.actor.default-dispatcher-20] DEBUG 
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor  - Received heartbeat 
>>> request from e80ec35f3d0a04a68000ecbdc555f98b.
>>> 129150 [flink-akka.actor.default-dispatcher-22] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
>>> Received heartbeat from 78cdd7a4-0c00-4912-992f-a2990a5d46db.
>>> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
>>> Received new slot report from TaskManager 
>>> 78cdd7a4-0c00-4912-992f-a2990a5d46db.
>>> 129151 [flink-akka.actor.default-dispatcher-22] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received 
>>> slot report from instance 4c3e3654c11b09fbbf8e993a08a4c2da.
>>> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Release 
>>> TaskExecutor 4c3e3654c11b09fbbf8e993a08a4c2da because it exceeded the idle 
>>> timeout.
>>> 129200 [flink-akka.actor.default-dispatcher-15] DEBUG 
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Worker 
>>> 78cdd7a4-0c00-4912-992f-a2990a5d46db could not be stopped.
>>> 
>> 
> 
