Hi,
from the TaskManager logs, I can not see anything suspicious.
Its a bit weird that the TaskManager logs just end, without any shutdown
messages. Usually the TMs log some shut down stuff when they are stopping.
Also, if they would be still running, I would expect some error messages
from akka about the connection status.
So the only thing I conclude is that one of the TMs was killed by the OS or
the JVM crashed. Did you check if that happened?

Do you have any service like puppet that is controlling processes?


On Thu, Jul 7, 2016 at 5:46 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> I see two logs (attached), but there's only 1 TaskManger process. Also,
> the Web console says it can find only 1 TM.
>
> However, I see this part in JM log, which shows there was a second TM at
> one point, but it was unregistered. Any thoughts?
>
> --------------------------
>
> - Registered TaskManager at j-002 (akka.tcp://
> flink@172.16.0.2:42888/user/taskmanager) as
> 1c65415701f19978c8a8cdc75c993717. Current number of registered hosts is 1.
> Current number of alive task slots is 12.
>
> 2016-07-07 11:32:40,363 WARN  akka.remote.ReliableDeliverySupervisor -
> Association with remote system [akka.tcp://flink@172.16.0.2:42888] has
> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>
> 2016-07-07 11:32:42,722 INFO
>  org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager
> at j-002 (akka.tcp://flink@172.16.0.2:37373/user/taskmanager) as
> 9c4ec66f5acbc19f7931fcae8345cd4e. Current number of registered hosts is 2.
> Current number of alive task slots is 24.
>
> 2016-07-07 11:33:15,316 WARN  Remoting - Tried to associate with
> unreachable remote address [akka.tcp://flink@172.16.0.2:42888]. Address
> is now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: /172.16.0.2:42888
>
> 2016-07-07 11:33:15,320 INFO
>  org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://
> flink@172.16.0.2:42888/user/taskmanager terminated.
> 2016-07-07 11:33:15,320 INFO
>  org.apache.flink.runtime.instance.InstanceManager - Unregistered task
> manager akka.tcp://flink@172.16.0.2:42888/user/taskmanager. Number of
> registered task managers 1. Number of available slots 12.
>
>
> On Thu, Jul 7, 2016 at 4:27 AM, Ufuk Celebi <u...@apache.org> wrote:
>
>> No that should suffice. Can you check whether there are any task
>> manager logs for the second TM on that machine
>> (taskmanager-X-j-011.log where X is the TM number)? If yes, the task
>> manager process does start up and there is another problem. If not,
>> the task managers seems not to start even.
>>
>> – Ufuk
>>
>> On Thu, Jul 7, 2016 at 7:34 AM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>> > I tried to run more than one task manager per node by duplicating the
>> slave
>> > IPs. At startup it says for example,
>> >
>> > [INFO] 1 instance(s) of taskmanager are already running on j-011.
>> > Starting taskmanager daemon on host j-011.
>> >
>> > but I only see 1 task manager process running.
>> >
>> > Is there anything else I need to do?
>> >
>> > On Sun, Jul 3, 2016 at 11:28 AM, Ufuk Celebi <u...@apache.org> wrote:
>> >>
>> >> Yes, exactly.
>> >>
>> >> On Sat, Jul 2, 2016 at 6:28 PM, Saliya Ekanayake <esal...@gmail.com>
>> >> wrote:
>> >> > Thank you, yes, it can be done externally, if not supported within
>> >> > Flink.
>> >> >
>> >> > So the way to spawn multiple task managers would be to list the same
>> >> > slave
>> >> > machines N times as necessary in the slaves file?
>> >> >
>> >> > On Sat, Jul 2, 2016 at 11:22 AM, Ufuk Celebi <u...@apache.org> wrote:
>> >> >>
>> >> >> No, not inside of Flink. That sounds like something like the OS or
>> >> >> resource manager should handle.
>> >> >>
>> >> >> On Sat, Jul 2, 2016 at 5:12 PM, Saliya Ekanayake <esal...@gmail.com
>> >
>> >> >> wrote:
>> >> >> > That's great, so is there support to pin task managers to sockets
>> as
>> >> >> > well?
>> >> >> >
>> >> >> > On Sat, Jul 2, 2016 at 11:08 AM, Ufuk Celebi <u...@apache.org>
>> wrote:
>> >> >> >>
>> >> >> >> Regarding 2) if you don't manually configure something else, that
>> >> >> >> should happen always.
>> >> >> >>
>> >> >> >> Yes, you can run more than one task manager per node depending on
>> >> >> >> the
>> >> >> >> process isolation you want. Within a task manager, there are
>> >> >> >> multiple
>> >> >> >> threads for each slot. For example, if you have 2 task managers
>> with
>> >> >> >> 2
>> >> >> >> slots each and submit a job with parallelism 4, each task manager
>> >> >> >> will
>> >> >> >> execute 2 sub tasks in separate Threads.
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Jul 2, 2016 at 3:26 AM, Saliya Ekanayake <
>> esal...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > Hi Ufuk,
>> >> >> >> >
>> >> >> >> > Looking at the document you sent it seems only 1 task manager
>> per
>> >> >> >> > node
>> >> >> >> > exist
>> >> >> >> > and within that you have multiple slots. Is it possible to run
>> >> >> >> > more
>> >> >> >> > than
>> >> >> >> > 1
>> >> >> >> > task manager per node? Also, within a task manager is the
>> >> >> >> > parallelism
>> >> >> >> > done
>> >> >> >> > through threads or processes?
>> >> >> >> >
>> >> >> >> > Thank you,
>> >> >> >> > Saliya
>> >> >> >> >
>> >> >> >> > On Thu, Jun 30, 2016 at 5:27 PM, Saliya Ekanayake
>> >> >> >> > <esal...@gmail.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Thank you, I'll check these.
>> >> >> >> >>
>> >> >> >> >> In 2.) you said they are likely to exchange through memory. Is
>> >> >> >> >> there
>> >> >> >> >> a
>> >> >> >> >> case why they wouldn't?
>> >> >> >> >>
>> >> >> >> >> On Thu, Jun 30, 2016 at 5:03 AM, Ufuk Celebi <u...@apache.org>
>> >> >> >> >> wrote:
>> >> >> >> >>>
>> >> >> >> >>> On Thu, Jun 30, 2016 at 1:44 AM, Saliya Ekanayake
>> >> >> >> >>> <esal...@gmail.com>
>> >> >> >> >>> wrote:
>> >> >> >> >>> > 1. What parameters are available to control parallelism
>> within
>> >> >> >> >>> > a
>> >> >> >> >>> > node?
>> >> >> >> >>>
>> >> >> >> >>> Task Manager processing slots:
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-taskmanager-processing-slots
>> >> >> >> >>>
>> >> >> >> >>> > 2. Does Flink support shared memory-based messaging within
>> a
>> >> >> >> >>> > node
>> >> >> >> >>> > (without
>> >> >> >> >>> > doing TCP calls)?
>> >> >> >> >>>
>> >> >> >> >>> Yes, local exchanges happen via memory and not TCP, for
>> example
>> >> >> >> >>> if
>> >> >> >> >>> you
>> >> >> >> >>> have a map-reduce, map subtask 1 and reduce subtask 1 are
>> likely
>> >> >> >> >>> to
>> >> >> >> >>> exchange data locally.
>> >> >> >> >>>
>> >> >> >> >>> > 3. Is there support for Infiniband interconnect?
>> >> >> >> >>>
>> >> >> >> >>> No, not that I'm aware of.
>> >> >> >> >>>
>> >> >> >> >>> – Ufuk
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> Saliya Ekanayake
>> >> >> >> >> Ph.D. Candidate | Research Assistant
>> >> >> >> >> School of Informatics and Computing | Digital Science Center
>> >> >> >> >> Indiana University, Bloomington
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Saliya Ekanayake
>> >> >> >> > Ph.D. Candidate | Research Assistant
>> >> >> >> > School of Informatics and Computing | Digital Science Center
>> >> >> >> > Indiana University, Bloomington
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Saliya Ekanayake
>> >> >> > Ph.D. Candidate | Research Assistant
>> >> >> > School of Informatics and Computing | Digital Science Center
>> >> >> > Indiana University, Bloomington
>> >> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Saliya Ekanayake
>> >> > Ph.D. Candidate | Research Assistant
>> >> > School of Informatics and Computing | Digital Science Center
>> >> > Indiana University, Bloomington
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Saliya Ekanayake
>> > Ph.D. Candidate | Research Assistant
>> > School of Informatics and Computing | Digital Science Center
>> > Indiana University, Bloomington
>> >
>>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
>
>

Reply via email to