Hi Avinash,

Thanks very much for your reply!

The root cause has been found: iptables was enabled on the k8s server,
which blocked connections from the Mesos master; after disabling it,
everything works!

Best Regards
Nan Xiao

On Wed, Dec 30, 2015 at 3:22 AM, Avinash Sridharan <avin...@mesosphere.io> wrote:
> lsof will show only actively opened file descriptors, so if you ran the
> command after seeing the error logs in the master, most probably the
> master had already closed this fd. Just throwing out a few other things
> to look at that might give some more insight:
>
> * Run the "netstat -na" and "netstat -nt" commands on the Mesos master
> and the Kubernetes master node to make sure that the master is listening
> on the right port and the k8s scheduler is trying to connect to the
> right port. From the logs it does look like the master is receiving the
> registration request, so there shouldn't be a network configuration
> issue here.
> * Make sure no firewall rules are getting turned on in your cluster,
> since it looks like the k8s scheduler is not able to connect to the
> master (though it was able to register the first time).
>
> On Tue, Dec 29, 2015 at 1:37 AM, Nan Xiao <xiaonan830...@gmail.com> wrote:
>>
>> BTW, using the "lsof" command shows there are only 16 file descriptors;
>> I don't know why the Mesos master tries to close fd 17.
>>
>> Best Regards
>> Nan Xiao
>>
>>
>> On Tue, Dec 29, 2015 at 11:32 AM, Nan Xiao <xiaonan830...@gmail.com>
>> wrote:
>> > Hi Klaus,
>> >
>> > Firstly, thanks very much for your answer!
>> >
>> > The km processes are all alive:
>> > root 129474 128024  2 22:26 pts/0 00:00:00 km apiserver
>> > --address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001
>> > --service-cluster-ip-range=10.10.10.0/24 --port=8888
>> > --cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0
>> > --v=1
>> > root 129509 128024  2 22:26 pts/0 00:00:00 km controller-manager
>> > --master=15.242.100.60:8888 --cloud-provider=mesos
>> > --cloud-config=./mesos-cloud.conf --v=1
>> > root 129538 128024  0 22:26 pts/0 00:00:00 km scheduler
>> > --address=15.242.100.60 --mesos-master=15.242.100.56:5050
>> > --etcd-servers=http://15.242.100.60:4001 --mesos-user=root
>> > --api-servers=15.242.100.60:8888 --cluster-dns=10.10.10.10
>> > --cluster-domain=cluster.local --v=2
>> >
>> > All the logs also seem OK, except these from scheduler.log:
>> > ......
>> > I1228 22:26:37.883092 129538 messenger.go:381] Receiving message
>> > mesos.internal.InternalMasterChangeDetected from
>> > scheduler(1)@15.242.100.60:33077
>> > I1228 22:26:37.883225 129538 scheduler.go:374] New master
>> > master@15.242.100.56:5050 detected
>> > I1228 22:26:37.883268 129538 scheduler.go:435] No credentials were
>> > provided. Attempting to register scheduler without authentication.
>> > I1228 22:26:37.883356 129538 scheduler.go:928] Registering with
>> > master: master@15.242.100.56:5050
>> > I1228 22:26:37.883460 129538 messenger.go:187] Sending message
>> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
>> > I1228 22:26:37.883504 129538 scheduler.go:881] will retry
>> > registration in 1.209320575s if necessary
>> > I1228 22:26:37.883758 129538 http_transporter.go:193] Sending message
>> > to master@15.242.100.56:5050 via http
>> > I1228 22:26:37.883873 129538 http_transporter.go:587] libproc target
>> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
>> > I1228 22:26:39.093560 129538 scheduler.go:928] Registering with
>> > master: master@15.242.100.56:5050
>> > I1228 22:26:39.093659 129538 messenger.go:187] Sending message
>> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
>> > I1228 22:26:39.093702 129538 scheduler.go:881] will retry
>> > registration in 3.762036352s if necessary
>> > I1228 22:26:39.093765 129538 http_transporter.go:193] Sending message
>> > to master@15.242.100.56:5050 via http
>> > I1228 22:26:39.093847 129538 http_transporter.go:587] libproc target
>> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
>> > ......
>> >
>> > From the log, the Mesos master rejected the k8s registration, and
>> > k8s retries constantly.
>> >
>> > Have you met this issue before? Thanks very much in advance!
>> >
>> > Best Regards
>> > Nan Xiao
>> >
>> >
>> > On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma <klaus1982...@gmail.com>
>> > wrote:
>> >> It seems Kubernetes is down; would you help to check Kubernetes's
>> >> status (km)?
>> >>
>> >> ----
>> >> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> >> Platform Symphony/DCOS Development & Support, STG, IBM GCG
>> >> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>> >>
>> >> On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao <xiaonan830...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Greetings from me!
>> >>>
>> >>> I am trying to follow this tutorial
>> >>> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
>> >>> to deploy "k8s on Mesos" on local machines: k8s is the newest
>> >>> master branch, and Mesos is version 0.26.
>> >>>
>> >>> After running the Mesos master (IP: 15.242.100.56), the Mesos
>> >>> slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I can see
>> >>> the following logs from the Mesos master:
>> >>>
>> >>> ......
>> >>> I1227 22:52:34.494478  8069 master.cpp:4269] Received update of slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
>> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
>> >>> I1227 22:52:34.494940  8065 hierarchical.cpp:400] Slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
>> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
>> >>> resources (total: cpus(*):32; mem(*):127878; disk(*):4336;
>> >>> ports(*):[31000-32000], allocated: )
>> >>> I1227 22:53:06.740757  8053 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56219 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:07.736419  8065 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56241 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:07.767196  8070 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56252 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:08.808171  8053 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56272 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:08.815811  8060 master.cpp:2176] Received SUBSCRIBE call
>> >>> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.816182  8060 master.cpp:2247] Subscribing framework
>> >>> Kubernetes with checkpointing enabled and capabilities [ ]
>> >>> I1227 22:53:08.817294  8052 hierarchical.cpp:195] Added framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> I1227 22:53:08.817464  8050 master.cpp:1122] Framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488 disconnected
>> >>> E1227 22:53:08.817497  8073 process.cpp:1911] Failed to shutdown
>> >>> socket with fd 17: Transport endpoint is not connected
>> >>> I1227 22:53:08.817533  8050 master.cpp:2472] Disconnecting framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.817595  8050 master.cpp:2496] Deactivating framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.817797  8050 master.cpp:1146] Giving framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
>> >>> W1227 22:53:08.818389  8062 master.cpp:4840] Master returning
>> >>> resources offered to framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 because the framework has
>> >>> terminated or is inactive
>> >>> I1227 22:53:08.818397  8052 hierarchical.cpp:273] Deactivated
>> >>> framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> I1227 22:53:08.819046  8066 hierarchical.cpp:744] Recovered
>> >>> cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
>> >>> (total: cpus(*):32; mem(*):127878; disk(*):4336;
>> >>> ports(*):[31000-32000], allocated: ) on slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> ......
>> >>>
>> >>> I can't figure out why the Mesos master complains "Failed to shutdown
>> >>> socket with fd 17: Transport endpoint is not connected".
>> >>> Could someone give some clues on this issue?
>> >>>
>> >>> Thanks very much in advance!
>> >>>
>> >>> Best Regards
>> >>> Nan Xiao
>>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
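The connectivity checks suggested in the thread (use netstat on both ends to confirm the master is listening and the scheduler is dialing the right port) can also be scripted from the scheduler host. A minimal sketch in Python; `can_connect` is an illustrative helper (not part of k8s or Mesos), and the host/port pairs mirror the addresses used in this thread:

```python
import socket

def can_connect(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds.

    A refused or timed-out connection usually means either the service
    is down or a firewall (e.g. iptables, as in this thread) is
    interfering with the traffic.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Addresses from the thread: Mesos master (5050) and km apiserver (8888).
    for host, port in (("15.242.100.56", 5050), ("15.242.100.60", 8888)):
        status = "reachable" if can_connect(host, port) else "unreachable"
        print("%s:%d %s" % (host, port, status))
```

Note this only distinguishes "connectable" from "not connectable"; an iptables REJECT rule fails fast while a DROP rule hangs until the timeout, which is itself a hint about the firewall configuration.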
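As for the error that started the thread, "Failed to shutdown socket with fd 17: Transport endpoint is not connected": ENOTCONN is what the shutdown(2) system call returns when the socket's peer is already gone, which matches the master seeing the scheduler disconnect right after subscribing. A minimal reproduction of the errno (in Python rather than the C++ of libprocess, purely for illustration):

```python
import errno
import socket

# Calling shutdown(2) on a socket that is not connected fails with
# ENOTCONN -- "Transport endpoint is not connected" on Linux -- the
# same errno the Mesos master logs when it tries to shut down a
# connection the scheduler has already dropped.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.shutdown(socket.SHUT_RDWR)
except OSError as e:
    assert e.errno == errno.ENOTCONN
    print("shutdown failed:", e.strerror)
finally:
    s.close()
```

So the error line is a symptom of the scheduler-side disconnect (here, the firewall), not an independent fault in the master.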