Hi Avinash,

Thanks very much for your reply!

The root cause has been found: iptables was enabled on the k8s server,
which blocked connections from the Mesos master; after disabling it,
everything works!

Best Regards
Nan Xiao

On Wed, Dec 30, 2015 at 3:22 AM, Avinash Sridharan <avin...@mesosphere.io> wrote:
> lsof will show only actively opened file descriptors, so if you ran the
> command after seeing the error logs in the master, most probably the
> master had already closed this fd. Just throwing out a few other things
> to look at that might give some more insight:
>
> * Run the "netstat -na" and "netstat -nt" commands on the Mesos master
> and the Kubernetes master node to make sure that the master is listening
> on the right port and the k8s scheduler is trying to connect to the
> right port. From the logs it does look like the master is receiving the
> registration request, so there shouldn't be a network configuration
> issue here.
> * Make sure no firewall rules are getting turned on in your cluster,
> since it looks like the k8s scheduler is not able to connect to the
> master (though it was able to register the first time).
>
> On Tue, Dec 29, 2015 at 1:37 AM, Nan Xiao <xiaonan830...@gmail.com> wrote:
>>
>> BTW, using the "lsof" command shows there are only 16 file descriptors;
>> I don't know why the Mesos master tries to close fd 17.
>>
>> Best Regards
>> Nan Xiao
>>
>>
>> On Tue, Dec 29, 2015 at 11:32 AM, Nan Xiao <xiaonan830...@gmail.com>
>> wrote:
>> > Hi Klaus,
>> >
>> > Firstly, thanks very much for your answer!
>> >
>> > The km processes are all alive:
>> > root 129474 128024  2 22:26 pts/0 00:00:00 km apiserver
>> > --address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001
>> > --service-cluster-ip-range=10.10.10.0/24 --port=8888
>> > --cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0
>> > --v=1
>> > root 129509 128024  2 22:26 pts/0 00:00:00 km controller-manager
>> > --master=15.242.100.60:8888 --cloud-provider=mesos
>> > --cloud-config=./mesos-cloud.conf --v=1
>> > root 129538 128024  0 22:26 pts/0 00:00:00 km scheduler
>> > --address=15.242.100.60 --mesos-master=15.242.100.56:5050
>> > --etcd-servers=http://15.242.100.60:4001 --mesos-user=root
>> > --api-servers=15.242.100.60:8888 --cluster-dns=10.10.10.10
>> > --cluster-domain=cluster.local --v=2
>> >
>> > All the logs also seem OK, except these from scheduler.log:
>> > ......
>> > I1228 22:26:37.883092 129538 messenger.go:381] Receiving message
>> > mesos.internal.InternalMasterChangeDetected from
>> > scheduler(1)@15.242.100.60:33077
>> > I1228 22:26:37.883225 129538 scheduler.go:374] New master
>> > master@15.242.100.56:5050 detected
>> > I1228 22:26:37.883268 129538 scheduler.go:435] No credentials were
>> > provided. Attempting to register scheduler without authentication.
>> > I1228 22:26:37.883356 129538 scheduler.go:928] Registering with
>> > master: master@15.242.100.56:5050
>> > I1228 22:26:37.883460 129538 messenger.go:187] Sending message
>> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
>> > I1228 22:26:37.883504 129538 scheduler.go:881] will retry
>> > registration in 1.209320575s if necessary
>> > I1228 22:26:37.883758 129538 http_transporter.go:193] Sending message
>> > to master@15.242.100.56:5050 via http
>> > I1228 22:26:37.883873 129538 http_transporter.go:587] libproc target
>> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
>> > I1228 22:26:39.093560 129538 scheduler.go:928] Registering with
>> > master: master@15.242.100.56:5050
>> > I1228 22:26:39.093659 129538 messenger.go:187] Sending message
>> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
>> > I1228 22:26:39.093702 129538 scheduler.go:881] will retry
>> > registration in 3.762036352s if necessary
>> > I1228 22:26:39.093765 129538 http_transporter.go:193] Sending message
>> > to master@15.242.100.56:5050 via http
>> > I1228 22:26:39.093847 129538 http_transporter.go:587] libproc target
>> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
>> > ......
>> >
>> > From the log, the Mesos master rejected the k8s registration, and
>> > k8s retries constantly.
>> >
>> > Have you met this issue before? Thanks very much in advance!
>> >
>> > Best Regards
>> > Nan Xiao
>> >
>> >
>> > On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma <klaus1982...@gmail.com>
>> > wrote:
>> >> It seems Kubernetes is down; would you help to check Kubernetes's
>> >> status (km)?
>> >>
>> >> ----
>> >> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> >> Platform Symphony/DCOS Development & Support, STG, IBM GCG
>> >> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>> >>
>> >> On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao <xiaonan830...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Greetings from me!
>> >>>
>> >>> I am trying to follow this tutorial
>> >>> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
>> >>> to deploy "k8s on Mesos" on local machines: k8s is the newest
>> >>> master branch, and Mesos is version 0.26.
>> >>>
>> >>> After running the Mesos master (IP: 15.242.100.56), the Mesos
>> >>> slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I can see
>> >>> the following logs from the Mesos master:
>> >>>
>> >>> ......
>> >>> I1227 22:52:34.494478  8069 master.cpp:4269] Received update of slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
>> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
>> >>> I1227 22:52:34.494940  8065 hierarchical.cpp:400] Slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
>> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
>> >>> resources (total: cpus(*):32; mem(*):127878; disk(*):4336;
>> >>> ports(*):[31000-32000], allocated: )
>> >>> I1227 22:53:06.740757  8053 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56219 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:07.736419  8065 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56241 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:07.767196  8070 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56252 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:08.808171  8053 http.cpp:334] HTTP GET for
>> >>> /master/state.json from 15.242.100.60:56272 with
>> >>> User-Agent='Go-http-client/1.1'
>> >>> I1227 22:53:08.815811  8060 master.cpp:2176] Received SUBSCRIBE call
>> >>> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.816182  8060 master.cpp:2247] Subscribing framework
>> >>> Kubernetes with checkpointing enabled and capabilities [ ]
>> >>> I1227 22:53:08.817294  8052 hierarchical.cpp:195] Added framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> I1227 22:53:08.817464  8050 master.cpp:1122] Framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488 disconnected
>> >>> E1227 22:53:08.817497  8073 process.cpp:1911] Failed to shutdown
>> >>> socket with fd 17: Transport endpoint is not connected
>> >>> I1227 22:53:08.817533  8050 master.cpp:2472] Disconnecting framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.817595  8050 master.cpp:2496] Deactivating framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488
>> >>> I1227 22:53:08.817797  8050 master.cpp:1146] Giving framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
>> >>> scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
>> >>> W1227 22:53:08.818389  8062 master.cpp:4840] Master returning
>> >>> resources offered to framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 because the framework has
>> >>> terminated or is inactive
>> >>> I1227 22:53:08.818397  8052 hierarchical.cpp:273] Deactivated
>> >>> framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> I1227 22:53:08.819046  8066 hierarchical.cpp:744] Recovered
>> >>> cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
>> >>> (total: cpus(*):32; mem(*):127878; disk(*):4336;
>> >>> ports(*):[31000-32000], allocated: ) on slave
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
>> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
>> >>> ......
>> >>>
>> >>> I can't figure out why the Mesos master complains "Failed to shutdown
>> >>> socket with fd 17: Transport endpoint is not connected".
>> >>> Could someone give some clues on this issue?
>> >>>
>> >>> Thanks very much in advance!
>> >>>
>> >>> Best Regards
>> >>> Nan Xiao
>>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
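The connectivity checks suggested in the thread (use netstat on both ends to confirm the master is listening and the scheduler is dialing the right port) can also be scripted from the scheduler host. A minimal sketch in Python; `can_connect` is an illustrative helper (not part of k8s or Mesos), and the host/port pairs mirror the addresses used in this thread:

```python
import socket

def can_connect(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds.

    A refused or timed-out connection usually means either the service
    is down or a firewall (e.g. iptables, as in this thread) is
    interfering with the traffic.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Addresses from the thread: Mesos master (5050) and km apiserver (8888).
    for host, port in (("15.242.100.56", 5050), ("15.242.100.60", 8888)):
        status = "reachable" if can_connect(host, port) else "unreachable"
        print("%s:%d %s" % (host, port, status))
```

Note this only distinguishes "connectable" from "not connectable"; an iptables REJECT rule fails fast while a DROP rule hangs until the timeout, which is itself a hint about the firewall configuration.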
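As for the error that started the thread, "Failed to shutdown socket with fd 17: Transport endpoint is not connected": ENOTCONN is what the shutdown(2) system call returns when the socket's peer is already gone, which matches the master seeing the scheduler disconnect right after subscribing. A minimal reproduction of the errno (in Python rather than the C++ of libprocess, purely for illustration):

```python
import errno
import socket

# Calling shutdown(2) on a socket that is not connected fails with
# ENOTCONN -- "Transport endpoint is not connected" on Linux -- the
# same errno the Mesos master logs when it tries to shut down a
# connection the scheduler has already dropped.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.shutdown(socket.SHUT_RDWR)
except OSError as e:
    assert e.errno == errno.ENOTCONN
    print("shutdown failed:", e.strerror)
finally:
    s.close()
```

So the error line is a symptom of the scheduler-side disconnect (here, the firewall), not an independent fault in the master.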