I get into the same state regardless of whether I’m bringing the node up with
auto-discovery or I’ve manually defined it.
Here are the processes of a node that’s been up a few minutes:
[xCAT Genesis running on (none) /]# ps -elf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 1 80 0 - 2869 wait 18:59 ? 00:00:08 /bin/sh /init
1 S root 2 0 0 80 0 - 0 kthrea 18:59 ? 00:00:00 [kthreadd]
…
1 S root 446 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kthrotld/31]
1 S root 456 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kpsmoused]
1 S root 457 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [usbhid_resumer]
1 S root 458 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [deferwq]
5 S root 481 1 0 76 -4 - 2720 poll_s 18:59 ? 00:00:01 udevd --daemon
5 S root 563 481 0 78 -2 - 2695 poll_s 18:59 ? 00:00:00 udevd --daemon
1 S root 567 1 0 80 0 - 2869 wait 18:59 ? 00:00:00 /bin/sh /init
4 S root 569 567 0 80 0 - 5499 pause 18:59 ? 00:00:00 screen -ln
5 S root 570 569 0 80 0 - 5499 poll_s 18:59 ? 00:00:00 SCREEN -ln
4 S root 571 570 0 80 0 - 2835 n_tty_ 18:59 pts/0 00:00:00 /bin/sh
1 S root 576 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [mlx4]
1 S root 640 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_0]
1 S root 641 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_1]
1 S root 642 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_2]
1 S root 643 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_3]
1 S root 644 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_4]
1 S root 645 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_5]
1 S root 707 2 0 99 19 - 0 ipmi_t 18:59 ? 00:00:00 [kipmi0]
4 S root 855 1 0 80 0 - 5499 pause 18:59 ? 00:00:00 screen -L -ln doxcat
5 S root 856 855 0 80 0 - 5500 poll_s 18:59 ? 00:00:00 SCREEN -L -ln doxcat
4 S root 857 856 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 860 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kondemand/0]
…
1 S root 891 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kondemand/31]
5 S rpc 923 1 0 80 0 - 4744 poll_s 18:59 ? 00:00:00 rpcbind
5 S root 925 1 0 80 0 - 5837 poll_s 18:59 ? 00:00:00 rpc.statd
5 S root 930 1 0 80 0 - 16672 poll_s 18:59 ? 00:00:00 /usr/sbin/sshd
5 S root 953 1 0 80 0 - 3396 poll_s 18:59 ? 00:00:00 lldpad -d
4 S root 982 857 0 80 0 - 2280 poll_s 18:59 pts/1 00:00:00 dhclient -6 -pf /var/run/dhclient6.eth0.pid eth0 -lf /var/lib/dhclient/dhclient6.
1 S root 994 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 995 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 1759 1 0 80 0 - 2280 poll_s 19:00 ? 00:00:00 dhclient -cf /etc/dhclient.conf -pf /var/run/dhclient.eth0.pid eth0
5 S root 1773 1 0 80 0 - 6627 poll_s 19:00 ? 00:00:00 ntpd -g -x
5 S root 1787 481 0 78 -2 - 2719 poll_s 19:00 ? 00:00:00 udevd --daemon
1 S root 1807 2 0 80 0 - 0 kaudit 19:00 ? 00:00:00 [kauditd]
5 S root 1834 1 0 80 0 - 31077 poll_s 19:00 ? 00:00:00 /sbin/rsyslogd -c4
4 S root 2896 930 0 80 0 - 17830 - 19:06 ? 00:00:00 sshd: root@pts/2
4 S root 2924 2896 0 80 0 - 2835 wait 19:06 pts/2 00:00:00 -bash
0 S root 2959 994 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2960 995 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2961 857 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 1
4 R root 2962 2924 2 80 0 - 3344 - 19:07 pts/2 00:00:00 ps -elf
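
To read the listing above as a parent/child tree, the rows can be grouped by PPID. The following is only a minimal Python sketch, not part of xCAT: the sample rows are copied from the ps output above (rejoined to one process per line, with the truncated dhclient6 path left as it appeared), and the helper names parse_ps, children_of and print_tree are made up for illustration.

```python
# Sample rows copied from the `ps -elf` listing, one process per line.
SAMPLE = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 857 856 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
4 S root 982 857 0 80 0 - 2280 poll_s 18:59 pts/1 00:00:00 dhclient -6 -pf /var/run/dhclient6.eth0.pid eth0 -lf /var/lib/dhclient/dhclient6.
1 S root 994 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 995 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
0 S root 2959 994 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2960 995 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2961 857 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 1
"""


def parse_ps(text):
    """Return {pid: (ppid, wchan, cmd)} from `ps -elf` text, one process per line."""
    procs = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # 15 columns; CMD is last and may contain spaces, so cap the split.
        fields = line.split(None, 14)
        if len(fields) < 15:
            continue  # e.g. an elided "…" row
        pid, ppid = int(fields[3]), int(fields[4])
        procs[pid] = (ppid, fields[10], fields[14])
    return procs


def children_of(procs, parent):
    """Return the sorted PIDs whose PPID is `parent`."""
    return sorted(pid for pid, (ppid, _, _) in procs.items() if ppid == parent)


def print_tree(procs, root, depth=0):
    """Print `root` and its descendants, indented by depth, with WCHAN shown."""
    _, wchan, cmd = procs[root]
    print("  " * depth + f"{root} [{wchan}] {cmd}")
    for child in children_of(procs, root):
        print_tree(procs, child, depth + 1)


if __name__ == "__main__":
    print_tree(parse_ps(SAMPLE), 857)
```

Run against the sample rows, this puts dhclient -6, the two doxcat subshells (each waiting on a sleep 5) and a sleep 1 under doxcat (PID 857). The sleeps started at 19:07 while doxcat itself started at 18:59, which is at least consistent with the script polling in a loop rather than being dead.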
From: Xiao Peng Wang [mailto:[email protected]]
Sent: Tuesday, February 2, 2016 2:17 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [xcat-user] Failure booting genesis kernel
It's possible that genesis is waiting for tasks to run rather than being dead.
Could you show the output of 'ps -elf' in genesis so we can see what processes
are running?
How did you get your node into genesis? Is it a new node that booted into
genesis for discovery, or did you run 'nodeset' to force the node into genesis
to run a certain task?
Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian
District Beijing P.R.China 100193
----- Original message -----
From: "Westlund, John A" <[email protected]>
To: xCAT Users Mailing list <[email protected]>, Er Tao Zhao/China/IBM@IBMCN
Cc:
Subject: Re: [xcat-user] Failure booting genesis kernel
Date: Tue, Feb 2, 2016 3:04 PM
I can ping and get into the node:
[xCAT Genesis running on (none) /]# ls
bin debian emergency init initqueue-finished initqueue-timeout lib64 netroot pre-pivot pre-udev root screenlog.0 sysroot usr
cmdline dev etc initqueue initqueue-settled lib mount pre-mount pre-trigger proc sbin sys tmp var
This is what is running:
# lsxcatd -a
Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built Mon Nov 30 05:43:11 EST 2015)
This is a Management Node
dbengine=SQLite
John
From: Xiao Peng Wang [mailto:[email protected]]
Sent: Monday, February 1, 2016 10:23 PM
To: [email protected]; Er Tao Zhao
Cc: [email protected]
Subject: Re: [xcat-user] Failure booting genesis kernel
You mentioned that genesis hit a dead end. Can you ping the compute node, or
log in to it? Please also run 'lsxcatd -a' to show the xCAT version.
Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian
District Beijing P.R.China 100193
----- Original message -----
From: "Westlund, John A" <[email protected]>
To: "[email protected]" <[email protected]>
Cc:
Subject: [xcat-user] Failure booting genesis kernel
Date: Tue, Feb 2, 2016 12:07 PM
I’m trying to bring up a new system, but have run into a dead end. I get no
error message other than a blinking question mark in a diamond (some
un-assigned UTF character):
CLIENT MAC ADDR: 84 8F 69 FD 4F 28 GUID: 44454C4C 4300 1042 8054 B6C04F355631
CLIENT IP: 192.168.91.9 MASK: 255.255.240.0 DHCP IP: 10.10.1.167
GATEWAY IP: 192.168.92.54
PXE->EB:�P: 192.168.92.54
PXE->EB: !PXE at 98D2:0070, entry point at 98D2:0106
UNDI code segment 98D2:5210, data segment 9297:63B0 (586-632kB)
UNDI device is PCI 02:00.0, type DIX+802.3
546kB free base memory after PXE unload
xNBA initialising devices...ok
xCAT Network Boot Agent
iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP bzImage ELF PXE PXEXT
net0: 84:8f:69:fd:4f:28 using undionly on UNDI-PCI02:00.0 (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 84:8f:69:fd:4f:28)... ok
net0: 192.168.91.9/255.255.240.0 gw 192.168.92.54
Next server: 192.168.92.53
Filename: http://192.168.92.53/tftpboot/xcat/xnba/nets/192.168.80.0_20
http://192.168.92.53/tftpboot/xcat/xnba/nets/192.168.80.0_20... ok
http://192.168.92.53/tftpboot/xcat/genesis.kernel.x86_64... ok
http://192.168.92.53/tftpboot/xcat/genesis.fs.x86_64.lzma... 74%
I’m assuming the genesis.fs finishes loading even though it read “74%,” and a
CSI code bounces the cursor up the screen before failing.
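
As a quick sanity check on the addresses in the boot screen above, the client IP and mask can be reduced to a network and compared with the file name requested under xnba/nets/. The following is a minimal Python sketch using only the standard ipaddress module. It assumes (this is not confirmed in the thread) that the file name is simply the network address and prefix length joined by an underscore; the helper names nets_file_name and on_client_subnet are made up for illustration.

```python
import ipaddress

# Values copied from the boot screen.
CLIENT = ipaddress.ip_interface("192.168.91.9/255.255.240.0")
ADDRESSES = {
    "gateway": "192.168.92.54",
    "next server": "192.168.92.53",
    "DHCP IP": "10.10.1.167",
}


def nets_file_name(iface):
    """Build '<network>_<prefixlen>' for the interface's network.

    Assumption: this mirrors how the file under xnba/nets/ is named.
    """
    net = iface.network
    return f"{net.network_address}_{net.prefixlen}"


def on_client_subnet(iface, addr):
    """Return True if `addr` lies inside the client's network."""
    return ipaddress.ip_address(addr) in iface.network


if __name__ == "__main__":
    print("nets file:", nets_file_name(CLIENT))
    for label, addr in ADDRESSES.items():
        print(f"{label} {addr} on client subnet: {on_client_subnet(CLIENT, addr)}")
```

With the values above, the computed name matches the 192.168.80.0_20 file that was fetched, and the gateway and next server fall inside the client's network while the reported DHCP IP does not. Whether that last point has anything to do with the failure is not established here.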
Where should I be looking to debug this?
Thanks,
John
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user