Hello Reuti,
This is the output of ps -e f:
master@sgemstr:~$ ps -e f
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [ksoftirqd/0]
    4 ?        S      0:00  \_ [kworker/0:0]
    5 ?        S<     0:00  \_ [kworker/0:0H]
    7 ?        S      0:00  \_ [migration/0]
    8 ?        S      0:00  \_ [rcu_bh]
    9 ?        S      0:00  \_ [rcuob/0]
   10 ?        S      0:00  \_ [rcuob/1]
   11 ?        S      0:00  \_ [rcuob/2]
   12 ?        S      0:00  \_ [rcuob/3]
   13 ?        S      0:00  \_ [rcuob/4]
   14 ?        S      0:00  \_ [rcuob/5]
   15 ?        S      0:00  \_ [rcuob/6]
   16 ?        S      0:00  \_ [rcuob/7]
   17 ?        S      0:00  \_ [rcu_sched]
   18 ?        S      0:00  \_ [rcuos/0]
   19 ?        S      0:00  \_ [rcuos/1]
   20 ?        S      0:00  \_ [rcuos/2]
   21 ?        S      0:00  \_ [rcuos/3]
   22 ?        S      0:00  \_ [rcuos/4]
   23 ?        S      0:00  \_ [rcuos/5]
   24 ?        S      0:00  \_ [rcuos/6]
   25 ?        S      0:00  \_ [rcuos/7]
   26 ?        S      0:00  \_ [watchdog/0]
   27 ?        S      0:00  \_ [watchdog/1]
   28 ?        S      0:00  \_ [migration/1]
   29 ?        S      0:00  \_ [ksoftirqd/1]
   30 ?        S      0:00  \_ [kworker/1:0]
   31 ?        S<     0:00  \_ [kworker/1:0H]
   32 ?        S      0:00  \_ [watchdog/2]
   33 ?        S      0:00  \_ [migration/2]
   34 ?        S      0:00  \_ [ksoftirqd/2]
   35 ?        S      0:00  \_ [kworker/2:0]
   36 ?        S<     0:00  \_ [kworker/2:0H]
   37 ?        S      0:00  \_ [watchdog/3]
   38 ?        S      0:00  \_ [migration/3]
   39 ?        S      0:00  \_ [ksoftirqd/3]
   40 ?        S      0:00  \_ [kworker/3:0]
   41 ?        S<     0:00  \_ [kworker/3:0H]
   42 ?        S<     0:00  \_ [khelper]
   43 ?        S      0:00  \_ [kdevtmpfs]
   44 ?        S<     0:00  \_ [netns]
   45 ?        S<     0:00  \_ [writeback]
   46 ?        S<     0:00  \_ [kintegrityd]
   47 ?        S<     0:00  \_ [bioset]
   49 ?        S<     0:00  \_ [kblockd]
   50 ?        S<     0:00  \_ [ata_sff]
   51 ?        S      0:00  \_ [khubd]
   52 ?        S<     0:00  \_ [md]
   53 ?        S<     0:00  \_ [devfreq_wq]
   54 ?        S      0:00  \_ [kworker/3:1]
   55 ?        S      0:00  \_ [kworker/2:1]
   57 ?        S      0:00  \_ [khungtaskd]
   58 ?        S      0:00  \_ [kswapd0]
   59 ?        SN     0:00  \_ [ksmd]
   60 ?        SN     0:00  \_ [khugepaged]
   61 ?        S      0:00  \_ [fsnotify_mark]
   62 ?        S      0:00  \_ [ecryptfs-kthrea]
   63 ?        S<     0:00  \_ [crypto]
   75 ?        S<     0:00  \_ [kthrotld]
   79 ?        S<     0:00  \_ [dm_bufio_cache]
   99 ?        S<     0:00  \_ [deferwq]
  100 ?        S<     0:00  \_ [charger_manager]
  101 ?        S      0:00  \_ [kworker/0:1]
  273 ?        S      0:00  \_ [scsi_eh_0]
  274 ?        S      0:00  \_ [scsi_eh_1]
  275 ?        S      0:00  \_ [scsi_eh_2]
  276 ?        S      0:00  \_ [scsi_eh_3]
  277 ?        S      0:00  \_ [scsi_eh_4]
  278 ?        S      0:00  \_ [scsi_eh_5]
  281 ?        S      0:00  \_ [kworker/u16:5]
  283 ?        S      0:00  \_ [kworker/u16:7]
  310 ?        S      0:00  \_ [jbd2/sda7-8]
  311 ?        S<     0:00  \_ [ext4-rsv-conver]
  312 ?        S<     0:00  \_ [ext4-unrsv-conv]
  599 ?        S<     0:00  \_ [kmemstick]
  601 ?        S      0:00  \_ [irq/45-mei_me]
  607 ?        S<     0:00  \_ [kpsmoused]
  621 ?        S<     0:00  \_ [rpciod]
  624 ?        S      0:00  \_ [kworker/1:2]
  659 ?        S<     0:00  \_ [ktpacpid]
  686 ?        S<     0:00  \_ [cfg80211]
  695 ?        S<     0:00  \_ [nfsiod]
  806 ?        S<     0:00  \_ [kworker/u17:1]
  809 ?        S<     0:00  \_ [hci0]
  810 ?        S<     0:00  \_ [hci0]
  811 ?        S<     0:00  \_ [kworker/u17:2]
  824 ?        S<     0:00  \_ [hd-audio0]
  888 ?        S      0:00  \_ [wl_event_handle]
  937 ?        S<     0:00  \_ [ttm_swap]
  968 ?        S<     0:00  \_ [krfcommd]
 1177 ?        S<     0:00  \_ [nfsd4]
 1178 ?        S<     0:00  \_ [nfsd4_callbacks]
 1179 ?        S      0:00  \_ [lockd]
 1182 ?        S      0:00  \_ [nfsd]
 1183 ?        S      0:00  \_ [nfsd]
 1184 ?        S      0:00  \_ [nfsd]
 1185 ?        S      0:00  \_ [nfsd]
 1186 ?        S      0:00  \_ [nfsd]
 1187 ?        S      0:00  \_ [nfsd]
 1188 ?        S      0:00  \_ [nfsd]
 1189 ?        S      0:00  \_ [nfsd]
    1 ?        Ss     0:00 /sbin/init
  386 ?        S      0:00 upstart-udev-bridge --daemon
  388 ?        Ss     0:00 /sbin/udevd --daemon
  547 ?        S      0:00  \_ /sbin/udevd --daemon
  548 ?        S      0:00  \_ /sbin/udevd --daemon
  632 ?        Ss     0:00 /usr/sbin/sshd -D
  873 ?        Sl     0:00 rsyslogd -c5
  874 ?        Ss     0:00 rpc.idmapd
  882 ?        S      0:00 upstart-socket-bridge --daemon
  885 ?        Ss     0:00 dbus-daemon --system --fork --activation=upstart
  920 ?        Ss     0:00 /usr/sbin/bluetoothd
  922 ?        Ss     0:00 rpcbind -w
  967 ?        Ss     0:00 /usr/sbin/modem-manager
  983 ?        S      0:00 avahi-daemon: running [sgemstr.local]
  985 ?        S      0:00  \_ avahi-daemon: chroot helper
 1000 ?        Ss     0:00 /usr/sbin/cupsd -F
 1004 ?        Ss     0:00 rpc.statd -L
 1008 ?        Ssl    0:00 NetworkManager
 2058 ?        S      0:00  \_ /usr/sbin/dnsmasq --no-resolv --keep-in-foregroun
 1016 ?        Sl     0:00 /usr/lib/policykit-1/polkitd --no-debug
 1085 tty4     Ss+    0:00 /sbin/getty -8 38400 tty4
 1092 tty5     Ss+    0:00 /sbin/getty -8 38400 tty5
 1094 ?        Ss     0:02 /sbin/wpa_supplicant -B -P /run/sendsigs.omit.d/wpasu
 1113 tty2     Ss+    0:00 /sbin/getty -8 38400 tty2
 1114 tty3     Ss+    0:00 /sbin/getty -8 38400 tty3
 1116 tty6     Ss+    0:00 /sbin/getty -8 38400 tty6
 1122 ?        Ss     0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
 1127 ?        Ss     0:00 cron
 1128 ?        Ss     0:00 atd
 1134 ?        Ssl    0:00 lightdm
 1172 tty7     Ssl+   0:04  \_ /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nol
 1599 ?        Sl     0:00  \_ lightdm --session-child 12 19
 1852 ?        Ssl    0:00  \_ gnome-session --session=ubuntu
 1898 ?        Ss     0:00  \_ /usr/bin/ssh-agent /usr/bin/dbus-launch -
 1912 ?        Sl     0:00  \_ /usr/lib/gnome-settings-daemon/gnome-sett
 1936 ?        S      0:00  | \_ syndaemon -i 2.0 -K -R -t
 1929 ?        Sl     0:05  \_ compiz
 2037 ?        Ss     0:00  | \_ /bin/sh -c /usr/bin/compiz-decorator
 2038 ?        Sl     0:00  | \_ /usr/bin/gtk-window-decorator
 1959 ?        Sl     0:00  \_ nautilus -n
 1961 ?        Sl     0:00  \_ bluetooth-applet
 1962 ?        Sl     0:00  \_ /usr/lib/gnome-settings-daemon/gnome-fall
 1963 ?        Sl     0:00  \_ nm-applet
 1972 ?        Sl     0:00  \_ /usr/lib/policykit-1-gnome/polkit-gnome-a
 2233 ?        Sl     0:00  \_ /usr/lib/gnome-disk-utility/gdu-notificat
 2236 ?        Sl     0:00  \_ telepathy-indicator
 2254 ?        Sl     0:00  \_ zeitgeist-datahub
 2427 ?        Sl     0:00  \_ update-notifier
 2482 ?        Sl     0:00  \_ /usr/lib/deja-dup/deja-dup/deja-dup-monit
 1135 ?        Ss     0:00 /usr/sbin/irqbalance
 1175 ?        Ssl    0:00 whoopsie
 1193 ?        Ss     0:00 /usr/sbin/rpc.mountd --manage-gids
 1365 ?        Sl     0:00 /opt/sge/bin/lx-amd64/sge_qmaster
 1405 ?        Sl     0:00 /usr/lib/accountsservice/accounts-daemon
 1432 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon
 1555 ?        Sl     0:00 /usr/lib/upower/upowerd
 1752 ?        SNl    0:00 /usr/lib/rtkit/rtkit-daemon
 1768 ?        Sl     0:00 /usr/lib/x86_64-linux-gnu/colord/colord
 1841 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
 1901 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session gnome-sessio
 1902 ?        Ss     0:00 //bin/dbus-daemon --fork --print-pid 5 --print-addres
 1920 ?        S      0:00 /usr/lib/gvfs/gvfsd
 1922 ?        Sl     0:00 /usr/lib/gvfs//gvfs-fuse-daemon -f /home/master/.gvfs
 1941 ?        S<l    0:00 /usr/bin/pulseaudio --start --log-target=syslog
 1946 ?        S      0:00  \_ /usr/lib/pulseaudio/pulse/gconf-helper
 1943 ?        S      0:00 /usr/lib/x86_64-linux-gnu/gconf/gconfd-2
 1948 ?        S      0:00 /usr/lib/gvfs/gvfsd-metadata
 1971 ?        S      0:00 /usr/lib/gvfs/gvfs-gdu-volume-monitor
 1979 ?        Sl     0:00 /usr/lib/udisks/udisks-daemon
 1980 ?        S      0:00  \_ udisks-daemon: not polling any devices
 1984 ?        Sl     0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
 1987 ?        S      0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
 2003 ?        Sl     0:00 /usr/lib/notify-osd/notify-osd
 2007 ?        S      0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.5 /org/gtk/gvf
 2011 ?        Sl     0:00 /usr/bin/gnome-screensaver --no-daemon
 2016 ?        S      0:00 /usr/lib/gvfs/gvfsd-burn --spawner :1.5 /org/gtk/gvfs
 2019 ?        Sl     0:00 /usr/lib/bamf/bamfdaemon
 2043 ?        Sl     0:00 /usr/lib/unity/unity-panel-service
 2045 ?        Sl     0:00 /usr/lib/indicator-appmenu/hud-service
 2064 ?        Sl     0:00 /usr/lib/indicator-session/indicator-session-service
 2066 ?        Sl     0:00 /usr/lib/indicator-datetime/indicator-datetime-servic
 2068 ?        Sl     0:00 /usr/lib/indicator-messages/indicator-messages-servic
 2070 ?        Sl     0:00 /usr/lib/indicator-sound/indicator-sound-service
 2079 ?        Sl     0:00 /usr/lib/indicator-printers/indicator-printers-servic
 2080 ?        Sl     0:00 /usr/lib/indicator-application/indicator-application-
 2109 ?        S      0:00 /usr/lib/geoclue/geoclue-master
 2112 ?        Sl     0:00 /usr/lib/ubuntu-geoip/ubuntu-geoip-provider
 2190 ?        S      0:00 /opt/sge/bin/lx-amd64/sge_shadowd
 2226 tty1     Ss+    0:00 /sbin/getty -8 38400 tty1
 2243 ?        Sl     0:00 /usr/lib/telepathy/mission-control-5
 2248 ?        Sl     0:00 /usr/lib/gnome-online-accounts/goa-daemon
 2262 ?        Sl     0:00 /usr/bin/zeitgeist-daemon
 2268 ?        Sl     0:00 /usr/lib/zeitgeist/zeitgeist-fts
 2276 ?        S      0:00  \_ /bin/cat
 2287 ?        Sl     0:00 /usr/lib/unity-lens-applications/unity-applications-d
 2290 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-lens-video/unity-lens-
 2292 ?        Sl     0:00 /usr/lib/unity-lens-files/unity-files-daemon
 2293 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-music-daemon
 2322 ?        Sl     0:01 gnome-terminal
 2327 ?        S      0:00  \_ gnome-pty-helper
 2329 pts/0    Ss     0:00  \_ bash
 2616 pts/0    R+     0:00  \_ ps -e f
 2385 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-scope-video-remote/uni
 2387 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-musicstore-daemon
 2438 ?        S      0:00 /usr/bin/python /usr/lib/system-service/system-servic
 2442 ?        SNl    0:04 /usr/bin/python /usr/bin/update-manager --no-focus-on
 2448 ?        Sl     0:00 /usr/lib/dconf/dconf-service
 2463 ?        SN     0:01 /usr/bin/python /usr/sbin/aptd
About the port: it is open and in the LISTEN state. Does that mean there is no
problem with the port?
root@sgemstr:~# netstat -nltp |grep 644
tcp        0      0 0.0.0.0:6444            0.0.0.0:*               LISTEN      1365/sge_qmaster
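
A socket in LISTEN state only shows that sge_qmaster is bound to port 6444 on this machine; whether the other hosts can actually open a connection to it is a separate question. A minimal Python sketch of such a check, meant to be run from one of the exec hosts against the master (and from the master against port 6445 on the exec hosts). The function name `port_reachable` and the timeout are illustrative assumptions, not part of Grid Engine:

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name-resolution failures
        return False
```

For example, `port_reachable("sgemstr", 6444)` on gcl1 would test the path from that exec host to the qmaster, and `port_reachable("gcl1", 6445)` on sgemstr the reverse direction, assuming the ports mentioned in this thread are the ones configured.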
Finally, about the firewall: I never changed anything or added a rule, and this
is the output:
root@sgemstr:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Doesn't that mean I have an empty iptables rule set?
Many regards.
On Wednesday, October 29, 2014 9:01 PM, Reuti <[email protected]>
wrote:
Please keep the list posted.
On 29.10.2014 at 18:47, Disny Disny wrote:
> Hello Reuti,
> This is the output of qhost and qstat -f, but I don't know what it means, so
> I'm hoping you can help.
>
> Kind regards.
>
> root@sgemstr:~# qhost
> HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> ----------------------------------------------------------------------------------------------
> global                  -               -    -    -    -      -       -       -       -       -
> gcl1                    lx-amd64        4    1    4    4      -    3.8G       -    6.7G       -
> gcl2                    lx-amd64        4    1    4    4      -    3.7G       -    3.8G       -
> gcl3                    lx-amd64        4    1    4    4      -    1.9G       -    6.7G       -
> shdwgcl4                lx-amd64        4    1    4    4      -    3.8G       -    3.8G       -
> root@sgemstr:~# qstat -f
> queuename                      qtype resv/used/tot. np_load  arch          states
> ---------------------------------------------------------------------------------
> all.q@gcl1                     BIP   0/0/4          -NA-     lx-amd64      au
> ---------------------------------------------------------------------------------
> all.q@gcl2                     BIP   0/0/4          -NA-     lx-amd64      au
> ---------------------------------------------------------------------------------
> all.q@gcl3                     BIP   0/0/4          -NA-     lx-amd64      au
> ---------------------------------------------------------------------------------
> all.q@shdwgcl4                 BIP   0/0/4          -NA-     lx-amd64      au
This looks like there is no communication between the qmaster and the execds.
Does the output of:
$ ps -e f
show `sge_qmaster` on the master and `sge_execd` on the exec hosts running? Do
you have a firewall in place? Maybe ports 6444 and 6445 need to be opened.
-- Reuti
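
For reference, the `u` in the `au` state column stands for "unknown", which qstat reports when the qmaster has no contact with the sge_execd on that host (`a` is the load alarm); the `-NA-` load values fit the same picture. A minimal Python sketch that picks such queue instances out of saved `qstat -f` output. The function name and the whitespace-based column parsing are illustrative assumptions, not part of Grid Engine:

```python
from typing import List


def unreachable_queue_instances(qstat_f_output: str) -> List[str]:
    """Return queue instances whose state column contains 'u' (unknown)."""
    result = []
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # A queue-instance line looks like:
        #   all.q@gcl1 BIP 0/0/4 -NA- lx-amd64 au
        # The state column is missing when the queue instance has no state flags.
        is_queue_line = (
            len(fields) >= 5
            and "@" in fields[0]
            and fields[2].count("/") == 2
        )
        if not is_queue_line:
            continue  # header, separator or job line
        states = fields[5] if len(fields) > 5 else ""
        if "u" in states:
            result.append(fields[0])
    return result
```

Fed with the output quoted above, this would list all four queue instances, which matches the reading that none of the execds is reporting in.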
>
> ############################################################################
> - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
> 4 0.00000 Sleeper root qw 10/23/2014 09:20:09 1
> root@sgemstr:~#
>
>
> On Thursday, October 23, 2014 6:38 PM, Reuti <[email protected]>
> wrote:
>
>
> Please check the state of the machines in `qhost` or `qstat -f`, i.e.
> whether the execd can be reached, which shows as suitable values being
> returned for the machines. -- Reuti
>
> On 23.10.2014 at 17:35, Disny Disny wrote:
>
> > Yes, during the exec installation it added a startup script, but is there
> > any other startup script I need to add manually?
> >
> >
> > From: Reuti <[email protected]>;
> > To: Disny Disny <[email protected]>;
> > Cc: grid Engine Mailing List <[email protected]>;
> > Subject: Re: Queue instances dropped
> > Sent: Thu, Oct 23, 2014 3:29:58 PM
> >
> > On 23.10.2014 at 17:23, Disny Disny wrote:
> >
> >
> > > I have a problem with SGE. After installing the cluster everything
> > > worked fine, but when I shut down the PCs, started them again later
> > > and tried to submit a job, I got this message:
> > > queue instance "all.q@gcl2" dropped because it is temporarily not available
> > >
> > > queue instance "all.q@gcl3" dropped because it is temporarily not available
> > >
> > > queue instance "all.q@shdwgcl4" dropped because it is temporarily not
> > > available
> > >
> > > queue instance "all.q@gcl1" dropped because it is temporarily not available
> > > All queues are dropped because of overload or full.
> > > I appreciate any help.
> >
> >
> > Are the execds running on the nodes? Maybe they need to be added to your
> > startup mechanism so that they start automatically after you shut down the machines.
> >
> > -- Reuti
>
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users