Hello,
I recently installed GE2011.11p1, and after configuring it I get a
segfault when trying to start the qmaster. If anyone has any insight, I
would appreciate the help.
OS: CentOS 6.3 x86_32
I use the following set of commands to install GE2011. Note the .po file
mojo at the end; I'm not sure it's required, but it did remove some
errors from the debug log.
----
export JAVA_HOME=/usr/lib/jvm/java-openjdk/
./aimk -only-depend
./scripts/zerodepend
./aimk depend
./aimk -no-java -no-jni -no-secure -spool-classic -no-dump -intl
./aimk -man
./scripts/distinst -y -basedir /opt/ge -vdir GE2011.11p1 -all -noexit
ln -s /usr/src/${PKGNAME}/source/scripts/mk_dist /opt/ge/GE2011.11p1/mk_dist
cp -r /usr/src/${PKGNAME}/source/dist/locale /opt/ge/GE2011.11p1/
mkdir -p /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/linux-x86
cp /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/gridengine.po \
  /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/linux-x86
msgfmt /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/linux-x86/gridengine.po -o \
  /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/linux-x86/gridengine.mo
----
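For readability, the locale steps at the end of that list can be sketched as one small shell function. The names `compile_catalog`, `run_or_print` and the `DRY_RUN` switch are made up for this illustration and are not part of GE; the paths and the `linux-x86` directory are the ones used above. I have not verified that this step is actually required.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Grid Engine).

# Run a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy gridengine.po into an architecture directory and compile it
# to gridengine.mo with msgfmt.
compile_catalog() {
    msgdir=$1   # e.g. /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES
    arch=$2     # e.g. linux-x86

    run_or_print mkdir -p "$msgdir/$arch" &&
    run_or_print cp "$msgdir/gridengine.po" "$msgdir/$arch" &&
    run_or_print msgfmt "$msgdir/$arch/gridengine.po" -o "$msgdir/$arch/gridengine.mo"
}

# Print the commands for the install described above without running them.
DRY_RUN=1
compile_catalog /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES linux-x86
```

Setting `DRY_RUN=0` (or leaving it unset) runs the three commands for real, stopping at the first one that fails.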
Packages installed to support GE2011
package { 'java-1.7.0-openjdk': ensure => installed }
package { 'java-1.7.0-openjdk-devel': ensure => installed }
package { 'lib-X11-devel': ensure => installed }
package { 'openssl-static': ensure => installed }
package { 'tcsh': ensure => installed }
package { 'pam-devel': ensure => installed }
package { 'openmotif-devel': ensure => installed }
package { 'openssl': ensure => installed }
package { 'openssl-devel': ensure => installed }
package { 'libXpm-devel': ensure => installed }
package { 'ncurses-devel': ensure => installed }
package { 'ncurses': ensure => installed }
package { 'texinfo': ensure => installed }
package { 'unzip': ensure => installed }
package { 'gettext': ensure => installed }
package { 'gettext-devel': ensure => installed }
Debug output from sge_qmaster
starting sge_qmaster
0 2951 -1216837952 ****** starting localization procedure ... **********
1 2951 -1216837952 could not get environment variable "GRIDPACKAGE"
2 2951 -1216837952 could not get environment variable "GRIDLOCALEDIR"
3 2951 -1216837952 setlocale() returns NULL
4 2951 -1216837952 locale directory: >/opt/ge/GE2011.1
5 2951 -1216837952 package file: >linux-x86/gridengine.mo<
6 2951 -1216837952 language (LANG): >en<
7 2951 -1216837952 loading message file: /opt/ge/GE2011.11p1/locale/en/LC_MESSAGES/linux-x86/gridengine.mo
8 2951 -1216837952 found message file - ok
9 2951 -1216837952 setlocale() returns NULL
10 2951 -1216837952 bindtextdomain() returns "/opt/ge/GE2011.11p1/locale"
11 2951 -1216837952 textdomain() returns "linux-x86/gridengine"
12 2951 -1216837952 error id output : enabled
13 2951 -1216837952 ****** starting localization procedure ... success **
14 2951 -1216837952 sge_qmaster is not daemonized
15 2951 -1216837952 returning port value: 7111
16 2951 -1216837952 returning port value: 7112
17 2951 -1216837952 Getting host by name - Linux
18 2951 -1216837952 1 names in h_addr_list
19 2951 -1216837952 1 names in h_aliases
20 2951 -1216837952 Getting host by name - Linux
21 2951 -1216837952 1 names in h_addr_list
22 2951 -1216837952 1 names in h_aliases
23 2951 main Getting host by name - Linux
24 2951 main 1 names in h_addr_list
25 2951 main 1 names in h_aliases
26 2951 main creating QMASTER handle
27 2951 main act_qmaster file contains local host name
28 2951 main auid=0; agid=0
29 2951 main uid=0; gid=0; euid=0; egid=0 auid=0; agid=0
30 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/jobs"
31 2951 main retval = 0
32 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/zombies"
33 2951 main retval = 0
34 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/cqueues"
35 2951 main retval = 0
36 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/qinstances"
37 2951 main retval = 0
38 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/exec_hosts"
39 2951 main retval = 0
40 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/submit_hosts"
41 2951 main retval = 0
42 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/admin_hosts"
43 2951 main retval = 0
44 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/centry"
45 2951 main retval = 0
46 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/job_scripts"
47 2951 main retval = 0
48 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/pe"
49 2951 main retval = 0
50 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/ckpt"
51 2951 main retval = 0
52 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/usersets"
53 2951 main retval = 0
54 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/calendars"
55 2951 main retval = 0
56 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/hostgroups"
57 2951 main retval = 0
58 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/users"
59 2951 main retval = 0
60 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/projects"
61 2951 main retval = 0
62 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/resource_quotas"
63 2951 main retval = 0
64 2951 main Making dir "/opt/ge/GE2011.11p1/default/spool/qmaster/advance_reservations"
65 2951 main retval = 0
66 2951 main Making dir "/opt/ge/GE2011.11p1/default/common/local_conf"
67 2951 main retval = 0
68 2951 main reading CONFIG "global"
69 2951 main reading CONFIG "veli.local"
70 2951 main qualified_hostname: 'veli.local'
71 2951 main Complex Attributes----------------------
72 2951 main reading COMPLEX_ENTRY "s_vmem"
73 2951 main reading COMPLEX_ENTRY "s_core"
74 2951 main reading COMPLEX_ENTRY "virtual_used"
75 2951 main reading COMPLEX_ENTRY "s_rss"
76 2951 main reading COMPLEX_ENTRY "swap_total"
77 2951 main reading COMPLEX_ENTRY "load_long"
78 2951 main reading COMPLEX_ENTRY "virtual_free"
79 2951 main reading COMPLEX_ENTRY "np_load_avg"
80 2951 main reading COMPLEX_ENTRY "calendar"
81 2951 main reading COMPLEX_ENTRY "h_cpu"
82 2951 main reading COMPLEX_ENTRY "min_cpu_interval"
83 2951 main reading COMPLEX_ENTRY "h_rt"
84 2951 main reading COMPLEX_ENTRY "h_vmem"
85 2951 main reading COMPLEX_ENTRY "h_data"
86 2951 main reading COMPLEX_ENTRY "m_socket"
87 2951 main reading COMPLEX_ENTRY "mem_used"
88 2951 main reading COMPLEX_ENTRY "s_rt"
89 2951 main reading COMPLEX_ENTRY "virtual_total"
90 2951 main reading COMPLEX_ENTRY "swap_free"
91 2951 main reading COMPLEX_ENTRY "seq_no"
92 2951 main reading COMPLEX_ENTRY "slots"
93 2951 main reading COMPLEX_ENTRY "s_fsize"
94 2951 main reading COMPLEX_ENTRY "h_stack"
95 2951 main reading COMPLEX_ENTRY "h_fsize"
96 2951 main reading COMPLEX_ENTRY "load_avg"
97 2951 main reading COMPLEX_ENTRY "load_short"
98 2951 main reading COMPLEX_ENTRY "hostname"
99 2951 main reading COMPLEX_ENTRY "h_rss"
100 2951 main reading COMPLEX_ENTRY "np_load_short"
101 2951 main reading COMPLEX_ENTRY "arch"
102 2951 main reading COMPLEX_ENTRY "num_proc"
103 2951 main reading COMPLEX_ENTRY "np_load_medium"
104 2951 main reading COMPLEX_ENTRY "m_topology_inuse"
105 2951 main reading COMPLEX_ENTRY "h_core"
106 2951 main reading COMPLEX_ENTRY "tmpdir"
107 2951 main reading COMPLEX_ENTRY "s_data"
108 2951 main reading COMPLEX_ENTRY "rerun"
109 2951 main reading COMPLEX_ENTRY "qname"
110 2951 main reading COMPLEX_ENTRY "s_cpu"
111 2951 main reading COMPLEX_ENTRY "mem_total"
112 2951 main reading COMPLEX_ENTRY "s_stack"
113 2951 main reading COMPLEX_ENTRY "swap_rsvd"
114 2951 main reading COMPLEX_ENTRY "m_topology"
115 2951 main reading COMPLEX_ENTRY "cpu"
116 2951 main reading COMPLEX_ENTRY "load_medium"
117 2951 main reading COMPLEX_ENTRY "mem_free"
118 2951 main reading COMPLEX_ENTRY "swap_rate"
119 2951 main reading COMPLEX_ENTRY "np_load_long"
120 2951 main reading COMPLEX_ENTRY "m_core"
121 2951 main reading COMPLEX_ENTRY "swap_used"
122 2951 main reading COMPLEX_ENTRY "display_win_gui"
123 2951 main host_list----------------------------
124 2951 main reading EXECHOST "global"
125 2951 main reading EXECHOST "template"
126 2951 main reading ADMINHOST "veli.local"
127 2951 main manager_list----------------------------
128 2951 main root
129 2951 main host group definitions-----------
130 2951 main operator_list----------------------------
131 2951 main userset_list------------------------------
132 2951 main reading USERSET "deadlineusers"
133 2951 main reading USERSET "defaultdepartment"
134 2951 main reading USERSET "arusers"
135 2951 main calendar list ------------------------------
136 2951 main resource quota list -----------------------
137 2951 main cluster_queue_list---------------------------------
138 2951 main pe_list---------------------------------
139 2951 main ckpt_list---------------------------------
140 2951 main advance reservation list -----------------------
141 2951 main job_list-----------------------------------
142 2951 main user list-----------------------------------
143 2951 main project list-----------------------------------
144 2951 main scheduler config -----------------------------------
145 2951 main reading SCHEDD_CONF "sched_configuration"
146 2951 main sconf_validate: no config to validate
147 2951 main share tree list-----------------------------------
148 2951 main reading SHARETREE "sharetree"
149 2951 main received qmaster_params are: none
150 2951 main event master functionality has been initialized
151 2951 -1275073680 /etc/init.d/sgemaster.p7111: line 760: 2951 Segmentation fault $bin_dir/sge_qmaster
Strace (last few lines)
open("qmaster.pid", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0644) = 4
close(4) = 0
open("qmaster.pid", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7895000
write(4, "3273\n", 5) = 5
close(4) = 0
munmap(0xb7895000, 4096) = 0
mmap2(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xb40ff000
mprotect(0xb40ff000, 4096, PROT_NONE) = 0
clone(child_stack=0xb4aff494,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SET
write(2, " 150 3273 main ", 27 150 3273 main ) = 27
write(2, 0xbfc7f170, 52 <unfinished ...>
+++ killed by SIGSEGV +++
Segmentation fault
Long enough? :D
-Karl Vollmer
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users