[xcat-user] tftpd error - Can't get it to work

2021-03-18 Thread Russell Jones
Brand new xcat management server running CentOS 8.3. I've searched high and
low and cannot figure out why I am getting the following error when I try
to PXE boot a node:

Mar 18 12:49:12 xcat8 in.tftpd[2962]: RRQ from 172.21.4.1 filename xcat/xnba.efi
*Mar 18 12:49:12 xcat8 in.tftpd[2962]: tftpd: read: Connection refused*
Mar 18 12:49:16 xcat8 in.tftpd[2963]: RRQ from 172.21.4.1 filename xcat/xnba.efi
Mar 18 12:49:16 xcat8 in.tftpd[2963]: Client 172.21.4.1 finished xcat/xnba.efi


Any assistance in troubleshooting is appreciated! The xcatprobe check
returns good except for DNS (expected, we host DNS off the management node).

=== SUMMARY

[MN]: Checking on MN...
  [FAIL]
Checking DNS service is configured...
  [FAIL]
DNS service isn't ready on 172.21.0.103
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] tftpd error - Can't get it to work

2021-03-18 Thread Russell Jones
Some more information for future folks that see this - "Secure efi boot"
was the issue. Disabling secure boot allowed it to PXE successfully.

On Thu, Mar 18, 2021 at 12:53 PM Russell Jones  wrote:

> Brand new xcat management server running CentOS 8.3. I've searched high
> and low and cannot figure out why I am getting the following error when I
> try to PXE boot a node:
>
> Mar 18 12:49:12 xcat8 in.tftpd[2962]: RRQ from 172.21.4.1 filename xcat/xnba.efi
> *Mar 18 12:49:12 xcat8 in.tftpd[2962]: tftpd: read: Connection refused*
> Mar 18 12:49:16 xcat8 in.tftpd[2963]: RRQ from 172.21.4.1 filename xcat/xnba.efi
> Mar 18 12:49:16 xcat8 in.tftpd[2963]: Client 172.21.4.1 finished xcat/xnba.efi
>
>
> Any assistance in troubleshooting is appreciated! The xcatprobe check
> returns good except for DNS (expected, we host DNS off the management node).
>
> === SUMMARY
> 
> [MN]: Checking on MN...
> [FAIL]
> Checking DNS service is configured...
> [FAIL]
> DNS service isn't ready on 172.21.0.103
>


[xcat-user] Second IP address on nic not being added

2021-06-04 Thread Russell Jones
Hi all,

I am trying to follow the documentation (
https://xcat-docs.readthedocs.io/en/2.11/guides/admin-guides/manage_clusters/ppc64le/diskless/customize_image/cfg_second_adapter.html)
for adding a second IP address to a physical nic, but it is not working.

My node definition for the IPs:

nicips.eno1=10.55.167.2|10.55.167.240
nicnetworks.eno1=10_55_Corporate|10_55_Corporate
nictypes.eno1=Ethernet

What I get in the node's logs on boot:
[E]:Error: configeth on node-n1: Network object 10_55_Corporate is not
defined.

The node boots and the first IP gets applied to the interface properly, but
the second does not.

The above network name *does* exist.

Any ideas?
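
For readers following along, here is a minimal Python sketch of how the pipe-separated values above line up, assuming xCAT pairs the entries of nicips and nicnetworks by position. The function name pair_nic_values is hypothetical and not part of xCAT; it only illustrates the attribute layout and does not explain why configeth reports the network as undefined.

```python
def pair_nic_values(nicips: str, nicnetworks: str) -> list[tuple[str, str]]:
    """Pair each IP in a 'a|b' nicips value with the network at the same position."""
    ips = nicips.split("|")
    networks = nicnetworks.split("|")
    if len(ips) != len(networks):
        # A count mismatch would leave some address without a network name.
        raise ValueError(
            f"{len(ips)} address(es) but {len(networks)} network name(s)"
        )
    return list(zip(ips, networks))


# Values copied from the node definition in this message.
for ip, network in pair_nic_values(
    "10.55.167.2|10.55.167.240",
    "10_55_Corporate|10_55_Corporate",
):
    print(f"{ip} -> {network}")
```

Both addresses map to the same network object here, which is the configuration being asked about.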


Re: [xcat-user] Second IP address on nic not being added

2021-06-06 Thread Russell Jones
Thanks!

This is CentOS 7 for both the mgmt node and compute nodes, and yes this is
for an older version.

I saw those docs as well and there is no difference I am seeing in the
commands other than also adding nicaliases which I do not need to do as I
am not hosting DNS on the mgmt node.

Having a network interface with a second IP in the same network/subnet is a
supported configuration, at least on Linux :-). It works fine manually
adding it on the node.

On Fri, Jun 4, 2021, 2:51 PM Nathan A Besaw  wrote:

> The link you provided below is for an older version of xCAT (2.11), is
> that the version you are using on your cluster?
>
> What OS are you running on your compute node?
>
> If you are running a more recent OS and more recent version of xCAT, I
> think these instructions are probably more applicable:
>
> https://xcat-docs.readthedocs.io/en/2.16.2/guides/admin-guides/manage_clusters/common/deployment/network/cfg_network_aliases.html
>
> Note: I think you should try to split 10_55_Corporate into two separate,
> non-overlapping network definitions. I think assigning two addresses from
> the same subnet to a single interface is unlikely to work well.
>
> From: Russell Jones 
> To: xCAT-user@lists.sourceforge.net
> Date: 06/04/2021 03:15 PM
> Subject: [EXTERNAL] [xcat-user] Second IP address on nic not being added
> --
>
>
>
> Hi all,
>
> I am trying to follow the documentation (
> https://xcat-docs.readthedocs.io/en/2.11/guides/admin-guides/manage_clusters/ppc64le/diskless/customize_image/cfg_second_adapter.html)
> for adding a second IP address to a physical nic, but it is not working.
>
> My node definition for the IPs:
>
> nicips.eno1=10.55.167.2|10.55.167.240
> nicnetworks.eno1=10_55_Corporate|10_55_Corporate
> nictypes.eno1=Ethernet
>
> What I get in the node's logs on boot:
> [E]:Error: configeth on node-n1: Network object 10_55_Corporate is not
> defined.
>
> The node boots and the first IP gets applied to the interface properly,
> but the second does not.
>
> The above network name *does* exist.
>
> Any ideas?


Re: [xcat-user] (already) Provisioned nodes are failing to boot off disk

2021-09-01 Thread Russell Jones
Is the mac correct for the node?

On Wed, Sep 1, 2021 at 11:02 AM Imam Toufique  wrote:

> Hi,
>
> Need your helpful thoughts here with a problem we have, please.
>
> We have nodes that were provisioned with xcat, they are running, OS is
> working and installed.  The boot order is set to PXE first, SSD 2nd.
>
> Several days ago, when I rebooted one of the nodes, it went straight to
> PXE discovery mode - attempting for an install.  This is a node that is
> built, it should have exited the PXE boot mode and boot off the disk, but
> it never did.
>
> I am not sure what's going on, it looks like xcat has lost the status of
> the node, whether it is installed or not ( need provisioning?)
>
> Here is the 'lsdef' output of the node:
>
> ```
> [root@mn] lsdef -t node hpc3-14-03
> Object name: hpc3-14-03
> arch=x86_64
> cpucount=40
> cputype=Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
> currchain=boot
> currstate=boot
> disksize=sda:224GB,sdb:224GB
> groups=centos78
> ip=10.240.58.16
> mac=08:f1:ea:e4:35:52
> memory=193122MB
> mtm=HPE:ProLiant XL170r Gen10
> netboot=xnba
> nichostnamesuffixes.ib0=-ib0
> nichostnamesuffixes.ipmi=-ipmi
> nicips.ib0=10.240.60.16
> nicips.ipmi=10.240.62.16
> os=centos7.7
> postbootscripts=otherpkgs,hpc3-postscripts/hpc3postbootscript
> postscripts=syslog,remoteshell,syncfiles,setupntp,hpc3-postscripts/hpc3postscript.1,confignetwork -s
> profile=compute
> provmethod=centos7.8-x86_64-install-compute
> serial=2M294204L9
> status=booted
> statustime=06-21-2021 16:41:45
> supportedarchs=x86,x86_64
>
> ```
>
> ```
> [root@mn]# nodediscoverls |grep 14-03
>   38363730-3535-324D-3239-343230344C39  hpc3-14-03  manual  HPE:ProLiant XL170r Gen10  2M294204L9
> ```
>
> ```
> [root@mn]# lsdef -t network compute_net_1
> Object name: compute_net_1
> domain=local
> dynamicrange=10.240.58.221-10.240.58.240
> gateway=10.240.58.1
> mask=255.255.254.0
> mgtifname=eno1
> mtu=1500
> nameservers=10.240.58.4,8.8.8.8,128.200.192.202
> net=10.240.58.0
> staticrange=10.240.58.4-10.240.59.220
> tftpserver=
> ```
>
> Any idea what might be going on here?  Why an already setup/installed node
> is going back to discovery ( and wanting to be installed) mode?
>
> Can someone please shed some light?
>
> thanks a lot!


Re: [xcat-user] (already) Provisioned nodes are failing to boot off disk

2021-09-02 Thread Russell Jones
Look at the man page for "makedhcp".

In my experience, when there is no fixed-address entry for a host, it
usually means DNS is wrong. Make sure the DNS entries are correct for that
host, and that the management node can resolve both the hostname and the
reverse DNS entry for its IP address.

Also, I noticed your node definition doesn't have a nicips.eth0/ensXXX
entry. Check that.
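
The forward and reverse check described above could be sketched in Python roughly as follows. This is only an illustration: check_dns is a hypothetical helper, not an xCAT tool, and it compares short hostnames only, which is an assumption about how the node names are registered.

```python
import socket


def check_dns(hostname, expected_ip,
              forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found with forward and reverse resolution."""
    problems = []
    try:
        resolved = forward(hostname)
        if resolved != expected_ip:
            problems.append(
                f"{hostname} resolves to {resolved}, expected {expected_ip}"
            )
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup of {hostname} failed: {exc}")
    try:
        reverse_name = reverse(expected_ip)[0]
        # Compare short names so "node" and "node.domain" count as a match.
        if reverse_name.split(".")[0] != hostname.split(".")[0]:
            problems.append(
                f"{expected_ip} reverse-resolves to {reverse_name}, "
                f"expected {hostname}"
            )
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")
    return problems
```

Run on the management node, a call such as check_dns("hpc3-14-03", "10.240.58.16") returns an empty list when both directions agree. The forward and reverse parameters exist only so the lookups can be replaced with stand-ins when no resolver is reachable.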

On Wed, Sep 1, 2021 at 12:09 PM Imam Toufique  wrote:

> Yes, that is correct.  Below is the output :
>
> [root@hpc3-14-03 ~]# ip a
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: eno1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
>     link/ether 08:f1:ea:9e:c7:60 brd ff:ff:ff:ff:ff:ff
> 3: eno2:  mtu 1500 qdisc noop state DOWN group default qlen 1000
>     link/ether 08:f1:ea:9e:c7:61 brd ff:ff:ff:ff:ff:ff
> 4: eno3:  mtu 1500 qdisc mq state UP group default qlen 1000
>     link/ether 08:f1:ea:e4:35:52 brd ff:ff:ff:ff:ff:ff
>     inet 10.240.58.16/23 brd 10.240.59.255 scope global eno3
>        valid_lft forever preferred_lft forever
>     inet6 fe80::af1:eaff:fee4:3552/64 scope link
>        valid_lft forever preferred_lft forever
>
> 
>
> What I have noticed is that the dhcpd.leases file does not have a host
> entry with a "fixed-address" line for this node, like the following:
>
> host hpc3-gpu-16-03 {
>   dynamic;
>   hardware ethernet 20:67:7c:10:ba:86;
>   uid 20:67:7c:10:ba:86;
>   fixed-address 10.240.58.61;
>   supersede server.ddns-hostname = "hpc3-gpu-16-03";
>   supersede host-name = "hpc3-gpu-16-03";
>   if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
>     supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/hpc3-gpu-16-03";
>   } elsif option client-architecture = 00:00 {
>     supersede server.filename = "xcat/xnba.kpxe";
>   } else {
>     supersede server.filename = "";
>   }
> }
>
> I am assuming the lease file got messed up somehow.  What are your
> thoughts on reconstructing the file (programmatically) and using a modified
> file?  Or is there another way from within xcat to add entries in the dhcpd
> leases file?
>
> thanks.
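
Before rebuilding anything by hand, it may help to list which host entries in the leases file actually lack a fixed-address line. The Python sketch below is an assumption-laden illustration: it expects host blocks shaped like the one quoted above, the helper name hosts_without_fixed_address is made up, and it only reads text, it does not modify the leases file.

```python
import re

# Matches the opening of a block such as: host hpc3-gpu-16-03 {
HOST_RE = re.compile(r"^\s*host\s+(\S+)\s*\{", re.MULTILINE)
FIXED_RE = re.compile(r"^\s*fixed-address\s", re.MULTILINE)


def hosts_without_fixed_address(leases_text: str) -> list[str]:
    """Return names of host blocks that contain no fixed-address statement."""
    missing = []
    for match in HOST_RE.finditer(leases_text):
        depth = 1
        pos = match.end()
        # Walk forward until the brace opened by "host NAME {" is closed.
        while pos < len(leases_text) and depth > 0:
            char = leases_text[pos]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
            pos += 1
        body = leases_text[match.end():pos]
        if not FIXED_RE.search(body):
            missing.append(match.group(1))
    return missing
```

The text would come from reading the dhcpd.leases file on the management node (its location depends on the distribution). Regenerating entries is normally left to makedhcp rather than to hand edits.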

Re: [xcat-user] [External] Re: xCAT 2.16.2 new xNBA issue

2022-10-25 Thread Russell Jones

Hi all,

This is a pretty large thread so I may have missed it - is there an open 
bug ticket / issue that is tracking boot problems with the new xnba?


I also ended up here today because of upgrading xcat, and then no longer 
being able to boot a few dozen nodes we have with ASUS mainboards. 
Downgrading xnba fixed the issue.



On 10/11/2021 11:02 AM, THomas HUMMEL wrote:



On 10/11/21 16:10, Jarrod Johnson wrote:
1a) Correct. 


Thanks Jarrod.

--
Thomas

At the time we first did it, iPXE codebase and/or uefi didn't do well 
at 'exiting', so offering nothing was the choice, while pxe is slower, 
it means we didn't have to contend with UEFI crashes that happened in 
the wake of iPXE trying to exit. Things might have changed by now 
between UEFI maturing and iPXE maturing, but even with kkpxe in legacy 
we occasionally have BIOSes where exit to short out the netboot 
attempt causes headaches. Considered a little less critical as UEFI 
OSes tend to rewrite the bootorder on deploy to make themselves first, 
and we have the ability to explicitly request network boot through a 
BMC by and large, so the fastest mechanism for diskful boot is to just 
boot straight to hard drive and only network boot when there's a 
deployment to do. Incidentally, some security teams have started 
requiring this, as they don't like the concept of a PXE attempt every 
boot.


-Original Message-
From: THomas HUMMEL 
Sent: Monday, October 11, 2021 9:37 AM
To: xcat-user@lists.sourceforge.net
Subject: [External] Re: [xcat-user] xCAT 2.16.2 new xNBA issue

Hello Mark,

So if I sum up again, what was left unclear to me was:

a) is it expected that after the initial install, when a UEFI 
stateful node reboots in PXE first (if order gets manually changed 
again for instance), no tftp of xNBA occurs ? : like xCAT would check 
the install status of the node ?
I've always thought the way stateful once installed booted on disk 
(if PXE first) was to download the dummy 'exit' only iPXE script via 
xNBA (which would imply to see tftp of xNBA beforehand), not by 
giving up on no boot filename answer


b) why isn't the .uefi /tftpboot file
(/tftpboot/xcat/xnba/nodes/.uefi) changed in sync with the 
no-extension one after install ?


c) for duplicates messages : I still don't know why those messages 
show in logs.


Thanks for your help

--
Thomas HUMMEL

On 10/1/21 18:19, THomas HUMMEL wrote:

[Sorry Mark for the duplicate answer - I mistakenly replied to you only]

On 10/1/21 17:26, Mark Gurevich wrote:

Thomas,

Sorry, I do not quite follow your point about "2)", is the observed
behavior different from what you expected ?


Well yes : I would expect (expect may not be the correct word - let's
say I thought it worked that way) that a node, even once installed (in
the stateful case), which reboots on the network would tftp xNBA
(which in turn would GET one of the /tftpboot/xcat/xnba/node/ or
.uefi script).

My understanding is that it works this way for legacy (BIOS) boot.
Otherwise what would be the point to change the iPXE script file to:

#!gpxe
#boot
exit

?

So I was assuming any host PXE-ing would always get xNBA, whether it is
on initial install or a later reboot (if UEFI network is first in boot
order)




As to "duplicate lease" point, is it possible you have an IP
associated with a MAC, that is also inside dynamic range ?


I thought about it but I don't think it's the case.


Can you show:

lsdef -t network -l



"maestro_net","192.168.144.0","255.255.240.0","eth0","192.168.144.1",,"""192.168.144.2-192.168.147.254",,"maestro.pasteur.fr","1500",,

"maestro_ipmi","10.7.96.0","255.255.248.0",,"10.7.96.1","1500",,

"maestro_ipoib","172.16.0.0","255.255.248.0",,,"1500",,


lsdef  -i ip,mac


Object name: maestro-satvmtmpl
      ip=192.168.148.204
      mac=00:50:56:b5:ac:de


makedhcp -q node


# makedhcp -q maestro-satvmtmpl
maestro-satvmtmpl: ip-address = 192.168.148.204, hardware-address =
00:50:56:b5:ac:de
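
The "is the node's IP inside the dynamic range?" question can be checked mechanically. The Python sketch below uses the values shown above and assumes the range listed for maestro_net (192.168.144.2-192.168.147.254) is its dynamic range; the helper names in_range and in_network are illustrative only.

```python
import ipaddress


def in_range(ip: str, ip_range: str) -> bool:
    """True if ip lies inside an inclusive 'start-end' range string."""
    start, end = (ipaddress.ip_address(part.strip())
                  for part in ip_range.split("-"))
    return start <= ipaddress.ip_address(ip) <= end


def in_network(ip: str, net: str, mask: str) -> bool:
    """True if ip belongs to the network given as address plus netmask."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(f"{net}/{mask}")


node_ip = "192.168.148.204"  # maestro-satvmtmpl
print(in_network(node_ip, "192.168.144.0", "255.255.240.0"))   # True
print(in_range(node_ip, "192.168.144.2-192.168.147.254"))      # False
```

With these values the node address is inside the maestro_net subnet but outside the listed range, which is consistent with the answer given above.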

Thanks for your help





Re: [xcat-user] xCAT 2.16.2 new xNBA issue

2022-10-25 Thread Russell Jones

Thanks!

Yes, that appears to be the version that comes with upgrading xCAT from 
2.16.2 to 2.16.4.


It boots a fleet of Dell nodes fine, but these ASUS nodes stopped being 
able to boot after the upgrade, with iPXE failing at "No configuration 
methods succeeded" after a successful DHCP request.


Downgrading xNBA to the version linked earlier in this thread was an 
immediate fix.




On 10/25/22 13:36, Mark Gurevich wrote:

Have you tried the new xnba-undi version 1.21.1-1 added by 
https://github.com/xcat2/xcat-dep/pull/47 ?

-Original Message-----
From: Russell Jones 
Sent: Tuesday, October 25, 2022 1:25 PM
To: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] [External] Re: xCAT 2.16.2 new xNBA issue

Hi all,

This is a pretty large thread so I may have missed it - is there an open bug 
ticket / issue that is tracking boot problems with the new xnba?

I also ended up here today because of upgrading xcat, and then no longer being 
able to boot a few dozen nodes we have with ASUS mainboards.
Downgrading xnba fixed the issue.



Re: [xcat-user] postinstall scripts for diskfull installations

2023-02-07 Thread Russell Jones



Take a look at the "updatenode" command.

https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/ppc64le/updatenode.html

On 2023-02-03 10:10, SOPORTE MODEMAT via xCAT-user wrote:


Hi guys.

Does anybody know how to make xCAT execute a post-installation script 
from the management node after a node installation completes? The 
postscripts and postbootscripts execute on the node being installed, 
and I would like to add some additional configuration that should be 
executed from the management node.


xCAT version: 2.16.4

Diskful installation using CentOS 8.5

Thank you in advance for your help.

Note: I read that there is a "postinstall" attribute, but it is only 
applicable to diskless installations. Source: 
https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/common/deployment/prepostscripts/postinstall_script.html 
[1]


Kind regards.





Links:
--
[1] 
https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/common/deployment/prepostscripts/postinstall_script.html


Re: [xcat-user] xNBA won't download kernel

2023-02-07 Thread Russell Jones



Thanks - Unfortunately I can't, because the newer xNBA breaks booting a 
fleet of ASUS nodes.


Through trial and error I did find out that if I switch these nodes to 
UEFI, they'll boot. So going down that path now...


On 2023-02-07 07:49, Nathan A Besaw via xCAT-user wrote:

Based on the screenshot, it looks like you may not have the newest 
version of xNBA.
You might want to upgrade xNBA to the newest version in case this is a 
known issue that was resolved in the newer iPXE code that xNBA is based 
on.


You can get the newest version from here:
https://xcat.org/files/xcat/repos/yum/latest/xcat-dep/xnba-undi-1.21.1-1.noarch.rpm

-----

From: Russell Jones 
Sent: Monday, February 6, 2023 4:33 PM
To: xcat-user@lists.sourceforge.net 
Subject: [EXTERNAL] [xcat-user] xNBA won't download kernel

Hi all,

I am trying to get some nodes that have Broadcom 10g cards to boot from 
these cards instead of the 1g interface. The node successfully gets its 
IP address and begins booting, but stops anywhere between 3% and 16% of 
the way through downloading the kernel, and no longer responds to pings 
after that. It maintains link to the switch, but there are no ping 
replies and seemingly no attempts to continue the download any further.


I am not seeing any misconfiguration. Screenshot attached of the 
console.


Anyone see this behavior before?

Thanks for the help!


Re: [xcat-user] rsync on https://xcat.org/files/xcat/repos

2023-05-23 Thread Russell Jones
How large is the xcat repository? If there's any concern about opening up
rsync causing a bandwidth or availability issue,  maybe the community could
come together to host a few public mirrors with rsync access, and I'd be
happy to be one.

I run "official" public mirrors for epel, fedora, rocky, and CentOS stream.
I have the infrastructure to donate :-)

On Tue, May 23, 2023, 6:36 PM Vinícius Ferrão via xCAT-user <
xcat-user@lists.sourceforge.net> wrote:

> Hi Kilian,
>
> reposync is great, mainly with Red Hat, since it’s the only way to mirror
> Red Hat distributions due to its “subscription manager” nature. I already
> use it for RHEL, but it’s not that great when you want to mirror multiple
> repositories and multiple versions. Also, with EL8 there’s the
> --download-metadata issue.
>
> We do use it when it’s a stateful cluster with a single arch and single
> distribution, but my objective here is to have a mirror, not a reposync one.
>
> Also, as I said in the last message, the storage system has standard
> storage tools, like rsync. Reposync is not available.
>
> Regards.
>
> > On 23 May 2023, at 20:06, Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote:
> >
> > Hi all,
> >
> > We routinely use reposync [1] to mirror the xcat-core and xcat-dep
> > repositories, without rsync, and without having to re-download already
> > downloaded packages.
> > [2] has pointers on how to use it.
> >
> > Hope this helps!
> >
> > [1]: https://linux.die.net/man/1/reposync
> > [2]: https://access.redhat.com/solutions/23016
> >
> > Cheers,
> > --
> > Kilian
> >
> > On Tue, May 23, 2023 at 12:51 PM Nathan A Besaw via xCAT-user
> >  wrote:
> >>
> >> Hi Vinícius,
> >>
> >> Currently rsync is not available from xcat.org, but we can consider
> >> enabling it.
> >>
> >> How frequently do you plan to sync your local mirror from the xcat.org
> >> version?
> >>
> >> For the initial sync, I think you can use wget or curl to recursively
> >> download everything from https://xcat.org/files/xcat/ (or whatever
> >> directories are relevant to you).
> >> If you are going to refresh your mirror infrequently (after every
> >> release?), using wget or curl may be sufficient. If you want to resync more
> >> regularly I would prefer to use a solution that includes incremental
> >> copying so you don't have to redownload every file every time.
> >>
> >> 
> >> From: Vinícius Ferrão via xCAT-user 
> >> Sent: Friday, May 19, 2023 1:02 PM
> >> To: xCAT Users Mailing list 
> >> Cc: Vinícius Ferrão 
> >> Subject: [EXTERNAL] [xcat-user] rsync on
> >> https://xcat.org/files/xcat/repos
> >>
> >> Hello, I would like to know if rsync is available on xCAT repository. I
> >> want to mirror it locally.
> >>
> >> I know that I can download the entire tarball from a given version, but
> >> I would like to use rsync to keep it updated.
> >>
> >> Thank you.
> >>
> >>
> >>
> >
> >
> >
> > --
> > Kilian
> >
> >
>
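The incremental-copy idea Nathan describes in the quoted mail can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `remote_index` mapping (relative path to size in bytes) is hypothetical, how it would be obtained from xcat.org is not covered, and a size comparison is a crude stand-in for what rsync or reposync do properly.

```python
from pathlib import Path


def files_to_download(remote_index: dict[str, int], mirror_dir: Path) -> list[str]:
    """Return the relative paths that are missing locally or differ in size.

    remote_index is a hypothetical {relative_path: size_in_bytes} listing.
    """
    pending = []
    for rel_path, remote_size in sorted(remote_index.items()):
        local_file = mirror_dir / rel_path
        # Skip files already held with the same size; fetch everything else.
        if not local_file.is_file() or local_file.stat().st_size != remote_size:
            pending.append(rel_path)
    return pending
```

Only the returned paths would then be handed to wget or curl, so a refresh after each release would not re-download packages that are already mirrored.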


Re: [xcat-user] Config IB for redhat 9.1

2023-08-18 Thread Russell Jones
What's the error you are getting?

On Fri, Aug 18, 2023 at 10:14 AM Tomer Shachaf 
wrote:

> Hi all ,
>
>
>
> I am provisioning Red Hat 9.1 but have had no success automating the IB
> configuration. Has anyone done that on Red Hat 9.1 and could help me?
>
>
>
>
>


Re: [xcat-user] [External] Re: Announcement: xCAT Project End-Of-Life planned for December 1, 2023

2023-09-30 Thread Russell Jones
I'll be at SC, also interested in joining in person or virtually.


On Sat, Sep 30, 2023, 4:16 PM Imam Toufique  wrote:

> Hi Don,
>
> I am very interested in joining virtually via Zoom.  I would love to see xcat
> move forward very much.  Please count me in.
>
> Thanks
>
> On Sat, Sep 30, 2023 at 1:28 PM Don Avart  wrote:
>
>> Kurt,
>> Glad to hear you are interested in getting together at SC.  Hopefully
>> there will be others interested in joining, if not in person at least
>> virtually through a zoom.  With respect to continued support for xCAT I’ve
>> had numerous conversations about the possibility of keeping xCAT alive and
>> what it would take to move it forward.  These conversations have included
>> discussions with Nathan Besaw at IBM, Jarrod Johnson of Lenovo, Markus
>> Hilger of Megware, Vinicius Ferrao as well as vendors including Dell.
>>
>> For our part, RedLine is a systems integrator and like most of you all
>> on this mailing group we have a long history with xCAT mostly as users but
>> also occasional contributors.  I too would like to see xCAT not only
>> continue to exist, but to also move forward.  The IBM team had
>> dedicated/paid staff working on the xCAT project.  I believe it will take
>> dedicated developers to move xCAT forward.  For our part, we have been
>> looking to vendors and other entities to gauge their interest in funding
>> continued support and development.  The model that SchedMD employed to
>> support development and in particular fund additional features for Slurm
>> seems to me like a model that could be emulated.  RedLine is interested in
>> either leading or being a significant part of the team that moves xCAT
>> forward.  We would like to see the xCAT community continue to exist,
>> supporting an open source product that is vendor agnostic.
>>
>> One idea that has come from separate conversations with Nathan Besaw,
>> Jarrod Johnson and Markus Hilger would be to continue support for xCAT2
>> primarily by the community.  This would be things like keeping the code
>> base up to date and working with various operating systems, e.g. SLES and
>> Ubuntu etc.  Parallel to that activity is considering Confluent as the next
>> evolution of xCAT, or xCAT3.  In working with Jarrod to get a basic
>> Confluent setup in our lab, I can offer the following observations:
>> 1.  It has a similar feel as xCAT albeit with different commands.  This
>> should be no surprise given that Jarrod worked on xCAT for many years and
>> supported the community even after starting Confluent.
>> 2.  It is more modular and less monolithic than xCAT.  These are both
>> good and bad in my opinion.  For instance, I’m used to the makehosts,
>> makedns, makedhcp process that xCAT uses to manage those requirements.
>> DNS and DHCP are not assumed or even 100% required with Confluent so it’s
>> not really built-in.  This could be fantastic or this could be a real pain
>> if you’re not a DNS/DHCP veteran and it’s a requirement for your setup.
>> 3.  When I first started with xCAT I found the Sumavi tutorials written
>> by Vallard Benincosa.  It walked me through the most important features of
>> xCAT by taking me through an example installation.  I think that Confluent
>> could use an example based installation guide written from the perspective
>> of a Sys Admin.
>> 4.  Documentation in general is sparse and from my perspective needs
>> considerable work.
>> 5.  Appears to have been written with an eye to security, more so than
>> xCAT is today.
>> 6.  More modern options with respect to booting, pxe and http; RedFish
>> integration for hardware management and monitoring are examples.
>> 7.  Written in Python vs. Perl, which seems to be really just a more
>> popular language these days.
>>
>> These are just some of my observations and I’m sure Jarrod can correct or
>> comment on my thoughts.  I’ve discussed these ideas with both Nathan Besaw
>> at IBM and Jarrod’s management team at Lenovo.  Neither has committed to
>> this path but both are considering it.
>>
>> Hopefully this will spur more dialogue and more ideas.
>>
>> -Don
>> ——
>> Don Avart
>> CTO
>> RedLine Performance Solutions, LLC
>> (703) 634-5686
>> dav...@redlineperf.com
>>
>> On Sep 30, 2023, at 3:08 PM, Kurt H Maier via xCAT-user <
>> xcat-user@lists.sourceforge.net> wrote:
>>
>> On Thu, Sep 28, 2023 at 05:56:26PM -0400, Don Avart wrote:
>>
>> All,
>> Would there be interest in an unofficial “birds of a feather” type
>> meeting for xCAT at SC23 to discuss the future of xCAT?  I may be able to
>> line up a conference room for folks attending to get together.  If there’s
>> interest I assume we can also include a Zoom or Teams conference for those
>> unable to attend.
>>
>>
>> I'm interested in participating.  If no formal organization coalesces to
>> adopt it, I'll probably wind up personally forking and maintaining the
>> codebase.
>>
>> khm
>>
>>

[xcat-user] "remoteshell" is hanging on multiple different images after cluster reboots

2023-11-27 Thread Russell Jones
Hi devs/users,

I am having a very weird issue where "remoteshell" is failing to run on
multiple different images/clusters after we performed datacenter
maintenance. On the compute node side I am seeing:

Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: trying to download postscripts...
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: postscripts downloaded successfully
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: trying to get mypostscript from ...
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: syslog
Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript end...:syslog return with 0
Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: remoteshell


 and it just hangs here.

On the cluster manager side, I see:

Nov 27 15:23:15 xcat8 xcat[2124]: ERR The node (compute-n2) is not ready, ignore it.


It is saying this same error for all the nodes I have booted, across
multiple different osimages.

I am not understanding - what is it looking for? How can I correct this
hangup? I have tried restarting xcatd, as well as rebooting the xcat VM. No
changes yet.


Re: [xcat-user] [External] Re: RHEL9 support in xcat

2024-02-26 Thread Russell Jones
Glad to know that worked! We have some GH nodes coming in as well. If we 
can stick to xcat we definitely want to do that.


Hope to see these changes merged and "officially" supported soon.



On 2/24/24 1:28 AM, Gilad Berman wrote:


THX for that! Indeed works 😊 (grace-hopper node, rhel9.3)

*Gilad Berman *

HPC Architect, Lenovo EMEA

gber...@lenovo.com  +972-522554262

*From:* Markus Hilger 
*Sent:* Friday, February 23, 2024 6:43 PM
*To:* Gilad Berman ; xCAT Users Mailing list 


*Cc:* Matthew Alton 
*Subject:* AW: [xcat-user] [External] Re: RHEL9 support in xcat

Hi,

with https://github.com/xcat2/xcat-core/pull/7257 arm deployment is 
pretty much working for stateless and stateful deployment via grub2.


On a x86 management node you can even cross-build arm stateless images.

Only discovery doesn't work for arm.

Feel free to test these changes.

I would like to merge these changes as alpha state arm support. I 
think stateless and stateful deployment covers 90% of use cases already 😉


​Mit freundlichen Grüßen / Kind regards

*Markus Hilger*
HPC Engineer
MEGWARE Computer Vertrieb und Service GmbH
Nordstraße 19, 09247 Chemnitz-Röhrsdorf, Germany
Tel: +49 3722 528-47
markus.hil...@megware.com
www.megware.com
Managing directors: André Singer, Axel Auweter
District court: Chemnitz HRB 584

*From:* Matthew Alton via xCAT-user 
*Sent:* Friday, February 23, 2024 12:23
*To:* Gilad Berman ; xCAT Users Mailing list 
*Cc:* Matthew Alton 
*Subject:* Re: [xcat-user] [External] Re: RHEL9 support in xcat

Hi Gilad,

There are no immediate plans for arm. Our focus will be primarily on 
x86 whilst we are in a transitional state and clearing some of the 
backlogged issues. After we have fully established the maintenance of 
the project, we will be determining what a longer-term roadmap looks like 
and how we plan to further develop xCAT.


Regards,

Matt.

*Matthew Alton MBCS | Research & Development Lead*
*Phone:* +44 (0)114 257 2200
*Mobile:* +44 (0)7943 594 084
*Address:* OCF Limited, Unit 5 Rotunda Business Centre, Thorncliffe Park, 
Chapeltown, Sheffield S35 2PG
*Website:* www.ocf.co.uk




/OCF Limited is a company registered in England and Wales.  Registered 
number 4132533, VAT number GB 780 6803 14. Registered office address: 
OCF Limited, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, 
Sheffield, S35 2PG./


/This message is private and confidential. If you have received this 
message in error, please notify us immediately and remove it from your 
system./


*From:* Gilad Berman 
*Sent:* Friday, February 23, 2024 10:56 AM
*To:* xCAT Users Mailing list 
*Cc:* Matthew Alton 
*Subject:* RE: [xcat-user] [External] Re: RHEL9 support in xcat

Any chance arm support is also being looked at?

*Gilad Berman *

HPC Architect, Lenovo EMEA

gber...@lenovo.com  +972-522554262

*From:* Matthew Alton via xCAT-user 
*Sent:* Friday, February 23, 2024 11:31 AM
*To:* xCAT Users Mailing list 
*Cc:* Matthew Alton 
*Subject:* Re: [xcat-user] [External] Re: RHEL9 support in xcat

Good morning mailing list,

We, xCAT consortium, are working on various activities in the 
background to be able to maintain xCAT fully. We have introduced a new 
milestone of xCAT 2.17 that will be our first maintenance release. The 
main objective of this release is to make RHEL9 and directly derived 
operating systems (Rocky, Alma, Oracle) officially supported by xCAT.


RHEL9 support is being worked on under 
https://github.com/xcat2/xcat-core/issues/7416. RHEL9 has been 
reliably deployed from xCAT but there are a couple of known bugs with 
workarounds we plan to fix. This week we have been merging several PRs 
relating to RHEL9 and some other useful low hanging fruit sort of fixes.


A full announcement will be made when all our testing is completed and 
we are ready to release 2.17.


Regards,

Matt.

*Matthew Alton MBCS | Research & Development Lead*
*Phone:* +44 (0)114 257 2200
*Mobile:* +44 (0)7943 594 084
*Address:* OCF Limited, Unit 5 Rotunda Business Centre, Thorncliffe Park, 
Chapeltown, Sheffield S35 2PG
*Website:* www.ocf.co.uk




/OCF Limited is a company registered in England and Wales.  Registered 
number 4132533, VAT number GB 780 6803 14. Registered office

[xcat-user] ARM support status?

2024-03-22 Thread Russell Jones
Hi developers,

What is the "official" status of ARM support? With NVIDIA's Grace systems,
that support has become more important than ever for us.

As best I can tell, the pull request that was made hasn't been merged yet.



> *From:* Markus Hilger 


*Sent:* Friday, February 23, 2024 6:43 PM
*To:* Gilad Berman  ; xCAT Users
Mailing list 

*Cc:* Matthew Alton  
*Subject:* AW: [xcat-user] [External] Re: RHEL9 support in xcat



Hi,



with https://github.com/xcat2/xcat-core/pull/7257 arm deployment is pretty
much working for stateless and stateful deployment via grub2.

On a x86 management node you can even cross-build arm stateless images.

Only discovery doesn't work for arm.



Feel free to test these changes.

I would like to merge these changes as alpha state arm support. I think
stateless and stateful deployment covers 90% of use cases already 😉


Re: [xcat-user] Reg xCAT

2011-07-21 Thread Russell Jones


  
  
Here you go:

http://xcat.sourceforge.net/docs.html



On 7/21/2011 7:55 AM, SYED ASIF ZAHEER wrote:

  
  
Hi,
 
I am Syed Asif Zaheer. I want to install xCAT2 in VMware Workstation on
Linux for practice.

Please suggest documentation for xCAT, including administration and
cluster nodes.
  

Thanks in advance.

Regards
Syed Asif Zaheer
  sa_zah...@hotmail.com

 

 

   

  
  



Re: [xcat-user] service nodes node working correctly

2011-07-21 Thread Russell Jones
This means that your xCAT management node is not syncing the status 
updates to the service nodes. What happens if you do a "nodeset node001 
install" on the management node? Do both service nodes report back to 
you that they have done the command correctly?

You should not be needing to manually sync anything from the management 
node to the service nodes, this is done on deployment. If you are having 
to do this, something is broken and will continue to stay broken unless 
you correct the root cause. I would recommend redeploying both of your 
service nodes (nodeset service install; rpower service reset) and seeing 
if after a clean redeployment this corrects the issue.

> Could it be a matter of DNS is supposed to be configured on the
> SNs (I have servicenode.nameserver=1) and it's not configured despite the
> fact that the site.nameservers shows the IP for the xCAT MN (I edited that
> out of my earlier email).



The service nodes' DNS servers are deployed as caching servers, so their DNS cache 
will only contain nodes they have previously deployed. If they do not 
have the DNS info they need when it is requested, they look to the 
management node for this information and then cache it so that they 
don't have to request it again (I believe it is networks.nameservers 
that sets the IP they look to. Someone can correct me if this is wrong.)
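
The cache-then-forward behaviour described above can be illustrated with a small Python sketch. It shows the general pattern only and is not how xCAT or named implement it; the `CachingResolver` class and the `upstream` callable (standing in for the management node) are hypothetical names made up for this example.

```python
from typing import Callable


class CachingResolver:
    """Answer from a local cache, asking an upstream resolver only on a miss."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream  # hypothetical stand-in for the management node
        self.cache: dict[str, str] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        if name not in self.cache:
            # Cache miss: forward the query once and remember the answer.
            self.upstream_queries += 1
            self.cache[name] = self.upstream(name)
        return self.cache[name]
```

A second lookup of the same name is served from the cache, which is why a service node only "knows" names it has already had to look up.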

> [Thu Jul 21 13:40:58 2011] conserver (19859): ERROR: no consoles found in
> configuration file


Are you using console servers? Is servicenode.conserver set to "1"?



> Should site.nameservers have the SN IPs in there as well for DNS to work?

I want to say no, but I am not entirely sure what this setting actually 
changes. I've put a bad IP in it and makehosts/makedns/makedhcp -n'd it and it 
doesn't seem to be actually changing anything. Perhaps someone else can chime 
in here on what that touches, as all of my compute node's /etc/resolv.conf 
contains the entries from networks.nameservers, as does /etc/named.conf's 
forwarders section on my service nodes.






> What about site.dhcpinterfaces? Will adding the SNs in there cause makedhcp to 
> build working DHCP configs on the SNs?

site.dhcpinterfaces defines what interfaces are facing the cluster network and 
DHCP should respond to. This is available in the man page "man site".

dhcpinterfaces:  The network interfaces DHCP should listen on.  If it is the
same for all nodes, use simple comma-separated list of NICs.  To
specify different NICs for different nodes:
 mn|eth1,eth2;service|bond0.

An example of my 
dhcpinterfaces:"dhcpinterfaces","mn|eth1;service01|eth0;service02|eth0",,
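
To make the format concrete, here is a minimal Python sketch that splits a dhcpinterfaces value into per-node NIC lists, following the man page text quoted above (entries separated by ";", node and NICs separated by "|", NICs separated by ","). The function name and the use of "*" as the key for a plain NIC list are assumptions made for this illustration; xCAT itself does not expose this function.

```python
def parse_dhcpinterfaces(value: str) -> dict[str, list[str]]:
    """Map each node or group name to its NIC list; '*' means all nodes."""
    result: dict[str, list[str]] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if "|" in entry:
            target, nics = entry.split("|", 1)
        else:
            # A plain comma-separated list applies to every node.
            target, nics = "*", entry
        result[target.strip()] = [n.strip() for n in nics.split(",") if n.strip()]
    return result
```

With the value from my site table, `parse_dhcpinterfaces("mn|eth1;service01|eth0;service02|eth0")` gives one entry each for mn, service01 and service02.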






On 7/21/2011 4:31 PM, Christian Caruthers wrote:
> A little more information re: the last email. I looked
> in /tftpboot/pxelinux.cfg/node001 and found that the xCAT MN still points
> to the kickstart install while both SNs have been updated to boot the node
> from HDD.
>
> Christian D. Caruthers
> Linux HPC Consultant
> STG Lab Services
> 757-656-9675
>
>
>
>From:   Christian Caruthers/Richmond/IBM@IBMUS
>
>To: xCAT Users Mailing list
>
>Date:   07/21/2011 04:53 PM
>
>Subject:Re: [xcat-user] service nodes node working correctly
>
>
>
>
>
>
> Still more information. I rsync'ed the /etc/xcat/hostkeys and /etc/xcat/ca
> directories from the MN to the SNs and retried an install. After
> installation, while the node is rebooting, I did a nodeset node001 stat and
> saw something weird:
>
> nodeset node001 stat
> node001: install rhels6.1-x86_64-gpu
> node001: boot
> node001: boot
>
> So then I looked at the chain table...
>
> [root@mgt gpfs]# nodels node001 chain
> node001: chain.ondiscover: nodediscover
> node001: chain.chain: runcmd=bmcsetup,standby
> node001: chain.node: node001
> node001: chain.currstate: boot
> node001: chain.currchain: boot
> node001: chain.comments:
> node001: chain.disable:
> [root@mgt gpfs]# psh service nodels node001 chain
> sn02: node001: chain.ondiscover: nodediscover
> sn02: node001: chain.chain: runcmd=bmcsetup,standby
> sn02: node001: chain.node: node001
> sn02: node001: chain.currstate: boot
> sn02: node001: chain.currchain: boot
> sn02: node001: chain.comments:
> sn02: node001: chain.disable:
> sn01: node001: chain.ondiscover: nodediscover
> sn01: node001: chain.chain: runcmd=bmcsetup,standby
> sn01: node001: chain.node: node001
> sn01: node001: chain.currstate: boot
> sn01: node001: chain.currchain: boot
> sn01: node001: chain.comments:
> sn01: node001: chain.disable:
>
> Thoughts?
>
> Christian D. Caruthers
> Linux HPC Consultant
> STG Lab Services
> 757-656-9675
>
>
>
>From:   Christian Caruthers/Richmond/IBM@IBMUS
>
>
>To: xCAT Users Mailing list
>
>
>Date:   07/21/2011 02:46 PM
>
>
>Subject:Re: [xcat-user] service nodes node working correctly
>
>
>
>
>
>
>
> Some more information: On the service nodes, when I run service xcatd
> restart, I se

Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-22 Thread Russell Jones
./\$@ 2>&1 | tee -a \$logfile

That line needs an ampersand after it to resolve the problem. Example:

./\$1 2>&1 | tee -a \$logfile &
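
Why a foreground `script | tee` pipeline can hang is easier to see with a small experiment. A common cause (an assumption here, not something confirmed for xcatdsklspost) is that the script leaves a background child running that still holds the pipe open, so the reader never sees end-of-file. The Python sketch below reproduces that effect with plain `sh` and `sleep`; the helper name is made up for this example.

```python
import subprocess
import time


def run_and_wait_for_eof(command: str) -> tuple[str, float]:
    """Run a shell command, read its stdout to EOF, return (output, seconds)."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", command], stdout=subprocess.PIPE, text=True)
    # read() only returns once every process holding the pipe has closed it,
    # including background children the shell left behind.
    output = proc.stdout.read()
    proc.wait()
    return output, time.monotonic() - start
```

Calling it with `"sleep 2 & echo started"` returns only after the background sleep exits, while `"sleep 2 >/dev/null 2>&1 & echo started"` returns almost immediately. Appending `&` to the tee line sidesteps the wait in a different way: the caller simply no longer blocks on that pipeline.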







On 7/22/2011 6:57 PM, Luis Miguel Silva wrote:
> Dear all,
>
> I'm having some trouble with netbooting CentOS5.5 in my 2.6 xCAT system.
> From the compute node's console, all i see is that it hangs when
> booting "syslogd" but after carefully inspecting i can see it hangs
> while loading "xcatdsklspost".
> More specifically, in this area:
>   if [[ -f \$1 ]]; then
>echo \"Running postscript: \$@\" | tee -a \$logfile
>./\$@ 2>&1 | tee -a \$logfile
>   else
>echo \"Postscript \$1 does NOT exist.\" | tee -a \$logfile
>   fi
>
> We have come across this before (in a 2.5 xCAT system) and a co-worker
> managed to solve it but he did not document it and now he can't figure
> out what's going on with this...
>
> Any thoughts?
>
> p.s. we basically just followed the "Basic Install DHCP" How to's
> "Advanced Features" section:
> http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Basic_Install_DHCP#Advanced_features
>
> Thanks,
> Luis
>


Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-22 Thread Russell Jones


  
  
> \_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002 installstatus booted


That's a big indication that the updateflag may be hanging. Any
output in 172.31.32.254's /var/log/messages that looks interesting?

Remember that xCAT 2.6 made big changes to DNS. Can you resolve
the xcatmaster, the management node, and the compute node itself
when it is hung, both forward and reverse records?
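
A quick way to run that forward and reverse check from Python is sketched below, using the standard `socket` module. The function name is made up for this example, and the resolver arguments are injectable only so the logic can be exercised without a live DNS server.

```python
import socket
from typing import Callable, Optional, Tuple


def check_forward_reverse(
    name: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (ip, reverse_name); an element is None if that lookup fails."""
    try:
        ip = forward(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None, None
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_name = reverse(ip)[0]
    except OSError:
        reverse_name = None
    return ip, reverse_name
```

Running it on the hung node for the xcatmaster name, the management node name and the node's own name shows at a glance which record, forward or reverse, is missing.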



On 7/22/2011 7:15 PM, Luis Miguel Silva wrote:

  Actually, my co-worker had been trying to figure what was going on
with it yesterday and the current version of it did have a & in the
end.

But i'm still having trouble with it!

Please look at the attached screenshot to see what is happening from
the console.

As for the processes, here's how things look like:
root  1894  0.0  0.0  10900  1420 ?  Ss   13:57   0:00 /bin/bash /etc/rc.d/rc 3
root  2133  0.0  0.0  10896  1376 ?  S    13:57   0:00  \_ /bin/sh /etc/rc3.d/S61xcatpostinit start
root  2139  0.0  0.0  10768  1284 ?  S    13:57   0:00   \_ /bin/sh /opt/xcat/xcatdsklspost
root  2197  0.0  0.0  10768   820 ?  S    13:57   0:00    \_ /bin/sh /opt/xcat/xcatdsklspost
root  2203  0.0  0.0   3788   496 ?  S    13:57   0:00    \_ tee -a /var/log/xcat/xcat.log
root  2229  0.0  0.0   8764   876 ?  S    13:57   0:00    \_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002 installstatus booted
root  2239  0.0  0.0   3784   596 ?  S    13:57   0:00    \_ logger -t xcat
root  2361  0.0  0.0   3788   468 ?  S    13:59   0:00    \_ sleep 10

Any thoughts?

Thanks,
Luis

On Fri, Jul 22, 2011 at 6:02 PM, Russell Jones  wrote:

  
./\$@ 2>&1 | tee -a \$logfile

That line needs an ampersand after it to resolve the problem. Example:

./\$1 2>&1 | tee -a \$logfile &







On 7/22/2011 6:57 PM, Luis Miguel Silva wrote:


  Dear all,

I'm having some trouble with netbooting CentOS5.5 in my 2.6 xCAT system.

  
From the compute node's console, all i see is that it hangs when

  
  booting "syslogd" but after carefully inspecting i can see it hangs
while loading "xcatdsklspost".
More specifically, in this area:
  if [[ -f \$1 ]]; then
   echo \"Running postscript: \$@\" | tee -a \$logfile
   ./\$@ 2>&1 | tee -a \$logfile
  else
   echo \"Postscript \$1 does NOT exist.\" | tee -a \$logfile
  fi

We have come across this before (in a 2.5 xCAT system) and a co-worker
managed to solve it but he did not document it and now he can't figure
out what's going on with this...

Any thoughts?

p.s. we basically just followed the "Basic Install DHCP" How to's
"Advanced Features" section:
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Basic_Install_DHCP#Advanced_features

Thanks,
Luis


Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-22 Thread Russell Jones


  
  
Just to rule out strange gen/packimage issues with 2.6 (given that
you stated resolv.conf wasn't getting copied over), I regenerated my
diskless image and repacked it using xCAT 2.6 in a lab environment
and was able to netboot my CentOS 5.5 node without an issue as long
as that ampersand was at the end of that line in xcatdsklspost. 

However! I did notice the following error on bootup:

[screenshot of the boot error; image not preserved in the archive]

Looks like maybe a bug? $NEWROOT doesn't seem to be getting set or
extracted out like it should in that cp line. Logging into my node
yields:

[root@c1n02 ~]# cat /etc/resolv.conf
#Dummy resolv.conf to make boot cleaner


Super! 


On 7/22/2011 7:41 PM, Jonathan Dye wrote:

  so, that's yet another problem.  resolv.conf doesn't get written correctly.  i see where the initrd is supposed to be copying both the resolv.conf and the lease file from the stateless environment, but it only copies the lease file.  when i manually fixed resolv.conf in the guest, the retries from updateflag still didn't work, and a new process wouldn't work either.

reverse dns worked for the machine that it was trying to contact.  forward did too, but as you can see it is using the IP of the xcat master and not the name.

- jonathan

- Original Message -
From: "Russell Jones" 
To: "xCAT Users Mailing list" 
Sent: Friday, July 22, 2011 6:19:38 PM
Subject: Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting


\_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002
installstatus booted 
That's a big indication that the updateflag may be hanging. Any output in 172.31.32.254's /var/log/messages that looks interesting? 

Remember that xCAT 2.6 made big changes to DNS. Can you resolve both the xcatmaster, the management node, and the compute node itself when it is hung, both forward and reverse records? 



On 7/22/2011 7:15 PM, Luis Miguel Silva wrote: 

Actually, my co-worker had been trying to figure what was going on
with it yesterday and the current version of it did have a & in the
end.

But i'm still having trouble with it!

Please look at the attached screenshot to see what is happening from
the console.

As for the processes, here's how things look like:
root  1894  0.0  0.0  10900  1420 ?  Ss   13:57   0:00 /bin/bash /etc/rc.d/rc 3
root  2133  0.0  0.0  10896  1376 ?  S    13:57   0:00  \_ /bin/sh /etc/rc3.d/S61xcatpostinit start
root  2139  0.0  0.0  10768  1284 ?  S    13:57   0:00   \_ /bin/sh /opt/xcat/xcatdsklspost
root  2197  0.0  0.0  10768   820 ?  S    13:57   0:00    \_ /bin/sh /opt/xcat/xcatdsklspost
root  2203  0.0  0.0   3788   496 ?  S    13:57   0:00    \_ tee -a /var/log/xcat/xcat.log
root  2229  0.0  0.0   8764   876 ?  S    13:57   0:00    \_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002 installstatus booted
root  2239  0.0  0.0   3784   596 ?  S    13:57   0:00    \_ logger -t xcat
root  2361  0.0  0.0   3788   468 ?  S    13:59   0:00    \_ sleep 10

Any thoughts?

Thanks,
Luis

On Fri, Jul 22, 2011 at 6:02 PM, Russell Jones  wrote: 

./\$@ 2>&1 | tee -a \$logfile

That line needs an ampersand after it to resolve the problem. Example:

./\$1 2>&1 | tee -a \$logfile &







On 7/22/2011 6:57 PM, Luis Miguel Silva wrote: 

Dear all,

I'm having some trouble with netbooting CentOS5.5 in my 2.6 xCAT system. 

From the compute node's console, all i see is that it hangs when booting "syslogd" but after carefully inspecting i can see it hangs
while loading "xcatdsklspost".
More specifically, in this area:
  if [[ -f \$1 ]]; then
   echo \"Running postscript: \$@\" | tee -a \$logfile
   ./\$@ 2>&1 | tee -a \$logfile
  else
   echo \"Postscript \$1 does NOT exist.\" | tee -a \$logfile
  fi

We have come across this before (in a 2.5 xCAT system) and a co-worker
managed to solve it but he did not document it and now he can't figure
out what's going on with this...

Any thoughts?

p.s. we basically just followed the "Basic Install DHCP" How to's
"Advanced Features" section:
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Basic_Install_DHCP#Advanced_features

Thanks,
Luis


Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-22 Thread Russell Jones


  
  
This wasn't a hang, my node boots up all the way. I was just
screenshotting that error :)



On 7/22/2011 7:59 PM, Luis Miguel Silva wrote:
FYI, ours doesn't seem to hang on "udev" :o).
  
  Thanks,
  Luis
  
On Fri, Jul 22, 2011 at 6:56 PM, Russell Jones <rjo...@eggycrew.com> wrote:

Just to rule out strange gen/packimage issues with 2.6 (given that you
stated resolv.conf wasn't getting copied over), I regenerated my diskless
image and repacked it using xCAT 2.6 in a lab environment and was able to
netboot my CentOS 5.5 node without an issue as long as that ampersand was
at the end of that line in xcatdsklspost.

However! I did notice the following error on bootup:






Looks like maybe a bug? $NEWROOT doesn't seem to be getting
set or extracted out like it should in that cp line. Logging
into my node yields:

[root@c1n02 ~]# cat /etc/resolv.conf

  #Dummy resolv.conf to make boot cleaner
  
  

Super! 

   

On 7/22/2011 7:41 PM, Jonathan Dye wrote:

  so, that's yet another problem.  resolv.conf doesn't get written correctly.  i see where the initrd is supposed to be copying both the resolv.conf and the lease file from the stateless environment, but it only copies the lease file.  when i manually fixed resolv.conf in the guest, the retries from updateflag still didn't work, and a new process wouldn't work either.

reverse dns worked for the machine that it was trying to contact.  forward did too, but as you can see it is using the IP of the xcat master and not the name.

- jonathan

- Original Message -
From: "Russell Jones" 
To: "xCAT Users Mailing list" 
Sent: Friday, July 22, 2011 6:19:38 PM
Subject: Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting


\_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002
installstatus booted 
That's a big indication that the updateflag may be hanging. Any output in 172.31.32.254's /var/log/messages that looks interesting? 

Remember that xCAT 2.6 made big changes to DNS. Can you resolve the xcatmaster, the management node, and the compute node itself while it is hung, with both forward and reverse lookups?
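
A rough way to run that check from the hung node is sketched below. It assumes getent and awk are available in the stateless image, and the host names you pass in are your own; nothing here is xCAT-specific.

```bash
#!/bin/bash
# Forward- and reverse-resolve one name through the system resolver.
# Returns non-zero if either lookup comes back empty.
check_dns() {
    local name=$1 ip back
    ip=$(getent hosts "$name" | awk '{print $1; exit}')
    [ -n "$ip" ] || { echo "$name: no forward record"; return 1; }
    back=$(getent hosts "$ip" | awk '{print $2; exit}')
    [ -n "$back" ] || { echo "$name -> $ip: no reverse record"; return 1; }
    echo "$name -> $ip -> $back"
}

# Pass the names to test, e.g. the xcatmaster, the management node and the
# compute node itself; with no arguments the loop does nothing.
status=0
for host in "$@"; do
    check_dns "$host" || status=1
done
```

Since getent also consults /etc/hosts, a name that only resolves through the hosts file will still pass; query the name server directly if you need to rule that out.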



On 7/22/2011 7:15 PM, Luis Miguel Silva wrote: 

Actually, my co-worker had been trying to figure what was going on
with it yesterday and the current version of it did have a & in the
end.

But i'm still having trouble with it!

Please look at the attached screenshot to see what is happening from
the console.

As for the processes, here's how things look like:
root  1894  0.0  0.0  10900  1420 ?  Ss  13:57  0:00 /bin/bash /etc/rc.d/rc 3
root  2133  0.0  0.0  10896  1376 ?  S   13:57  0:00  \_ /bin/sh /etc/rc3.d/S61xcatpostinit start
root  2139  0.0  0.0  10768  1284 ?  S   13:57  0:00 \_ /bin/sh /opt/xcat/xcatdsklspost
root  2197  0.0  0.0  10768   820 ?  S   13:57  0:00   \_ /bin/sh /opt/xcat/xcatdsklspost
root  2203  0.0  0.0   3788   496 ?  S   13:57  0:00   \_ tee -a /var/log/xcat/xcat.log
root  2229  0.0  0.0   8764   876 ?  S   13:57  0:00   \_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002 installstatus booted
root  2239  0.0  0.0   3784   596 ?  S   13:57  0:00   \_ logger -t xcat
root  2361  0.0  0.0   3788   468 ?  S   13:59  0:00   \_ sleep 10

Any thoughts?

Thanks,
Luis

On Fri, Jul 22, 2011 at 6:02 PM, Russell Jones  wrote: 

./\$@ 2>&1 | tee -a \$logfile

That line needs an ampersand after it to resolve the problem. Example:

./\$1 2>&1 | tee -a \$logfile &







On 7/22/2011 6:57 PM, Luis Miguel Silva wrote: 

Dear all,

I'm having some trouble with netbooting CentOS5.5 in my 2.6 xCAT system. 

From the compute node's console, all i see is that it hangs when booting "syslogd" but after carefully inspecting i can see it hangs
while loading "xcatdsklspost".
More specifically, in this area:
  if [[ -f \$1 ]]; then
   echo \"Running postscript: \$@\" | tee -a \$logfile
   ./\$@ 2>&1 | tee -a \$logfile
  else
   echo \"Postscript \$1 does NOT exist.\" | tee -a \$logfile
  fi

We have come across this before (in a 2.5 xCAT system) and a co-worker
managed to solve it but he did not document it and now he can't figure
out what's going on with this...

Any thoughts?

p.s. we basical

Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-22 Thread Russell Jones


  
  
I believe I have found the bug in 2.6's initrd that causes the error
I provided the screenshot for. In the "init" file:

if [ -z "$SNAPSHOTSERVER" ]; then cp /etc/resolv.conf \$NEWROOT/etc/resolv.conf; fi

The backslash there is escaping the dollar sign for $NEWROOT, and as
a result, well, is generating that error :)

I need to head out for a bit so I don't have time to try fixing it
and re-packing the initrd. If someone else would like to be
adventurous and give it a shot, go for it.

Also, as for the stalling issue you are having, both genimage and
packimage appear to copy the /install/postscripts/xcatdsklspost and
put it in the rootimg folder's /opt/xcat. Just to cover the basics,
you did make the ampersand change in the copy in
/install/postscripts right?
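
A quick way to check both copies is sketched below. The two paths are assumptions pieced together from this thread (the postscripts directory and the centos5.5 compute image), so adjust them to your layout.

```bash
#!/bin/bash
# Returns 0 if the file contains a "... 2>&1 | tee -a ...logfile" line
# that ends with a trailing "&".
has_background_tee() {
    grep -Eq '2>&1[[:space:]]*\|[[:space:]]*tee -a .*logfile[[:space:]]*&[[:space:]]*$' "$1"
}

# Assumed locations: the master copy and the copy inside the image.
for f in /install/postscripts/xcatdsklspost \
         /install/netboot/centos5.5/x86_64/compute/rootimg/opt/xcat/xcatdsklspost; do
    if [ ! -f "$f" ]; then
        echo "missing:      $f"
    elif has_background_tee "$f"; then
        echo "ampersand:    $f"
    else
        echo "NO ampersand: $f"
    fi
done
```

If only the copy under rootimg was edited, the next genimage or packimage run would presumably overwrite it with the unmodified master copy.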




On 7/22/2011 8:08 PM, Luis Miguel Silva wrote:
Oops...I'm sorry! heheh...
  I had surgery today and i'm still a little drowsy heheh :oP
  
  Thanks!
  Luis
  
On Fri, Jul 22, 2011 at 7:01 PM, Russell Jones <rjo...@eggycrew.com> wrote:

This wasn't a hang, my node boots up all the way. I was just screenshotting
that error :)

On 7/22/2011 7:59 PM, Luis Miguel Silva wrote:
FYI, ours doesn't seem to hang on "udev" :o).

Thanks,
Luis

On Fri, Jul 22, 2011 at 6:56 PM, Russell Jones <rjo...@eggycrew.com> wrote:

Just to rule out strange gen/packimage issues with 2.6 (given that you
stated resolv.conf wasn't getting copied over), I regenerated my diskless
image and repacked it using xCAT 2.6 in a lab environment and was able to
netboot my CentOS 5.5 node without an issue as long as that ampersand was
at the end of that line in xcatdsklspost.

However! I did notice the following error on bootup:






Looks like maybe a bug? $NEWROOT doesn't seem to
be getting set or extracted out like it should
in that cp line. Logging into my node yields:

[root@c1n02 ~]# cat /etc/resolv.conf

  #Dummy resolv.conf to make boot cleaner
  
  

Super! 

   

On 7/22/2011 7:41 PM, Jonathan Dye wrote:

  so, that's yet another problem.  resolv.conf doesn't get written correctly.  i see where the initrd is supposed to be copying both the resolv.conf and the lease file from the stateless environment, but it only copies the lease file.  when i manually fixed resolv.conf in the guest, the retries from updateflag still didn't work, and a new process wouldn't work either.

reverse dns worked for the machine that it was trying to contact.  forward did too, but as you can see it is using the IP of the xcat master and not the name.

- jonathan

- Original Message -
From: "Russell Jones" 
To: "xCAT Users Mailing list" 
Sent: Friday, July 22, 2011 6:19:38 PM
Subject: Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting


\_ /bin/awk -f /xcatpost/updateflag.awk 172.31.32.254 3002
installstatus booted 
That's a big indication that the updateflag may be hanging. Any output in 172.31.32.254's /var/log/messages that looks interesting? 

Remember that xCAT 2.6 made big changes to DNS. Can you resolve both the xcatmaster, the management node, and the compute node itself when it is hung, both forward and reverse records? 



On 7/22/2011 7:15 PM, Luis Miguel Silva wrote: 

Actually, my co-worker had been trying to figure what was going on
with it yesterday and the current version of it did have a & in the
end.

But i'm still having trouble with it!

Please look at the attached screenshot to see what is happening from
the console.

As for the processes, here's how things look like:
root  1894  0.0  0.0  10900  1420 ?S

Re: [xcat-user] CentOS 5.5 stateless stalls when netbooting

2011-07-24 Thread Russell Jones
Hi Luis,

I could be completely and 100% wrong about the purpose of this section
of the genimage script, and I am just going off of my experiences and
what I have seen on my end here, plus it's 2:30 AM here, so someone can
feel free to correct me :-)

The purpose of that line is to print directly into the "init"
script inside of your initrd-stateless.gz, so that the $NEWROOT
variable *is* expanded to your system root when init runs:

  print $inifile 'if [ -z "$SNAPSHOTSERVER" ]; then cp /etc/resolv.conf \$NEWROOT/etc/resolv.conf; fi'."\n";

Notice that the line is wrapped in single quotes, not double. Removing
the backslash from in front of $NEWROOT allows $NEWROOT to expand to, in
my case (and more than likely yours as well), /sysroot when the
node is booting and actually running the init script this line was
printed into. Leaving the backslash in generates the error you see on
startup about resolv.conf not being able to be opened, because when
genimage writes this line to the init script, it includes the \
as well. This breaks the variable. A backslash to escape a $ is not
necessary in single quotes.

You can test this with a simple bash script:

#!/bin/bash

NEWROOT="test"

echo '\$NEWROOT';
echo '$NEWROOT';
echo "\$NEWROOT";
echo "$NEWROOT";

Output:

\$NEWROOT
$NEWROOT
$NEWROOT
test
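
The test above covers quoting on the generator side. As a complementary sketch, the snippet below shows what the node's shell does with the generated line once it is in init. It relies on the fact that a Perl single-quoted string keeps `\$` as two literal characters; the /sysroot value is the one reported in this message, and eval is only a stand-in for init executing the line.

```bash
#!/bin/bash
NEWROOT="/sysroot"   # example value; this message reports /sysroot on the poster's node

# The cp target as it lands in init, with and without the backslash.
with_backslash='echo \$NEWROOT/etc/resolv.conf'
without_backslash='echo $NEWROOT/etc/resolv.conf'

# eval stands in for the node's shell running the generated line.
out_with=$(eval "$with_backslash")
out_without=$(eval "$without_backslash")

echo "with backslash:    $out_with"
echo "without backslash: $out_without"
```

With the backslash the shell sees a literal dollar sign and tries to use a path that starts with the text $NEWROOT, which matches the "unable to open $NEWROOT/etc/resolv.conf" error quoted further down.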


In my particular case, changing the line as I mentioned before resolved
the errors I was seeing on boot as well as my DNS issues: my
node's /etc/resolv.conf now contains proper DNS entries instead of the
empty placeholder that was there before. Putting the backslash back in
regenerates the error on boot and makes my resolv.conf contain
the empty placeholder text again.

The reason why I was focusing on resolving this issue is I have seen 
strange things when DNS is not working properly, regardless of if you 
are trying to connect via IP to the xcatmaster or not. I'm afraid I've 
kind of reached the end of my ability to troubleshoot this issue without 
actually physically being on your node and understanding your network 
setup better. I know you've said you have manually edited resolv.conf to 
point to the correct server, but did you do this editing before any of 
the postscripts began to run on the node?

Someone else may have better insight into what could be your troubles 
here, and hopefully might be able to chime in.
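
If a genimage edit does not seem to take effect, the clean rebuild suggested earlier in this thread (quoted below) can be sketched as a dry run. The image directory is the one named in the thread; the node name, the packimage flags and the genimage placeholder are assumptions, so check them against your own xCAT version before running anything for real.

```bash
#!/bin/bash
# Dry run: prints the rebuild steps instead of executing them.
imgdir=/install/netboot/centos5.5/x86_64/compute   # image directory named in the thread
node=${1:-c1n02}                                    # node name is only an example

run() { echo "+ $*"; }   # swap the body for "$@" to actually execute

rebuild_plan() {
    run rm -rf "$imgdir/rootimg"
    run rm -f "$imgdir/rootimg.gz" "$imgdir/initrd-stateless.gz"
    run genimage "<your usual genimage options>"      # placeholder, not real options
    run packimage -o centos5.5 -p compute -a x86_64   # flags are an assumption
    run nodeset "$node" netboot
}

rebuild_plan
```

Printing the plan first makes it easy to confirm that the old initrd-stateless.gz really is removed before the image is regenerated.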




On 7/24/2011 1:03 AM, Luis Miguel Silva wrote:
> Russel,
>
> According to my tests, the "\" was actually necessary to escape the
> variable (otherwise the $NEWROOT variable would refer to the local
> genimage script and not to the sysinit script being ran on the node).
>
> I've also manually set /etc/resolv.conf to point to my nameserver and
> the problem still happens...
>
> Like i mentioned in a previous email, we are actually connecting by ip
> so that shouldn't be a problem too...
>
> Any other thoughts?
>
> Thanks!
> Luis
>
> On Sat, Jul 23, 2011 at 10:57 PM, Luis Miguel Silva
>   wrote:
>> I did not! I just reran genimage and packimage :o)
>> I'll try that, thanks!
>>
>> Luis Miguel Silva
>> On Jul 23, 2011, at 8:31 PM, Russell Jones  wrote:
>>
>> Hi Luis,
>>
>> Sounds to me like your changes didn't take. Did you completely rm the
>> initrd-stateless file before re-genning and re-packing the image? rm -rf
>> rootimg, rm rootimg.gz, rm initrd-stateless.gz. Then re-gen, re-pack, and
>> re-nodeset the node to netboot.
>>
>> That's the only thing I can think of to tell you to do.
>>
>>
>>
>>
>> On 7/23/2011 11:52 AM, Luis Miguel Silva wrote:
>>
>> Russel,
>>
>> I tried doing that and repackaging my image but...it didn't seem to work.
>>
>> This is how my genimage looks like:
>> print $inifile 'if [ -z "$SNAPSHOTSERVER" ]; then cp /etc/resolv.conf
>> $NEWROOT/etc/resolv.conf; fi'."\n";
>>
>> Looking at the vmware console, i see the error stating "unable to open
>> $NEWROOT/etc/resolv.conf :o\ .
>>
>> Any thoughts?
>>
>> Thanks,
>> Luis
>>
>> On Sat, Jul 23, 2011 at 10:05 AM, Russell Jones  wrote:
>>> The initrd I am talking about is in
>>> /install/netboot/centos5.5/x86_64/compute. initrd-stateless.gz. It's "init"
>>> script that it runs is generated directly from the genimage that is located
>>> in /opt/xcat/share/xcat/netboot/centos. You will want to edit this line:
>>>
>>> print $inifile 'if [ -z "$SNAPSHOTSERVER" ]; then cp /etc/resolv.conf
>>> \$NEWROOT/etc/resolv.conf; fi'."\

[xcat-user] nextdestiny behavior, slow response

2011-07-25 Thread Russell Jones
Hi all,

When adding a cluster to our network, we are getting slow responses from 
what appears to be getdestiny/nextdestiny on our xCAT 2.3 cluster, and I 
was wondering if any enhancements / changes have been made in 2.6 to 
this behavior?

The steps of replicating the problem (and how we add a cluster):

* Rack and cable up, power on
* Run getmacs
* Add cluster nodes to other required tables (we have a custom script 
that does this)
* makedhcp for the cluster
* nodeset cluster runcmd=bmcsetup
* Power cycle nodes

At this point, as the nodes begin asking the management node for 
getdestiny/nextdestiny, the management node begins to become very slow 
in responding to the getdestiny/nextdestiny requests. We are only 
talking about 30 to 35 nodes at a time, and it happens when the nodes 
boot nbfs and get their commands to run bmcsetup. We also see a perl 
process consistently using 100% of a CPU core during this time. What is 
the process of how these nodes are pulling this data? Obviously it's 
getting it from the chain table, but what could cause this response to 
be so resource hungry and slow?

Also, we are looking to upgrade from 2.3 to 2.6. Has there been any 
changes to this process that could assist in correcting this behavior?


Thanks!





Re: [xcat-user] nextdestiny behavior, slow response

2011-07-25 Thread Russell Jones
Thanks Lissa!

Would you happen to know if any of these changes actually specifically 
changed the getdestiny/nextdestiny behavior?



On 7/25/2011 10:34 AM, Lissa Valletta wrote:
> xCAT 2.3 has not been supported for a while, so I would suggest you
> upgrade at least to 2.5, a supported release.  There have been tremendous
> changes in xCAT since 2.3.  If you do, the upgrade should be automatic, your
> database will be migrated, etc.  One thing you might need to do, since
> your release is so old, is manually stop xcatd after the upgrade,
> make sure all processes are cleaned up, and then restart it.  Also, if you
> have service nodes, make sure they are upgraded as well.
>
> Lissa K. Valletta
> 2-3/T12
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
>
>
>
>
> From: Russell Jones
> To:   xCAT Users Mailing list
> Date: 07/25/2011 11:27 AM
> Subject:  [xcat-user] nextdestiny behavior, slow response
>
>
>
> Hi all,
>
> When adding a cluster to our network, we are getting slow responses from
> what appears to be getdestiny/nextdestiny on our xCAT 2.3 cluster, and I
> was wondering if any enhancements / changes have been made in 2.6 to
> this behavior?
>
> The steps of replicating the problem (and how we add a cluster):
>
> * Rack and cable up, power on
> * Run getmacs
> * Add cluster nodes to other required tables (we have a custom script
> that does this)
> * makedhcp for the cluster
> * nodeset cluster runcmd=bmcsetup
> * Power cycle nodes
>
> At this point, as the nodes begin asking the management node for
> getdestiny/nextdestiny, the management node begins to become very slow
> in responding to the getdestiny/nextdestiny requests. We are only
> talking about 30 to 35 nodes at a time, and it happens when the nodes
> boot nbfs and get their commands to run bmcsetup. We also see a perl
> process consistently using 100% of a CPU core during this time. What is
> the process of how these nodes are pulling this data? Obviously it's
> getting it from the chain table, but what could cause this response to
> be so resource hungry and slow?
>
> Also, we are looking to upgrade from 2.3 to 2.6. Has there been any
> changes to this process that could assist in correcting this behavior?
>
>
> Thanks!
>
>
>



Re: [xcat-user] nextdestiny behavior, slow response

2011-07-25 Thread Russell Jones

Thanks Jim!

Is the polling interval a variable we can manually increase ourselves 
somewhere?




On 7/25/2011 10:38 AM, Jim Turner wrote:

What you are seeing is expected behavior. When nodes get into the
"getnextdestiny" state they have open, active connections with the xcat
daemon and "check in" to see what's next to do frequently (perhaps too
frequently).

If you do a nodeset  for those nodes to "shell" it will stop the repeated
"What do I do next?" queries.

It might be possible for the polling interval to be increased but I don't
know if that's been changed in 2.6.

Jim Turner
Cluster Enablement Team (CET) Senior Engineer
phone: 919-543-2505 / mobile: 919-381-8739
t...@us.ibm.com
ibm.com/systems/services/labservices





From:   Russell Jones
To: xCAT Users Mailing list
Date:   07/25/2011 11:27 AM
Subject:[xcat-user] nextdestiny behavior, slow response



Hi all,

When adding a cluster to our network, we are getting slow responses from
what appears to be getdestiny/nextdestiny on our xCAT 2.3 cluster, and I
was wondering if any enhancements / changes have been made in 2.6 to
this behavior?

The steps of replicating the problem (and how we add a cluster):

* Rack and cable up, power on
* Run getmacs
* Add cluster nodes to other required tables (we have a custom script
that does this)
* makedhcp for the cluster
* nodeset cluster runcmd=bmcsetup
* Power cycle nodes

At this point, as the nodes begin asking the management node for
getdestiny/nextdestiny, the management node begins to become very slow
in responding to the getdestiny/nextdestiny requests. We are only
talking about 30 to 35 nodes at a time, and it happens when the nodes
boot nbfs and get their commands to run bmcsetup. We also see a perl
process consistently using 100% of a CPU core during this time. What is
the process of how these nodes are pulling this data? Obviously it's
getting it from the chain table, but what could cause this response to
be so resource hungry and slow?

Also, we are looking to upgrade from 2.3 to 2.6. Has there been any
changes to this process that could assist in correcting this behavior?


Thanks!





Re: [xcat-user] nextdestiny behavior, slow response

2011-07-25 Thread Russell Jones
Of course. Thanks for the help!



On 7/25/2011 11:00 AM, Lissa Valletta wrote:
> I asked our developer, will let you know.  It is always better to be on a
> supported release though.
>
> Lissa K. Valletta
> 2-3/T12
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
>
>
>
>
> From: Russell Jones
> To:   xCAT Users Mailing list
> Date: 07/25/2011 11:46 AM
> Subject:  Re: [xcat-user] nextdestiny behavior, slow response
>
>
>
> Thanks Lissa!
>
> Would you happen to know if any of these changes actually specifically
> changed the getdestiny/nextdestiny behavior?
>
>
>
> On 7/25/2011 10:34 AM, Lissa Valletta wrote:
>> xCAT 2.3 has not been supported for a while, so I would suggest you
>> upgrade at least to 2.5, a supported release.  There have been tremendous
>> changes in xCAT since 2.3.  If you do, the upgrade should be automatic, your
>> database will be migrated, etc.  One thing you might need to do, since
>> your release is so old, is manually stop xcatd after the upgrade,
>> make sure all processes are cleaned up, and then restart it.  Also, if you
>> have service nodes, make sure they are upgraded as well.
>>
>> Lissa K. Valletta
>> 2-3/T12
>> Poughkeepsie, NY 12601
>> (tie 293) 433-3102
>>
>>
>>
>>
>>
>> From: Russell Jones
>> To:   xCAT Users Mailing list
>> Date: 07/25/2011 11:27 AM
>> Subject:  [xcat-user] nextdestiny behavior, slow response
>>
>>
>>
>> Hi all,
>>
>> When adding a cluster to our network, we are getting slow responses from
>> what appears to be getdestiny/nextdestiny on our xCAT 2.3 cluster, and I
>> was wondering if any enhancements / changes have been made in 2.6 to
>> this behavior?
>>
>> The steps of replicating the problem (and how we add a cluster):
>>
>> * Rack and cable up, power on
>> * Run getmacs
>> * Add cluster nodes to other required tables (we have a custom script
>> that does this)
>> * makedhcp for the cluster
>> * nodeset cluster runcmd=bmcsetup
>> * Power cycle nodes
>>
>> At this point, as the nodes begin asking the management node for
>> getdestiny/nextdestiny, the management node begins to become very slow
>> in responding to the getdestiny/nextdestiny requests. We are only
>> talking about 30 to 35 nodes at a time, and it happens when the nodes
>> boot nbfs and get their commands to run bmcsetup. We also see a perl
>> process consistently using 100% of a CPU core during this time. What is
>> the process of how these nodes are pulling this data? Obviously it's
>> getting it from the chain table, but what could cause this response to
>> be so resource hungry and slow?
>>
>> Also, we are looking to upgrade from 2.3 to 2.6. Has there been any
>> changes to this process that could assist in correcting this behavior?
>>
>>
>> Thanks!
>>
>>
>>
>>

Re: [xcat-user] nextdestiny behavior, slow response

2011-07-26 Thread Russell Jones

  
  
Thanks!

We will set our chain behavior then to shell after everything is
done so that they do not keep hitting the management node every 5
seconds for instructions. 
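
For a rough sense of scale, the arithmetic can be sketched as below. It uses the 5-second interval and the 15-30 second standby sleep described in the quoted reply, plus the 30 to 35 nodes mentioned at the start of the thread; it only counts requests and says nothing about how expensive each one is for xcatd.

```bash
#!/bin/bash
# Requests per minute reaching xcatd for a group of polling nodes.
requests_per_minute() {   # usage: requests_per_minute <nodes> <interval_seconds>
    echo $(( $1 * 60 / $2 ))
}

nodes=35   # upper end of the "30 to 35 nodes" mentioned earlier in the thread
echo "5s polling:       $(requests_per_minute "$nodes" 5) requests/min"
echo "standby, 15s gap: $(requests_per_minute "$nodes" 15) requests/min"
echo "standby, 30s gap: $(requests_per_minute "$nodes" 30) requests/min"
```

That works out to 420 requests per minute at the 5-second interval against 70 to 140 in standby, which is why parking finished nodes in shell takes pressure off the management node.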

Unfortunately we did not capture the full execution line of the perl
process that was using 100% of a core, so I do not have that
information to provide. If this occurs again, I will be sure to
attempt to get the full process line and investigate a bit further.


Thanks again for the help!



On 7/26/2011 8:45 AM, Xiao Peng Wang wrote:

  In the latest code, xCAT does not support setting the query
interval for getting the next destiny (task).

If you set runcmd=bmcsetup, the node runs getdestiny to ask xcatd
for its next destiny every 5 seconds.
If you set standby as the next destiny, it sleeps 15-30 seconds and
then asks again.

So the question is: what do you want the node to do after bmcsetup
has run? If there is nothing more to do, keeping it at 'shell' is a
good idea.

BTW, I am wondering which perl process used 100% of the CPU?

Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West
Road, Haidian District Beijing P.R.China 100193
    
From: Russell Jones
To: xCAT Users Mailing list
Date: 2011-07-26 00:06
Subject: Re: [xcat-user] nextdestiny behavior, slow response

  Of course. Thanks for the help!



On 7/25/2011 11:00 AM, Lissa Valletta wrote:
> I asked our developer, will let you know.  It is always better to be on a
> supported release though.
>
> Lissa K. Valletta
> 2-3/T12
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
> From: Russell Jones
> To: xCAT Users Mailing list
> Date: 07/25/2011 11:46 AM
> Subject: Re: [xcat-user] nextdestiny behavior, slow response
>
> Thanks Lissa!
>
> Would you happen to know if any of these changes actually specifically
> changed the getdestiny/nextdestiny behavior?
>
> On 7/25/2011 10:34 AM, Lissa Valletta wrote:
>> xCAT 2.3 has not been supported for a while, so I would suggest you
>> upgrade at least to 2.5, a supported release.  There have been tremendous
>> changes in xCAT since 2.3.  If you do, the upgrade should be automatic, your
>> database will be migrated, etc.  One thing you might need to do, since
>> your release is so old, is manually stop xcatd after the upgrade,
>> make sure all processes are cleaned up, and then restart it.  Also, if you
>> have service nodes, make sure they are upgraded as well.
>>
>> Lissa K. Valletta
>> 2-3/T12
>> Poughkeepsie, NY 12601
>> (tie 293) 433-3102
>>
>> From: Russell Jones
>> To: xCAT Users Mailing list
>> Date: 07/25/2011 11:27 AM
>> Subject: [xcat-user] nextdestiny behavior, slow response
>>
>> Hi all,
>>
>> When adding a cluster to our network, we are getting slow responses from
>> what appears to be getdestiny/nextdestiny on our xCAT 2.3 cluster, and I
>> was wondering if any enhancements / changes have been made in 2.6 to
>> this behavior?
>>
>> The steps of replicating the problem (and how we add a cluster):
>>
>> * Rack and cable up, power on
>> * Run getmacs
>> * Add cluster nodes to other required tables (we have a custom script
>> that does this)
>> * makedhcp for the cluster
>> * nodeset cluster runcmd=bmc

Re: [xcat-user] Switching off dynamic DNS, and choice of domain.

2011-07-29 Thread Russell Jones
Hi Andrew,

Configuring site.nameservers and networks.nameservers to point to your
external name servers should be sufficient. Then handle DHCP as usual
and never run "makedns"; that *should* essentially stop your
management node from attempting to handle DNS. (Someone can correct me
if I am mistaken about the new ddns functionality.) You may also want to
check that /etc/resolv.conf on the MN and the CNs points to the
external name servers after setting the above configuration.
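
A small sketch for that last check follows. It is plain bash and awk, not an xCAT tool, and the addresses in the usage note are documentation placeholders.

```bash
#!/bin/bash
# List the nameserver addresses found in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeeds only if the file lists at least one nameserver and every one of
# them is in the expected set (the remaining arguments).
only_expected_nameservers() {
    local file=$1 ns want ok found=0
    shift
    for ns in $(list_nameservers "$file"); do
        found=1
        ok=0
        for want in "$@"; do
            if [ "$ns" = "$want" ]; then ok=1; fi
        done
        [ "$ok" -eq 1 ] || { echo "unexpected nameserver in $file: $ns"; return 1; }
    done
    [ "$found" -eq 1 ]
}
```

For example, `only_expected_nameservers /etc/resolv.conf 192.0.2.53 192.0.2.54` run on the MN and on a compute node would flag any entry that still points at the management node.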

> If we are to use the management node as a name server, is it best that
> each cluster has its own domain?



Just curious on your thought process. What is the end goal you are 
looking to achieve in separating each cluster into their own domain?




On 7/29/2011 10:33 PM, Andrew Spiers wrote:
> Hi, we've got a small cluster, where we are using an external name
> server. We've set up static DNS entries in this name server, and
> everything seems to be working properly, except xcat is trying to
> dynamically update our name server.
> What is the best way to turn this off? Is it as simple as changing
>  "dnshandler","ddns",,
> to
>  "dnshandler","",,
> ?
>
> My other question is more of a design query. If we are to use the
> management node as a name server, is it best that each cluster has its
> own domain?
>
> Regards,
>
> Andrew Spiers
> Victorian Partnership for Advanced Computing
>
>
>



Re: [xcat-user] node rinstall in loop

2011-08-14 Thread Russell Jones


  
  
When the node requests its nextdestiny (from chain table) or
provides an update to say it has finished installing/running
postscripts, it does so through its xcat master. The node's xcat
master then queries/updates the database for this information. 

Can you provide an lsdef of this node from both the first and second
server?

What is the virtual IP the nodes are DHCP'ing/installing from? If
the two servers are indeed identical, the nodes are installing from
the virtual IP that toggles between the two servers, and the xcat
master's IP and DNS entries are this virtual IP, there should be no
difference in the installation behavior if the node is configured
properly.




On 8/14/2011 9:33 AM, Gilad Berman wrote:
Hello,

We have an xCAT server configured in HA (using RH cluster with a VIP).
We backed up and restored the xCAT tables to the second node, then ran
make and replicated the /install folder.

All site tables and xCAT MASTER records are configured to use the VIP.

Now we try to rinstall one of the nodes from the second server, and the
node ends up in an installation loop.

When we move the VIP to the first server, the installation finishes
successfully.

Everything else seems to be working properly.

How does xCAT update the chain table? Maybe the node sends a message to
the first xCAT server?

Network services (dhcpd, named, atftpd, xcatd, etc.) are running on both
servers.

Please advise ...

thx in advance.

  
Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: gil...@il.ibm.com
Tel:    972-3-9188262
Mobile: 972-52-2554262
  



  




Re: [xcat-user] xCAT with external DHCP and DNS

2011-08-31 Thread Russell Jones
I've run xCAT with external DNS without an issue. External DHCP, however,
makes things more interesting.


I imagine that as long as the external DHCP server answering the requests
has "next-server" and "filename" set properly to point the host at the
management node/service nodes, it should work. Just chkconfig dhcpd off,
and even chmod the dhcpd binary to 000 (or rename it) to keep it from ever
starting on the management node.


Also set site.dhcpsetup to "0" so that dhcp leases are not written out
when nodesets are issued.
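
As a rough illustration of the external DHCP side, the sketch below prints
an ISC dhcpd host entry carrying the two options mentioned above. The
function name, node name, addresses, and the boot file name pxelinux.0 are
placeholders, not values confirmed in this thread; use whatever boot file
your management node actually serves.

```shell
# Print an ISC dhcpd "host" entry that points a node at the xCAT
# management node. Every argument is an illustrative placeholder.
dhcp_host_stanza() {
    node="$1"; mac="$2"; ip="$3"; mn_ip="$4"; bootfile="$5"
    printf 'host %s {\n' "$node"
    printf '  hardware ethernet %s;\n' "$mac"
    printf '  fixed-address %s;\n' "$ip"
    # next-server: the TFTP server the node should boot from
    printf '  next-server %s;\n' "$mn_ip"
    # filename: the boot file requested from that TFTP server
    printf '  filename "%s";\n' "$bootfile"
    printf '}\n'
}

# Hypothetical node and addresses, for demonstration only.
dhcp_host_stanza node01 00:11:22:33:44:55 192.168.1.200 192.168.1.1 pxelinux.0
```

The output would be pasted into (or generated for) the external DHCP
server's configuration, not the one on the management node.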



On 8/31/2011 6:03 AM, Assaf Leibovitch wrote:

Hi Guys,

Any hint?



Assaf Leibovitch
XIV Host Side QA Automation
+972-3-6897745



From: Gilad Berman/Israel/IBM
To: xCAT Users Mailing list 
Cc: Assaf Leibovitch/Israel/IBM@IBMIL, Shay Berman/Israel/IBM@IBMIL
Date: 28/08/2011 21:29
Subject: Re: [xcat-user] xCAT with external DHCP and DNS



We would like both. Manually configuring the entry on the site DNS
(dynamic) worked OK and the installation succeeded; however, we had to
start DHCP on the xCAT server. This is not something we want, of course.


thx for your reply.

Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: gil...@il.ibm.com
Tel:972-3-9188262
Mobile: 972-52-2554262




From: Jarrod Johnson 
To: xCAT Users Mailing list 
Date: 28/08/2011 21:11
Subject: Re: [xcat-user] xCAT with external DHCP and DNS




Did you want xCAT to push DNS records?  Did you want it to push DHCP 
records?


On Sun, Aug 28, 2011 at 8:22 AM, Gilad Berman <gil...@il.ibm.com> wrote:

Hello list,

We're trying to install xCAT as an installation server in an existing
'classic' IT environment with existing (dynamic) DNS and DHCP servers.
I followed some of the documentation, but something always fails to work
properly.
Is there any good documentation or a HOWTO? Has anyone successfully
implemented this?


Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: _gil...@il.ibm.com_ 
Tel:972-3-9188262
Mobile: 972-52-2554262



[xcat-user] Purpose for copying all of /postscripts to /xcatpost on the nodes

2011-09-06 Thread Russell Jones
Just curious, what is the reasoning behind copying the entire 
/install/postscripts tree to /xcatpost on every node on deployment 
(xcatdsklspost) as opposed to just the postscripts it is actually going 
to run, or just mounting the /install/postscripts tree and running via NFS?



Re: [xcat-user] nbfs replacement name

2011-09-06 Thread Russell Jones

  
  
I like xPLODE. Captures well what happens if you mess up a node and
need to use nbfs to fix it   :-)
  



On 9/6/2011 5:57 PM, Jarrod Johnson wrote:
Anyone have a preference?  So far I have received:
  xEROS (xCAT Embedded Remote Operating System)
  Genesis (Genesis Enhanced Netboot Environment for System Information and Servicing)
  xCITE (xCAT Configuration, Information, and Troubleshooting Environment)
  xPLORE (xCAT's Pretty Lightweight Offline Repair Environment)
  xPLODE (xCAT's Pretty Lightweight Offline Diagnostic Environment)

  I'm personally leaning toward Genesis.
  
  
  


[xcat-user] Booting diskless nodes directly over NFS

2011-09-29 Thread Russell Jones
Hi all,

Does xCAT natively support directing a diskless node to download its 
root image via NFS? Example of what I am referring to:

cat /tftpboot/pxelinux.cfg/node13
#netboot centos5.4-x86_64-comp
DEFAULT xCAT
LABEL xCAT
  KERNEL xcat/netboot/centos5.4/x86_64/comp/kernel
  APPEND initrd=xcat/netboot/centos5.4/x86_64/comp/initrd.gz imgurl=http://172.29.252.121/install/netboot/centos5.4/x86_64/comp/rootimg.sfs


I would like the "imgurl" line to be an NFS path, such as 
NFS://nfs-server:path/to/rootimg.sfs

Not entirely sure how one would go about setting this properly if it's 
supported, and I have not seen it mentioned in the docs anywhere.


Thanks!



Re: [xcat-user] updateflag.awk hangs forever?

2011-10-16 Thread Russell Jones
Sounds like perhaps a postscript may be hanging. Do you have any custom 
postscripts? Any interesting messages in the other terminal windows 
during install?

Are you truly doing a diskful node installation, or are these diskless
nodes? There's a known issue with CentOS 5.5 and the xcatdsklspost
script that will cause a stateless node to hang during boot.



On 10/16/2011 4:46 AM, Dario Dorella wrote:
> Hello list,
>
> I am trying to install a CentOS5 cluster using xCAT, and looking at
> what happens during installation it seems that the updateflag.awk never
> receives its "done" message from xcatd and keeps looping.
>
> Has anybody an idea on what I might be doing wrong and on how to debug this?
>
>
> Thx,
> Dario
>



[xcat-user] Cannot get service node to install in SLES

2012-01-22 Thread Russell Jones
Hi all,

Attempting to debug a service node install on a SLES 11.1 system. The 
node gets through the standard install, but the xcat packages do not get 
installed. Running "updatenode service01 -S" shows:

xcat-suse:~ # updatenode service01 -S
Performing software maintenance operations. This could take a while.

service01: Running postscript: ospkgs
service01: Repository 'SUSE-Linux-Enterprise-Server-11-SP1' not found by 
alias, number or URI.
service01: Repository '11.1.1-1.152' not found by alias, number or URI.
service01: Running postscript: otherpkgs
service01: mv: cannot stat `xcat-suse/post/otherpkgs/sles11.1/x86_64/*': 
No such file or directory
service01: NFSSERVER=xcat-suse
service01: OTHERPKGDIR=xcat-suse/post/otherpkgs/sles11.1/x86_64
service01: Repository 'SUSE-Linux-Enterprise-Server-11-SP1 11.1.1-1.152' 
is up to date.
service01: Repository 'sles11.1' is up to date.
service01: All repositories have been refreshed.
service01: zypper --non-interactive update --auto-agree-with-license
service01: Loading repository data...
service01: Reading installed packages...
service01:
service01: Nothing to do.
service01:  zypper remove -y  perl-doc
service01: Loading repository data...
service01: Reading installed packages...
service01: 'perl-doc' is not installed.
service01: Resolving package dependencies...
service01:
service01: Nothing to do.
service01:  rpm -Uvh --replacepkgs  xcat/xcat-core/xCATsn* 
xcat/xcat-dep/sles11/x86_64/conserver*
service01: error: File not found by glob: xcat/xcat-core/xCATsn*
service01: error: File not found by glob: 
xcat/xcat-dep/sles11/x86_64/conserver*
service01: Running of Software Maintenance has completed.


I'm not sure what the issue here is, and SLES/zypper is new to me. First 
it says it can't find the repos, then it says they are up to date.

The repos "look" correct to me, and the apache logs on the management 
node (also SLES) show that the node is at least attempting to look at 
the repo. Any ideas?

service01:~ # zypper repos -u
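# | Alias                                            | Name                                             | Enabled | Refresh | URI
--+--------------------------------------------------+--------------------------------------------------+---------+---------+-------------------------------------------
1 | SUSE-Linux-Enterprise-Server-11-SP1 11.1.1-1.152 | SUSE-Linux-Enterprise-Server-11-SP1 11.1.1-1.152 | Yes     | Yes     | http://xcat-suse/install/sles11.1/x86_64/1
2 | sles11.1                                         | sles11.1                                         | Yes     | No      | ftp://xcat-suse/sles11.1/x86_64/1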



DNS and all is working properly.



Re: [xcat-user] Cannot get service node to install in SLES

2012-01-22 Thread Russell Jones
Never mind, got it worked out. "Otherpkgs" wasn't set up properly on the
management node to handle the xcat dependencies.

:)
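
For anyone hitting the same errors: the two rpm globs in the log appear to
be resolved relative to the otherpkgs directory, so one quick check on the
management node is whether they match anything there. A minimal sketch
follows; the function name is made up, and the path in the usage comment is
an assumption pieced together from the OTHERPKGDIR value in the log.

```shell
# Report whether the RPM globs that the otherpkgs postscript tried to
# install match any file under the given otherpkgs directory.
# Returns 0 if both globs match something, 1 otherwise.
check_otherpkgs() {
    base="$1"
    status=0
    for pattern in "xcat/xcat-core/xCATsn*" "xcat/xcat-dep/sles11/x86_64/conserver*"; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        set -- "$base"/$pattern
        if [ -e "$1" ]; then
            echo "ok: $pattern"
        else
            echo "missing: $pattern"
            status=1
        fi
    done
    return $status
}

# Assumed location on the management node (not confirmed in this thread):
# check_otherpkgs /install/post/otherpkgs/sles11.1/x86_64
```

A "missing" line for either glob would match the "File not found by glob"
errors shown in the updatenode output.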



[xcat-user] Diskfull nodes and resolv.conf

2012-01-27 Thread Russell Jones
Hi all,

I am comparing the autoinst files generated between xcat 2.3 and 2.6, 
and it appears that at one point (xcat 2.3) the /etc/resolv.conf was 
filled in via the "/opt/xcat/sbin/makenamed.conf" file. Looking at the 
autoinst files in 2.6, while this file exists the autoinst files being 
generated do not contain any entries for placing data in 
/etc/resolv.conf. As a result, if I use the "hardeths" script the node 
has no DNS information configured.

I did however see that there is now a postscript named "mkresolvconf" 
that seems to take care of this. Is this the preferred method now of 
configuring a static IP and DNS information on a node (hardeths + 
mkresolvconf)?

Thanks!



Re: [xcat-user] Diskfull nodes and resolv.conf

2012-01-29 Thread Russell Jones
ot;8",,
  "ppcmaxp","64",,
  "ppcretry","3",,
  "ppctimeout","0",,
  "powerinterval","0",,
  "syspowerinterval","0",,
  "sharedtftp","0",,
  "SNsyncfiledir","/var/xcat/syncfiles",,
  "tftpdir","/tftpboot",,
  "xcatdport","3001",,
  "xcatiport","3002",,
  "xcatconfdir","/etc/xcat",,
  "timezone","America/New_York",,
  "useNmapfromMN","no",,
  "enableASMI","no",,
  "db2installloc","/mntdb2",,
  "databaseloc","/var/lib",,
  "sshbetweennodes","ALLGROUPS",,
  "dnshandler","ddns",,
  "vsftp","y",,
  "cleanupxcatpost","no",,
  "domain","site.local",,
  "installloc","xcat-suse:/install",,
  
  xcat-suse:~ # ifconfig
  eth0  Link encap:Ethernet  HWaddr 00:0C:29:59:87:B0
        inet addr:172.16.99.234  Bcast:172.16.99.255  Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:26412 errors:0 dropped:0 overruns:0 frame:0
        TX packets:21189 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:2510431 (2.3 Mb)  TX bytes:3681162 (3.5 Mb)

  eth1  Link encap:Ethernet  HWaddr 00:0C:29:59:87:BA
        inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:552754 errors:0 dropped:0 overruns:0 frame:0
        TX packets:2447039 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:38007415 (36.2 Mb)  TX bytes:3647660521 (3478.6 Mb)
  
  









On 1/29/2012 5:39 AM, Lissa Valletta wrote:

  The mkresolvconf script was written to support the Service Node move
function we now support in hierarchical clusters. It was not written for
basic setup during install. There was a big change in name resolution
from 2.3 to 2.6: we no longer use bind but have replaced it with ddns.
Could you check your setup against this documentation:

http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution

Note that the doc says: "The resolv.conf files for the compute nodes will be
created automatically using the "domain" and "nameservers" values set in
the site definition." You need to check your site table attributes.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102





From:	Russell Jones 
To:	xCAT Users Mailing list 
Date:	01/27/2012 05:42 PM
Subject:	[xcat-user] Diskfull nodes and resolv.conf



Hi all,

I am comparing the autoinst files generated between xcat 2.3 and 2.6,
and it appears that at one point (xcat 2.3) the /etc/resolv.conf was
filled in via the "/opt/xcat/sbin/makenamed.conf" file. Looking at the
autoinst files in 2.6, while this file exists the autoinst files being
generated do not contain any entries for placing data in
/etc/resolv.conf. As a result, if I use the "hardeths" script the node
has no DNS information configured.

I did however see that there is now a postscript named "mkresolvconf"
that seems to take care of this. Is this the preferred method now of
configuring a static IP and DNS information on a node (hardeths +
mkresolvconf)?

Thanks!



Re: [xcat-user] Diskfull nodes and resolv.conf

2012-01-29 Thread Russell Jones

  
  
Some additional information, this looks like it may be limited to
just SLES in this case. I deployed a CentOS 5.6 node with the
hardeths postscript and /etc/resolv.conf was populated properly. No
other changes were made to the node outside of changing the OS to
deploy.

I noticed that "/opt/xcat/share/xcat/install/scripts" has a few
scripts that set resolv.conf. For example:

xcat-suse:/opt/xcat/share/xcat/install/scripts # grep -i resolv *
post.debian:echo "search #TABLE:site:key=domain:value#" >/etc/resolv.conf
post.debian:done >>/etc/resolv.conf
post.esx:echo "search #TABLE:site:key=domain:value#" >/etc/resolv.conf
post.esx:done >>/etc/resolv.conf
post.esx:cat /etc/resolv.conf
post.rh.iscsi:echo "search #TABLE:site:key=domain:value#" >/etc/resolv.conf
post.rh.iscsi:done >>/etc/resolv.conf
post.ubuntu:echo "search #TABLE:site:key=domain:value#" >/etc/resolv.conf
post.ubuntu:done >>/etc/resolv.conf
  


Given that neither CentOS nor SLES seems to have a default postscript
that sets this (aside from the rh.iscsi one?), I am unsure what
mechanism xCAT relies on to define the nameserver configuration
for diskful nodes. If you could shed some light on where this can
be found, it may help me track down why CentOS nodes are
working but SLES nodes are not.

Thanks again!



On 1/29/2012 10:24 AM, Russell Jones wrote:

  
  Thanks Lissa,
  
  I have checked my tables again against option #1 in that page you
  provided and everything seems correct.  See below for my site
  table attributes. Here's a bit more info on what's going on:
  
  Both diskless and diskfull nodes are booting and acquiring their
  /etc/resolv.conf without an issue as long as they are using DHCP.
  "makehosts", "makedns", and "makedhcp" all work and populate the
  various files correctly with no errors. It's when I add the
  "hardeths" postscript in that diskfull nodes only lose
  their DNS configuration. IP and subnet gets set correctly for the
  nodes, it's just /etc/resolv.conf that is empty. 
  
  The following is the /etc/resolv.conf of a diskless node after it
  was booted with the hardeths postscript. As you can see it
  appears it still keeps the DNS config from when DHCP ran on it
  prior to the interface being turned into a static one:
  
  c1n1:~ # cat /etc/resolv.conf
# Generated by dhcpcd for interface eth0
search site.local
nameserver 192.168.1.1
  
  
  
  Here is his xCAT configuration after I have told it to install:
  
  xcat-suse:/var/log # nodeset c1n1 install
c1n1: install sles11.1-x86_64-compute
c1n1: install sles11.1-x86_64-compute

xcat-suse:/var/log # lsdef c1n1
Object name: c1n1
    arch=x86_64
    currchain=boot
    currstate=install sles11.1-x86_64-compute
    groups=all
    initrd=xcat/sles11.1/x86_64/initrd
    ip=192.168.1.200
    kcmdline=autoyast=http://service01/install/autoinst/c1n1
install=http://service01/install/sles11.1/x86_64/1
    kernel=xcat/sles11.1/x86_64/linux
    mac=00:50:56:11:44:33
    netboot=pxe
    nfsserver=service01
    os=sles11.1
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles,hardeths
    profile="">
    provmethod=install
    servicenode=service01
    status=booted
    statustime=01-29-2012 15:45:24
  
  
  
  
  And here is his /etc/resolv.conf after installing:
  
  c1n1:~ # cat /etc/resolv.conf
### /etc/resolv.conf file autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
# NETCONFIG_DNS_STATIC_SEARCHLIST
# NETCONFIG_DNS_STATIC_SERVERS
# NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
# NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
# Note: Manual change of this file disables netconfig too, but
# may get lost when this file contains comments or empty lines
# only, the netconfig settings are same with settings in this
# file and in case

[xcat-user] Grab groups a node is a member of

2012-01-30 Thread Russell Jones
Hello,

Is it possible to query the xcat server directly via a postscript to 
grab all groups a node is a member of? I am not seeing an easy way of 
doing so outside of either just a direct SQL query or SSH command from 
the compute node to the master node. Was hoping to be able to do it 
through xcat's server interface.

I also dumped all of the environment variables available on install and 
didn't see a group list among them.

Thanks!



Re: [xcat-user] Grab groups a node is a member of

2012-01-30 Thread Russell Jones
Thanks Jarrod,

Can you please clarify what you mean by "in remote"? Currently just to 
be able to continue with the script I am creating I am doing a SQL query 
directly on the database (as the thought of having many compute nodes 
SSH'ing to the management node to run nodels seems more dirty).

Is there a supported way of grabbing the groups a node is a member of 
from the xcatmaster via the same way that the compute nodes grab the 
postscripts they need to run?
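
Until there is a supported remote query, one stopgap on a machine that has
the xCAT client commands (the management node, for example) is to pull the
groups= line out of lsdef output, which uses the key=value layout seen in
lsdef listings elsewhere in this archive. A small sketch with a made-up
function name; it does not solve the query-from-the-node case asked about
here.

```shell
# Read lsdef-style "key=value" output on stdin and print the node's
# groups, one per line. Prints nothing if there is no groups= line.
node_groups() {
    sed -n 's/^[[:space:]]*groups=//p' | tr ',' '\n'
}

# Sample input mimicking lsdef output (group names are illustrative).
# On a real system this would be: lsdef <node> | node_groups
printf 'Object name: c1n1\n    arch=x86_64\n    groups=all,compute\n' | node_groups
```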




On 1/30/2012 4:11 PM, Jarrod B Johnson wrote:
> Equivalent of 'nodels  groups' in remote can work, it may 
> warrant something more special...



Re: [xcat-user] Diskfull nodes and resolv.conf

2012-01-30 Thread Russell Jones

  
  
Hi Jing,

The diskless nodes work fine. The only nodes I am having trouble
with are diskful nodes. I am using SLES 11.1.

I know that the hardeths postscript does not have any logic in it to
change /etc/resolv.conf. Can you point me to the script or logic in
xCAT that populates /etc/resolv.conf for diskful nodes? I'm unable to
find where in xCAT this happens.


Thanks!



On 1/30/2012 9:03 PM, Jing CDL Sun wrote:
Hi Russell,

I just checked the postscript hardeths. It seems it only changes the eth
interface config from dhcp to static and does not change /etc/resolv.conf,
so I agree with Christopher in assuming your problem is just dhcp and not
related to the hardeths postscript (if possible, you could consider
removing the hardeths postscript for your diskful nodes and trying the
install again).

I do not have a diskful SLES handy, only a diskless SLES11 SP1 one; I think
for dhcp it might be the same logic. But I am not sure which SLES version
you are using. In my SLES11 SP1 diskless cluster, /etc/resolv.conf is
populated correctly. Here are the details, FYI:

  dx360m3n05:~ # cat /etc/resolv.conf
  # Generated by dhcpcd for interface eth6
  search ppd.pok.ibm.com
  nameserver 9.114.34.227
  dx360m3n05:~ # cat /etc/*release
  SUSE Linux Enterprise Server 11 (x86_64)
  VERSION = 11
  PATCHLEVEL = 1
  dx360m3n05:~ #

To debug your problem, it's better to check:

1. On the mn, /etc/dhcpd.conf: make sure it contains lines similar to the
ones below in the shared-network section for the eth you are using:

      option domain-name "ppd.pok.ibm.com";
      option domain-name-servers 9.114.34.227;

2. After the cn is installed and booted, dhcpcd will get the domain-name
and domain-name-servers from dhcpd and populate /etc/resolv.conf.

If the dhcpd.conf on your mn is correct, then maybe this problem is related
to the SLES version you are using?
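
The first check can be scripted roughly as follows. This is a sketch with a
hypothetical function name; it only looks for the two option lines anywhere
in the file and does not verify that they sit in the right shared-network
section.

```shell
# Succeed only if the given dhcpd.conf has both an "option domain-name"
# line and an "option domain-name-servers" line.
check_dhcp_dns_options() {
    conf="$1"
    # The trailing [[:space:]] keeps "domain-name" from also matching
    # "domain-name-servers".
    grep -Eq '^[[:space:]]*option[[:space:]]+domain-name[[:space:]]' "$conf" &&
    grep -Eq '^[[:space:]]*option[[:space:]]+domain-name-servers[[:space:]]' "$conf"
}

# Example on the management node:
# check_dhcp_dns_options /etc/dhcpd.conf && echo "DNS options present"
```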
  
  
  
  

Best Regards,
-
Sun Jing(孙靖)
IBM China Software Development Laboratory
Tel: (86-10) 82453625   E-mail: sj...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
         No.8, Dong Bei Wang West Road, Haidian
District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193
  
  
  
  
  

      
From: Russell Jones
Date: 2012-01-30 03:00
Please respond to: xCAT Users Mailing list
To: xcat-user@lists.sourceforge.net
cc:
Subject: Re: [xcat-user] Diskfull nodes and resolv.conf

  
  Some additional information, this looks like it may
be
limited to just SLES in this case. I deployed a CentOS 5.6 node
with the
hardeths postscript and /etc/resolv.conf was populated properly.
No other
changes were made to the node outside of changing the OS to
deploy.

I noticed that "/opt/xcat/share/xcat/install/scripts" has a few
scripts that set resolv.conf. For example:
  
  xcat-suse:/opt/xcat/share/xcat/install/scripts # grep -i
  resolv *
  post.debian:echo "search #TABLE:site:key=domain:value#"
  >/etc/re

Re: [xcat-user] Diskfull nodes and resolv.conf

2012-01-30 Thread Russell Jones

  
  
Thanks Jing,

/etc/resolv.conf is populated properly when not using the hardeths
postscript (i.e., when relying on DHCP). It's only when the hardeths
postscript is used on diskful nodes that /etc/resolv.conf does not
get any data written to it.
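
One possible workaround when DHCP is not left to manage the file is a postscript that writes it directly. The following is a minimal sketch in Python under that assumption; the function name is hypothetical, how the domain and name server reach the script is site-specific, and the example values are simply the ones from the resolv.conf output quoted in this thread. It is not something xCAT ships.

```python
def render_resolv_conf(search_domain, nameservers):
    """Build resolv.conf text from a search domain and a list of name server IPs."""
    lines = []
    if search_domain:
        lines.append(f"search {search_domain}")
    for server in nameservers:
        lines.append(f"nameserver {server}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values copied from the resolv.conf quoted in this thread.
    print(render_resolv_conf("ppd.pok.ibm.com", ["9.114.34.227"]), end="")
```

A real postscript would write the returned text to /etc/resolv.conf after hardeths has run, so the static configuration does not leave the file empty.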



On 1/30/2012 9:59 PM, Jing CDL Sun wrote:
Russell,

As far as I know, xCAT does not use any special script to populate
/etc/resolv.conf for diskful nodes; we only depend on the dhcp config as I
described below. Once the dhcp server has been configured with the correct
name resolution options, the dhcp client will populate /etc/resolv.conf
automatically when the compute nodes boot from dhcp.

I will try to install a SLES11 SP1 diskful node today, to see if there is
any difference between diskful and diskless nodes. Also, if possible, could
you try to remove the hardeths postscript from your diskful node def, then
re-install it to see if it's the same?
Thx.
  
  




Best Regards,
-
Sun Jing(孙靖)
IBM China Software Development Laboratory
Tel: (86-10) 82453625   E-mail: sj...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
         No.8, Dong Bei Wang West Road, Haidian
District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193
  
  
  
  
  

  
Russell Jones
2012-01-31 11:33

Please respond to: xCAT Users Mailing list
To: xcat-user@lists.sourceforge.net
cc:
Subject: Re: [xcat-user] Diskfull nodes and resolv.conf

Hi Jing,

The diskless nodes work fine. The only nodes I am having trouble with are
diskful nodes. I am using SLES 11.1.

I know that the hardeths postscript does not have any logic in it to change
/etc/resolv.conf. Is it possible for you to point me to the script / logic
in xCAT that populates /etc/resolv.conf for diskful nodes? I'm unable to
find where in xCAT this happens.


Thanks!



On 1/30/2012 9:03 PM, Jing CDL Sun wrote: 
  
Hi Russell,

I just checked the postscript hardeths; it seems it only changes the eth
interface config from dhcp to static and does not change /etc/resolv.conf,
so I agree with Christopher, assuming your problem is just dhcp and not
related to the hardeths postscript (if possible, you could consider removing
the hardeths postscript for your diskful nodes and trying the install again).

I do not have a diskful SLES handy, only a diskless SLES11 SP1 one, but I
think for dhcp it might be the same logic. Which SLES version are you using?
In my SLES11 SP1 diskless cluster, /etc/resolv.conf is populated correctly.
Here are the details, FYI:

  
dx360m3n05:~ # cat /etc/resolv.conf 
# Generated by dhcpcd for interface eth6 
search ppd.pok.ibm.com 
nameserver 9.114.34.227 
dx360m3n05:~ # cat /etc/*release 
SUSE Linux Enterprise Server 11 (x86_64) 
VERSION = 11 
PATCHLEVEL = 1 
dx360m3n05:~ #   

To debug your problem, it's better to check: 
  
1. on the mn, /etc/dhcpd.conf, make sure it contains the similar
lines
below in the shared-network section for t

Re: [xcat-user] Grab groups a node is a member of

2012-01-31 Thread Russell Jones
Hi Lissa,

Thanks! We are trying to get the groups a node is a member of for a 
special postscript that will execute various commands depending on what 
group that node belongs to. The nodes in our environment all belong to 
multiple groups depending on their roles.

I am attempting to avoid the SSH route as we are dealing with a large 
cluster. SSH'ing back to that node's xcatmaster doesn't really seem like 
the cleanest way of doing it.

I've created a special mysql user that only has read perms on the 
nodelist table, and that seems to work well enough for now. I'll have a 
look at the remoteshell postscript as well and see if I can find a way 
of getting it to make the xcatmaster return the groups a node belongs 
to. That would be my preferred method instead of direct SQL queries.
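
However the group list ends up being fetched, the dispatch step of such a postscript is simple. The following is only a sketch in Python, assuming the groups arrive as one comma-separated string (the way the nodelist.groups attribute is normally stored); the group names and action names are invented for the example.

```python
def parse_groups(groups_field):
    """Split a comma-separated group string into a clean list of group names."""
    return [g.strip() for g in groups_field.split(",") if g.strip()]


def actions_for(groups_field, actions_by_group):
    """Collect the actions for every group the node is in, in order, without duplicates."""
    selected = []
    for group in parse_groups(groups_field):
        for action in actions_by_group.get(group, []):
            if action not in selected:
                selected.append(action)
    return selected


if __name__ == "__main__":
    # Hypothetical mapping of groups to configuration steps.
    table = {"compute": ["tune_sysctl"], "gpu": ["load_driver", "tune_sysctl"]}
    print(actions_for("compute, gpu,all", table))
```

Keeping the mapping in one table means a node that belongs to several groups runs each step once, regardless of how many of its groups request it.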


Thanks!



On 1/31/2012 6:55 AM, Lissa Valletta wrote:
> You are trying to get all the groups that a node is a member of on the
> compute node?   You are right it is not in the environment variables.  I
> guess I wonder why you need to know the group list on the nodes?
> We also do not normally set up ssh from the compute nodes to the Management
> node,  but if you have that set up, you could run
> ssh MN "nodels  nodelist.groups".   If you are on a service node
> you can just run  the nodels command on the SN.
>
> You can write a postscript that will actually contact the xcat daemon on
> the MN and run a script,  but this is a little more difficult to do.  An
> example of this is /install/postscripts/remoteshell.
>
>
>
> Lissa K. Valletta
> 2-3/T12
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
>
>
>
>
> From: Russell Jones
> To:   xCAT Users Mailing list
> Date: 01/30/2012 04:25 PM
> Subject:  [xcat-user] Grab groups a node is a member of
>
>
>
> Hello,
>
> Is it possible to query the xcat server directly via a postscript to
> grab all groups a node is a member of? I am not seeing an easy way of
> doing so outside of either just a direct SQL query or SSH command from
> the compute node to the master node. Was hoping to be able to do it
> through xcat's server interface.
>
> I also dumped all of the environment variables available on install and
> didn't see a group list among them.
>
> Thanks!
>


Re: [xcat-user] Grab groups a node is a member of

2012-01-31 Thread Russell Jones
Also meant to ask, I see that the awk scripts are feeding the daemon 
various commands, such as "getpostscript" and "getcredentials", is there 
documentation somewhere of what all commands are available, or are all 
of the commands available handled by the plugin files in the xCAT_plugin 
directory?



On 1/31/2012 8:59 AM, Russell Jones wrote:
> Hi Lissa,
>
> Thanks! We are trying to get the groups a node is a member of for a
> special postscript that will execute various commands depending on what
> group that node belongs to. The nodes in our environment all belong to
> multiple groups depending on their roles.
>
> I am attempting to avoid the SSH route as we are dealing with a large
> cluster. SSH'ing back to that node's xcatmaster doesn't really seem like
> the cleanest way of doing it.
>
> I've created a special mysql user that only has read perms on the
> nodelist table, and that seems to work well enough for now. I'll have a
> look at the remoteshell postscript as well and see if I can find a way
> of getting it to make the xcatmaster return the groups a node belongs
> to. That would be my preferred method instead of direct SQL queries.
>
>
> Thanks!
>
>
>
> On 1/31/2012 6:55 AM, Lissa Valletta wrote:
>> You are trying to get all the groups that a node is a member of on the
>> compute node?   You are right it is not in the environment variables.  I
>> guess I wonder why you need to know the group list on the nodes?
>> We also do not normally set up ssh from the compute nodes to the Management
>> node,  but if you have that set up, you could run
>> ssh MN "nodels   nodelist.groups".   If you are on a service node
>> you can just run  the nodels command on the SN.
>>
>> You can write a postscript that will actually contact the xcat daemon on
>> the MN and run a script,  but this is a little more difficult to do.  An
>> example of this is /install/postscripts/remoteshell.
>>
>>
>>
>> Lissa K. Valletta
>> 2-3/T12
>> Poughkeepsie, NY 12601
>> (tie 293) 433-3102
>>
>>
>>
>>
>>
>> From:Russell Jones
>> To:  xCAT Users Mailing list
>> Date:01/30/2012 04:25 PM
>> Subject: [xcat-user] Grab groups a node is a member of
>>
>>
>>
>> Hello,
>>
>> Is it possible to query the xcat server directly via a postscript to
>> grab all groups a node is a member of? I am not seeing an easy way of
>> doing so outside of either just a direct SQL query or SSH command from
>> the compute node to the master node. Was hoping to be able to do it
>> through xcat's server interface.
>>
>> I also dumped all of the environment variables available on install and
>> didn't see a group list among them.
>>
>> Thanks!
>>


Re: [xcat-user] Grab groups a node is a member of

2012-02-01 Thread Russell Jones
 

Hi Lissa,

The database is MySQL.

We are creating a postscript that is going to be handling configuring OS
settings on the node based on various definitions, such as the profile and
groups a node is a member of. Profile is passed via the environment
variables so that is not an issue to get, it's just grabbing the groups
that is.

While we do configure SSH from the compute nodes back to the management
node, I really do not want to go that route, as several hundred compute
nodes causing SSH processes to spin up on the master node just to do a
nodels doesn't seem like the right route to go with this :)

If I can have the compute nodes grab this information from their xcat
master through the xcatd daemon itself, that would be perfect and preferred
over doing direct MySQL queries. I agree that getting this information from
xCAT directly is better.

On 01.02.2012 09:33, Lissa Valletta wrote:

> So what database are you running? I am not sure why you would do an SQL
> query vs use xCAT database commands. You could pretty easy write a
> script to get a list of all nodes using nodels and their group accessing
> the nodelist.groups attribute. The other thing is we do not say we will
> never change our database definitions. If you use the xCAT commands, you
> will be immune from any changes. Not that I expect to change this area.
>
> And no there is no supported way, to grab the groups the nodes are in
> from the compute node, other than write your own postscript. I am still
> curious why you need the groups on the compute nodes.
>
> Lissa K. Valletta
> 2-3/T12
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
> From: Russell Jones
> To: xcat-user@lists.sourceforge.net
> Date: 01/30/2012 05:26 PM
> Subject: Re: [xcat-user] Grab groups a node is a member of
>
> Thanks Jarrod,
>
> Can you please clarify what you mean by "in remote"? Currently just to
> be able to continue with the script I am creating I am doing a SQL query
> directly on the database (as the thought of having many compute nodes
> SSH'ing to the management node to run nodels seems more dirty).
>
> Is there a supported way of grabbing the groups a node is a member of
> from the xcatmaster via the same way that the compute nodes grab the
> postscripts they need to run?
>
> On 1/30/2012 4:11 PM, Jarrod B Johnson wrote:
>
>> Equivalent of 'nodelsgroups' in remote can work, it may warrant something
>> more special...


Re: [xcat-user] Grab groups a node is a member of

2012-02-01 Thread Russell Jones

   So you know you would have to make all your nodes mysql clients and give
   them access to the mysql server.

Yes. Not an issue, while they do not all belong to the same flat 
network, the network is routed.


   You would also have to have the mysql admin id and password available on the 
nodes for the query.   Not sure I would want to do this.


I agree, that would be a terrible thing to do, so I did not go this 
route :) I created a separate, non-privileged mysql user that only has 
permissions to read from the "nodelist" table. This user cannot read any 
other table, nor can they write to the nodelist table. Only read.




On 2/1/2012 2:28 PM, Lissa Valletta wrote:

So you know you would have to make all your nodes mysql clients and give
them access to the mysql server.  You would also have to have the mysql
admin id and password available on the nodes for the query.   Not sure I
would want to do this.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102





From:   Russell Jones
To: 
Date:   02/01/2012 12:51 PM
Subject:Re: [xcat-user] Grab groups a node is a member of



Hi Lissa,


The database is MySQL.


We are creating a postscript that is going to be handling configuring OS
settings on the node based on various definitions, such as the profile and
groups a node is a member of. Profile is passed via the environment
variables so that is not an issue to get, it's just grabbing the groups
that is.


While we do configure SSH from the compute nodes back to the management
node, I really do not want to go that route, as several hundred compute
nodes causing SSH processes to spin up on the master node just to do a
nodels doesn't seem like the right route to go with this :)


If I can have the compute nodes grab this information from their xcat
master through the xcatd daemon itself, that would be perfect and preferred
over doing direct MySQL queries. I agree that getting this information from
xCAT directly is better.











On 01.02.2012 09:33, Lissa Valletta wrote:


  So what database are you running?  I am not sure why you would do an SQL
  query vs use xCAT database commands.  You could pretty easy write a
  script to get a list of all nodes using nodels and their group accessing
  the nodelist.groups attribute.  The other thing is we do not say we will
  never change our database definitions.  If you use the xCAT commands, you
  will be immune from any changes.  Not that I expect to change this area.

  And no there is no supported way, to grab the groups the nodes are in
  from the compute node, other than write your own postscript.  I am still
  curious why you need the groups on the compute nodes.



  Lissa K. Valletta
  2-3/T12
  Poughkeepsie, NY 12601
  (tie 293) 433-3102





  From:      Russell Jones
  To:        xcat-user@lists.sourceforge.net
  Date:      01/30/2012 05:26 PM
  Subject:   Re: [xcat-user] Grab groups a node is a member of



  Thanks Jarrod,

  Can you please clarify what you mean by "in remote"? Currently just to
  be able to continue with the script I am creating I am doing a SQL query
  directly on the database (as the thought of having many compute nodes
  SSH'ing to the management node to run nodels seems more dirty).

  Is there a supported way of grabbing the groups a node is a member of
  from the xcatmaster via the same way that the compute nodes grab the
  postscripts they need to run?




  On 1/30/2012 4:11 PM, Jarrod B Johnson wrote:
  Equivalent of 'nodelsgroups' in remote can work, it may warrant something
  more special...

Re: [xcat-user] ddns update problems

2012-02-07 Thread Russell Jones
What all changed on the system between yesterday and today? :)

On 2/7/2012 12:47 PM, Jonathan Dye wrote:
> also, i should say, makedns works with hosts that have been added to the 
> hosts file beforehand.
> dhcp also hands out leases and pxe boots machines and vms with expected 
> images.  the ones that rely on ddns do not get dns records, however, and fail 
> in postscripts.
>
> dnshandler is set to ddns in the site table and, again, this was all working 
> yesterday.
>
> - jonathan
>
> - Original Message -
> From: "Jonathan Dye"
> To: "xCAT Users Mailing list"
> Sent: Tuesday, February 7, 2012 11:41:23 AM
> Subject: [xcat-user] ddns update problems
>
> suddenly DDNS updates are not happening.  the update key is set to the same 
> thing in both dhcp and named.  i've tried cleaning the dhcp and dns state up 
> to the best of my knowledge.  manual updates with that key work in the 
> nsupdate utility.  i tried watching what happened when i update a lease in 
> omshell but i was getting "not implemented" errors in omshell while trying to 
> load existing leases.  anyways- could someone help me with how to diagnose 
> this?
>
> - jonathan
>


Re: [xcat-user] ddns update problems

2012-02-07 Thread Russell Jones
Haha!

I know... I've been down that road before. xcat's DDNS plugin is very, 
very picky. Usually starting fresh and having xcat re-generate the hosts 
file, dhcp and dns configuration seems to fix things.

DNS also seemed to work better before the support for dynamic DNS was added.



On 2/7/2012 1:29 PM, Jonathan Dye wrote:
> i know, right?
>
> nothing explicitly with the dhcp or dns configurations.  despite not having 
> all the free addresses in the range accounted for, it just stopped working at 
> like the 52nd VM in a batch of 80.  restarting that vm would never result in 
> ddns registering the address.  neither would cleaning, or changing the omapi 
> key on both sides.
>
> i'm sorry to report that fatigue set in and i started using unsavory methods 
> to fix it.  once i stopped changing one thing at a time it started working 
> after some combination of:
>
> patching xcat to break v6 detection so it is always off.
> removing the dhcpd file (and not relying on makedhcp -n to replace it)
> cleaning the environment again (by cleaning the zone files, the lease files, 
> and with makedns -n, makedhcp -n)
> creating the managed-keys.bind file which was causing an error on startup
>
> this is not a production environment and it's good enough for now.
>
> - jonathan
>
> - Original Message -
> From: "Russell Jones"
> To: xcat-user@lists.sourceforge.net
> Sent: Tuesday, February 7, 2012 11:58:57 AM
> Subject: Re: [xcat-user] ddns update problems
>
> What all changed on the system between yesterday and today? :)
>
> On 2/7/2012 12:47 PM, Jonathan Dye wrote:
>> also, i should say, makedns works with hosts that have been added to the 
>> hosts file beforehand.
>> dhcp also hands out leases and pxe boots machines and vms with expected 
>> images.  the ones that rely on ddns do not get dns records, however, and 
>> fail in postscripts.
>>
>> dnshandler is set to ddns in the site table and, again, this was all working 
>> yesterday.
>>
>> - jonathan
>>
>> - Original Message -
>> From: "Jonathan Dye"
>> To: "xCAT Users Mailing list"
>> Sent: Tuesday, February 7, 2012 11:41:23 AM
>> Subject: [xcat-user] ddns update problems
>>
>> suddenly DDNS updates are not happening.  the update key is set to the same 
>> thing in both dhcp and named.  i've tried cleaning the dhcp and dns state up 
>> to the best of my knowledge.  manual updates with that key work in the 
>> nsupdate utility.  i tried watching what happened when i update a lease in 
>> omshell but i was getting "not implemented" errors in omshell while trying 
>> to load existing leases.  anyways- could someone help me with how to 
>> diagnose this?
>>
>> - jonathan
>>

[xcat-user] xCAT bmcsetup troubles

2012-03-14 Thread Russell Jones

Hi all,

Putting together a rack of some Dell c6100 series nodes, and am running 
into some strange bmcsetup errors.


Everything appears to be configured properly on the xcat management 
node, both the compute and the BMC networks all ping without an issue. 
When I tell a node to do a bmcsetup, it displays the following error on 
the screen:


Unable to prove root on your IP approves of this request



It then waits for a random period of time, tries again and always 
succeeds the second time with no errors being shown.


The management node does not have access to the internet, and as a 
result it is also dumping these bind warnings out to /var/log/messages. 
These only appear when the compute node is requesting config parameters 
from the MN:


Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving '0/A' 
(in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'./NS' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'L.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS




Could these issues be a side effect of bind not having access to the 
internet to do root DNS lookups? It seems unlikely, but given that it 
seems to always succeed a second time, and after succeeding everything 
works perfectly, I'm at a loss.



Thanks for any help!




Re: [xcat-user] xCAT bmcsetup troubles

2012-03-14 Thread Russell Jones

  
  
Thanks for the info Jarrod!

On 3/14/2012 10:10 AM, Jarrod B Johnson wrote:

  It's a race condition, we have tuned in 2.7 to 'win' more
often, we lose the race a lot on the first pass. The DNS issues
are unrelated. We haven't considered it urgent since the retry
serves to rectify it. The first pass failing is innocuous.

For info, the issue is the awk daemon we write seems to
occasionally take a surprisingly long time to start and bind the
socket. We probably could have also fixed it by writing a tiny C
daemon, but kept to scripting code since it's only a few seconds
paid for bmcsetup.
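
The retry behaviour described here (fail, wait a random period, try again) follows a common pattern. The following is a generic sketch of that pattern in Python, not the actual bmcsetup or xCAT code; the function name, the attempt count, and the delay bound are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_random_delay(request, attempts=3, max_delay=5.0,
                            sleep=time.sleep, rng=random.uniform):
    """Call request() until it succeeds, sleeping a random time between failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError as err:
            # A lost race on the first pass is expected; remember it and retry.
            last_error = err
            if attempt < attempts:
                sleep(rng(0, max_delay))
    raise last_error
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting; a caller that loses the race once simply sees the second attempt succeed.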
    
From: Russell Jones
To: xCAT Users Mailing list
Date: 03/14/2012 10:49 AM
Subject: [xcat-user] xCAT bmcsetup troubles
  
  
  
  Hi all,

Putting together a rack of some Dell c6100 series nodes, and am
running into some strange bmcsetup errors. 

Everything appears to be configured properly on the xcat
management node, both the compute and the BMC networks all ping
without an issue. When I tell a node to do a bmcsetup, it
displays the following error on the screen:

Unable to prove root on your IP approves of this request



It then waits for a random period of time, tries again and
always succeeds the second time with no errors being shown. 

The management node does not have access to the internet, and as
a result it is also dumping these bind warnings out to
/var/log/messages. These only appear when the compute node is
requesting config parameters from the MN:

Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving '0/A' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving './NS' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving 'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving 'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving 'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts
resolving 'L.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS


  
Could these issues be a side effect of bind not having access to
the internet to do root DNS lookups? It seems unlikely, but
given that it seems to always succeed a second time, and after
succeeding everything works perfectly, I'm at a loss.


Thanks for any help!



Re: [xcat-user] xCAT with Dell 6100 servers

2012-03-14 Thread Russell Jones

What do you need to set up? :)

On 3/14/2012 3:08 PM, Evelio Quiros wrote:

Hello,

Is there some information on setting up Dell 6100 nodes in xCAT ?

Thanks,
Al Quiros
Florida International University





[xcat-user] Add-on products for SLES use tftpserver entry instead of nfsserver (ala redhat)

2012-03-15 Thread Russell Jones
It appears that add-on entries defined in the SLES templates use the 
tftpserver address instead of nfsserver to fill in the URL. If I am not 
mistaken, this makes it impossible to use service node pools.


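Example of an add-on block with tftpserver blank (notice how the URL 
line is missing the host):

<add-on>
  <add_on_products config:type="list">
    <listentry>
      <media_url>http:///install/sles11.1/x86_64/sdk1</media_url>
      <product>SuSE-Linux-SDK</product>
      <product_dir>/</product_dir>
      <ask_on_error config:type="boolean">false</ask_on_error>
      <name>SuSE-Linux-SDK</name>
    </listentry>
  </add_on_products>
</add-on>
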



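Example with tftpserver filled in:

<add-on>
  <add_on_products config:type="list">
    <listentry>
      <media_url>http://172.16.0.1/install/sles11.1/x86_64/sdk1</media_url>
      <product>SuSE-Linux-SDK</product>
      <product_dir>/</product_dir>
      <ask_on_error config:type="boolean">false</ask_on_error>
      <name>SuSE-Linux-SDK</name>
    </listentry>
  </add_on_products>
</add-on>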




Can this behavior be changed to use the nfsserver attribute instead, as 
redhat/centos does for repo URLs?
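
To make the symptom concrete, here is a minimal Python sketch of how a 
URL without a host can come out of a template when the one attribute it 
reads is blank, and how trying several attributes in order would avoid 
it. This is not xCAT's template code. The function names, the fallback 
order, and the "httpserver" key are assumptions made up for 
illustration.

```python
# Illustrative only: mimics filling a repo URL from node attributes.
ADDON_PATH = "/install/sles11.1/x86_64/sdk1"


def naive_addon_url(noderes, path=ADDON_PATH):
    """Fill the URL from tftpserver only, as the SLES template appears to do."""
    return "http://{}{}".format(noderes.get("tftpserver", ""), path)


def addon_url_with_fallback(noderes, path=ADDON_PATH,
                            keys=("httpserver", "nfsserver", "tftpserver")):
    """Use the first non-empty attribute; 'httpserver' is a hypothetical key."""
    for key in keys:
        host = noderes.get(key, "")
        if host:
            return "http://{}{}".format(host, path)
    # Refuse to emit a URL with an empty host part.
    raise ValueError("no server attribute is set for this node")


if __name__ == "__main__":
    node = {"tftpserver": "", "nfsserver": "172.16.0.1"}
    print(naive_addon_url(node))          # host part is empty
    print(addon_url_with_fallback(node))  # falls back to nfsserver
```

Raising an error instead of returning a URL with no host is a design 
choice for the sketch: a loud failure at template time is easier to 
debug than an installer that cannot reach its repository.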





Re: [xcat-user] Add-on products for SLES use tftpserver entry instead of nfsserver (ala redhat)

2012-03-16 Thread Russell Jones

  
  
Thanks!

I would push for another attribute named "httpserver" being
introduced. Would make much more sense imho :)



On 3/16/2012 3:31 AM, Hua Zhong Wang wrote:

  Two questions are addressed here:

1. In xcat 2.6.x, there are several problems on sles/rhel
working with service node pool. From xcat 2.7,
tftpserver/xcatmaster can be set to empty and image server
setting to tftpserver address should work here.
2. We are still discussing if image server should be set to
tftpserver address or adding some other new attribute like
"httpserver". "nfsserver" is just used for nfs mount, if
ram-disk statelite is the case, it doesn't make sense to set
image server to nfsserver address.



Best Regards,

--
Wang Huazhong(王华忠)

IBM CSTL HPC System Management Development
Tel: 86-10-82452279
Email: wangh...@cn.ibm.com
Address: Ring Building 28,ZhongGuanCun Software Park,No.8 Dong
Bei Wang West Road, Haidian District Beijing P.R.China 100193


From: Russell Jones
To: xcat-user@lists.sourceforge.net
Date: 2012-03-16 01:52
Subject: [xcat-user] Add-on products for SLES use tftpserver entry instead of nfsserver (ala redhat)
  
  
  
  
  It appears that add-on entries defined in the SLES
templates is using tftpserver address instead of nfsserver to
fill in the URL. If I am not mistaken this makes it impossible
to utilize service node pools as a result.

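Example of an add-on block with tftpserver blank (notice how the
URL line is missing the host):

   <add-on>
      <add_on_products config:type="list">
        <listentry>
          <media_url>http:///install/sles11.1/x86_64/sdk1</media_url>
          <product>SuSE-Linux-SDK</product>
          <product_dir>/</product_dir>
          <ask_on_error config:type="boolean">false</ask_on_error>
          <name>SuSE-Linux-SDK</name>
        </listentry>
      </add_on_products>
   </add-on>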


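Example with tftpserver filled in:

   <add-on>
      <add_on_products config:type="list">
        <listentry>
          <media_url>http://172.16.0.1/install/sles11.1/x86_64/sdk1</media_url>
          <product>SuSE-Linux-SDK</product>
          <product_dir>/</product_dir>
          <ask_on_error config:type="boolean">false</ask_on_error>
          <name>SuSE-Linux-SDK</name>
        </listentry>
      </add_on_products>
   </add-on>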



Can this behavior be changed to utilize the nfsserver line
instead like redhat/centos uses for repo URL's?



Re: [xcat-user] xCAT with Dell 6100 servers

2012-03-16 Thread Russell Jones
Yes, the BMC needs an IP address that the xCAT server can reach in order 
for it to issue remote power requests. You also have to configure a 
username and password in the tables, as well as DNS names for the BMC 
IPs. The xcat command "bmcsetup" can do all of the compute-side 
configuration for you.


Check the wiki for further information on how to do that. This page can 
help you understand the various xcat tables more: 
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Intro_to_xCAT_Tables


This page has some information regarding the various xcat commands: 
http://xcat.sourceforge.net/man8/nodeset.8.html





On 3/16/2012 1:23 PM, Evelio Quiros wrote:


Well, we just got these C6100s for testing in a possible HPC 
configuration.
I am trying to get xCAT to talk to their IPMI configuration, but I 
suspect that additional configuration is required.

Currently, when I issue the following command:

rpower d6100a reset

I get this:

d6100a: Error: timeout

So, I guess I must be missing something that tells xCAT where 
the machine is, and how to reach it
Perhaps BIOS configuration at the C6100 node ? The setup screen tells 
me that IPMI is enabled.

Does the IPMI need further configuration, like an IP address ?
The ultimate goal here is to use VCL to control bare-metal installs 
via xCAT.


Thanks,
Al Quiros
Florida International University


--

Date: Wed, 14 Mar 2012 15:37:52 -0500
From: Russell Jones <rjo...@eggycrew.com>
Subject: Re: [xcat-user] xCAT with Dell 6100 servers
To: xcat-user@lists.sourceforge.net
Message-ID: <4f6101a0.5030...@eggycrew.com>
Content-Type: text/plain; charset="iso-8859-1"

What are you needing to setup? :)

On 3/14/2012 3:08 PM, Evelio Quiros wrote:

Hello,

Is there some information on setting up Dell 6100 nodes in xCAT ?

Thanks,
Al Quiros
Florida International University






Re: [xcat-user] xCAT with Dell 6100 servers

2012-03-20 Thread Russell Jones

  
  
Hi!

I've replied back to the list with your question so that this can
help any folks in the future Googling for this issue. Also someone
else on the list may have better documentation to be able to point
you to.

This Youtube video should be able to help you with defining your
IPMI devices: http://www.youtube.com/watch?v=-rUJ3dy2hjQ

This user has a few good xCAT videos:
http://www.youtube.com/user/sumavisor



If your servers are already pulling an IP address for their BMC
interfaces from DHCP, more than likely you either: 

A) Have the BMC interfaces plugged into the same flat switch as your
cluster network. In which case I would recommend vlanning them on
the switch so they are not sharing the same network and properly
defining the BMC network in the "networks" table. 

or

B) When you did your "getmacs" or other methods for adding all of
the MAC addresses into the xcat database you also added the BMC
interfaces and have the BMC network configured with your xcat
management node as the DHCP server.



In order to do my racks of c6100's, all I had to do was follow the
configuration steps that are outlined in that Youtube video. The BMC
interfaces were plugged into a completely separate switch that the
xCAT management node was also plugged into, and also had an IP on. I
defined the network in the "networks" table *without* DHCP support
(as I did not want my BMC interfaces obtaining dynamic IP's) and
manually configured them all using "bmcsetup".
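
For anyone planning the static addresses for a separate BMC network like 
the one described above, the bookkeeping can be sketched in a few lines 
of Python. This is only an illustration and not part of xCAT: the 
192.168.100.0/24 network, the "-bmc" name suffix, and the function names 
are all assumptions.

```python
import ipaddress
from itertools import islice


def plan_bmc_addresses(nodes, bmc_network, first_host=1, suffix="-bmc"):
    """Map '<node><suffix>' to consecutive static IPs in bmc_network."""
    net = ipaddress.ip_network(bmc_network)
    start = first_host - 1
    # Take only as many usable host addresses as there are nodes.
    addrs = list(islice(net.hosts(), start, start + len(nodes)))
    if len(addrs) < len(nodes):
        raise ValueError("network too small for the requested nodes")
    return {node + suffix: str(ip) for node, ip in zip(nodes, addrs)}


def hosts_lines(plan):
    """Render the plan as /etc/hosts style lines (address, tab, name)."""
    return ["{}\t{}".format(ip, name) for name, ip in plan.items()]


if __name__ == "__main__":
    plan = plan_bmc_addresses(["n1", "n2", "n3", "n4"],
                              "192.168.100.0/24", first_host=11)
    print("\n".join(hosts_lines(plan)))
```

The resulting name-to-address pairs are what would then be entered 
wherever your site keeps host definitions, so that the BMC names resolve 
from the management node.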


Hope this helps!
Russell


On 3/19/2012 10:27 AM, Evelio Quiros wrote:

  
  Hello Russell,
  
  
  Thanks for the information. I've been looking at the
documentation, but still at a loss about how these basic
functions work.
  
  
  Ok, so I looked at the setup screen for one of the nodes
(d6100a). There is a place for IPMI configuration, but it
doesn't let me enter an IP. Also, when I issue an rpower
command, the server is already up and has its IP address from
the dhcp server (on the same management node).
  
  
  What did you do to get your C6100s working with xCAT?
  
  
  Thanks,
  Al Quiros
  Florida International University
      
  
  
  
  

  From: Russell Jones <rjo...@eggycrew.com>
  Date: Fri, 16 Mar 2012 13:31:01 -0500
  To: Al Quiros <evq...@fiu.edu>
  Cc: <xcat-user@lists.sourceforge.net>
  Subject: Re: [xcat-user] xCAT with Dell 6100 servers




  Yes, you have to have an
IP that the xCAT server can talk to in order for it to be
able to issue remote power requests. You also have to
configure a username and password in the tables, as well as
a DNS address for the BMC ip's. The xcat command "bmcsetup"
can do all of the compute-side of the configuration for you.

Check the wiki for further information on how to do that.
This page can help you understand the various xcat tables
more:
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Intro_to_xCAT_Tables

This page has some information regarding the various xcat
commands: 
  http://xcat.sourceforge.net/man8/nodeset.8.html




On 3/16/2012 1:23 PM, Evelio Quiros wrote:

  


Well, we just got these C6100s for testing in a possible HPC
configuration.
I am trying to get xCAT to talk to their IPMI configuration, but I
suspect that additional configuration is required.

Currently, when I issue the following command:

rpower d6100a reset

I get this:

d6100a: Error: timeout

So, I guess I must be missing something that tells xCAT where
the machine is, and how to reach it
Perhaps BIOS configuration at the C6100 node ? The setup screen tells
me that IPMI is enabled.

Does the IPMI need further configuration, like an IP
  

[xcat-user] Nagios monitoring?

2012-03-20 Thread Russell Jones
Hi all,

I am looking at this page regarding Nagios monitoring of an xCAT 
cluster: 
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Monitoring_an_xCAT_Cluster#Nagios_monitoring


It states that a plugin should be available for this (Nagios 
(nagiosmon.pm) (released)), however I am not seeing it in any of the 2.6 
packages. Is this 2.7 only?


Thanks!



Re: [xcat-user] xCAT bmcsetup troubles

2012-03-30 Thread Russell Jones

Hi Ryan,

I've replied back to the list, if you don't mind please keep questions 
on the list so others can benefit :-)


You can use ipmitool to set the BMC configuration; this is what 
bmcsetup uses as well (I believe). I know on the C6100's it works 
properly on both shared and dedicated BMC interfaces. If it is not 
working properly then you have a configuration error somewhere in xcat.
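
As a rough illustration of setting the BMC LAN configuration from the 
node's OS with ipmitool, the Python sketch below only builds the 
"ipmitool lan set" command lines; it does not run them. LAN channel 1 
and the example addresses are assumptions, the function name is made up, 
and this is not a description of what bmcsetup does internally. 
Selecting shared versus dedicated NIC mode is vendor specific and is not 
covered here.

```python
import ipaddress


def bmc_lan_commands(ip, netmask, gateway, channel=1):
    """Return ipmitool argv lists that set a static LAN config on a BMC."""
    for value in (ip, netmask, gateway):
        ipaddress.IPv4Address(value)  # raises ValueError on malformed input
    ch = str(channel)
    return [
        ["ipmitool", "lan", "set", ch, "ipsrc", "static"],
        ["ipmitool", "lan", "set", ch, "ipaddr", ip],
        ["ipmitool", "lan", "set", ch, "netmask", netmask],
        ["ipmitool", "lan", "set", ch, "defgw", "ipaddr", gateway],
    ]


if __name__ == "__main__":
    # Hypothetical addresses, printed for review rather than executed.
    for cmd in bmc_lan_commands("10.1.0.11", "255.255.255.0", "10.1.0.1"):
        print(" ".join(cmd))
```

To apply the settings on a node, each list could be passed to 
subprocess.run(cmd, check=True) as root, provided ipmitool is installed 
and the IPMI kernel drivers are loaded.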




On 3/30/2012 12:13 PM, Ryan Enge wrote:

Hi Russell,

We use the standard BMC's on the nodes and for our setup we want to 
use a shared connection for the BMC. Do you know of a way to configure 
the BMC from the nodes OS? We are running RH 5 on the nodes and I have 
been trying to set this via IPMI and OMSA and no luck yet. I want to 
avoid manually updating the BIOS settings on hundreds of nodes.


On 12-03-14 9:00 AM, Russell Jones wrote:

Hey Ryan,

Nope, they can handle it right out of the box. The Dell BMC's act 
just like any other standard BMC that is programmable through 
ipmitool. If you purchased the upgraded DRAC's for the nodes, just 
make sure in the BIOS you set to use the dedicated interface instead 
of shared.


Other than that everything was very straightforward, no changes 
needed to be made to the code.





On 3/14/2012 10:53 AM, Ryan Enge wrote:

Hi Russell,

I thought I would reply off list for now, we have just received some 
Dell C6100 series nodes and I was under the impression that xCat 
could not program the BMC for these nodes. Did you have to do any 
manual configuration in the BMC or make any changes to the xCat code 
to get this working?


On 03/14/2012 07:43 AM, Russell Jones wrote:

Hi all,

Putting together a rack of some Dell c6100 series nodes, and am 
running into some strange bmcsetup errors.


Everything appears to be configured properly on the xcat management 
node, both the compute and the BMC networks all ping without an 
issue. When I tell a node to do a bmcsetup, it displays the 
following error on the screen:


Unable to prove root on your IP approves of this request



It then waits for a random period of time, tries again and always 
succeeds the second time with no errors being shown.


The management node does not have access to the internet, and as a 
result it is also dumping these bind warnings out to 
/var/log/messages. These only appear when the compute node is 
requesting config parameters from the MN:


Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'0/A' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'./NS' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'C.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS
Mar 13 10:46:24 linux7mgt named[6915]: too many timeouts resolving 
'L.ROOT-SERVERS.NET/' (in '.'?): disabling EDNS




Could these issues be a side effect of bind not having access to 
the internet to do root DNS lookups? It seems unlikely, but given 
that it seems to always succeed a second time, and after succeeding 
everything works perfectly, I'm at a loss.
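
One way to see what named is giving up on is to tally the names in the 
"too many timeouts resolving" messages. The Python sketch below does 
that for log text like the lines quoted above, including lines that a 
mail client has wrapped. The function names are made up for 
illustration, and the sketch only summarises the log; it does not prove 
what causes the bmcsetup error.

```python
import re
from collections import Counter

_TIMEOUT_RE = re.compile(
    r"named\[\d+\]: too many timeouts resolving '([^']*)'"
)


def timed_out_names(log_text):
    """Count the names named timed out on, tolerating wrapped log lines."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return Counter(_TIMEOUT_RE.findall(flat))


def root_related(counts):
    """Return the names that refer to the root zone or a root server."""
    hits = []
    for name in counts:
        owner = name.split("/")[0].upper()
        if owner == "." or owner.endswith(".ROOT-SERVERS.NET"):
            hits.append(name)
    return sorted(hits)
```

If most of the entries are root-zone or root-server names, that is at 
least consistent with a resolver trying to reach the internet from an 
isolated network.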



Thanks for any help!





--
Ryan Enge
Senior Systems Administrator
University Systems
University of Victoria
Room: CLE D063
Phone: 250.472.5447
Cell: 250.516.4975





--
Ryan Enge
Senior Systems Administrator
University Systems
University of Victoria
Room: CLE D063
Phone: 250.472.5447
Cell: 250.516.4975


Re: [xcat-user] Basic xCAT operation for a Newbie

2012-04-02 Thread Russell Jones

Hi Al,

ipmi
#node,bmc,bmcport,username,password,comments,disable


This can't be empty. If you review the video I sent you a while back it 
will help you define the tables properly for ipmi remote power support: 
http://www.youtube.com/watch?v=-rUJ3dy2hjQ
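
For readers who want to check a set of table dumps for this kind of gap, 
here is a rough Python sketch. It parses dump text in the CSV layout 
shown in this thread (a "#" header line followed by quoted rows) and 
lists nodes that are managed via ipmi but have no bmc entry. It is a 
simplification and an assumption about the matching rules: it matches 
node and group names literally and ignores regular-expression entries. 
The function names are made up.

```python
import csv


def parse_tabdump(text):
    """Parse dump text: optional table name, '#' header line, CSV rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    start = next(i for i, ln in enumerate(lines) if ln.startswith("#"))
    header = lines[start][1:].split(",")
    rows = []
    for rec in csv.reader(lines[start + 1:]):
        rec += [""] * (len(header) - len(rec))  # pad short rows
        rows.append(dict(zip(header, rec)))
    return rows


def nodes_missing_bmc(nodelist_text, nodehm_text, ipmi_text):
    """List nodes whose mgt is ipmi but which have no bmc value."""
    ipmi_managed = {r["node"] for r in parse_tabdump(nodehm_text)
                    if r.get("mgt") == "ipmi"}
    have_bmc = {r["node"] for r in parse_tabdump(ipmi_text) if r.get("bmc")}
    missing = []
    for r in parse_tabdump(nodelist_text):
        # A table entry may name the node itself or any of its groups.
        names = {r["node"], *r.get("groups", "").split(",")}
        if names & ipmi_managed and not names & have_bmc:
            missing.append(r["node"])
    return missing
```

With an ipmi table that contains only its header line, every 
ipmi-managed node is reported as missing a BMC.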



On 4/2/2012 2:29 PM, Evelio Quiros wrote:

Hello,

I am under the impression that the idea of xcat is to allow deployment 
and provisioning of systems without having to touch each one or change 
the bios settings.
After spending some time reviewing the documentation, I am still 
unable to get xcat to contact my nodes. When trying rpower commands, I 
get a timeout error.  I am sure that there must be some fundamental 
piece missing in my setup, so let's review what I have:


I have 2 networks, a private network and a (semi)public network. On 
all the nodes (including management node), eth0 is on the private 
network, and eth1 is on the public.
The public network has its own dhcp and dns server, so that part is 
taken care of externally.


I have a dell C6100, a 4-node machine that is essentially 4 separate 
computers in one box.
It is setup with all of the out-of-the-box defaults. These will be my 
compute nodes (n1-n4).
These are the nodes I am trying to reach via xcat. These nodes and the 
management node below are all on the same networks.


I have a single management node running redhat linux 6.
I am running version 2.7 of the xcat code, installed from the 
repository via yum. I also have the xcat-deps installed.
I added the nodes (only 4 of them) to the /etc/hosts manually. I also 
added them to the hosts table in xcat.
I ran makenetworks, and edited the network table to disable the public 
network, just keep the private net.
I ran makedns and makedhcp. I can reboot the nodes manually, and they 
come up with their correct names and IP addresses.


I am able to view and edit the tables. This is a dump of the current 
table structures I have.
Perhaps I am missing something in my tables ? Perhaps there is another 
essential table I am missing ?


Thanks for your help,
Al Quiros
Florida international University
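
A quick sanity pass over the address fields of a dumped row can catch 
slips before they turn into confusing failures. In the networks row 
quoted further down in this message, the dhcpserver field reads 
"10.0.03", which looks like a typo for "10.0.0.3"; whether it has 
anything to do with the rpower timeout is not established in this 
thread. The Python sketch below flags any value in the chosen fields 
that is not a literal IP address. The function name and field list are 
assumptions, and sites that put hostnames or keywords in these fields 
would need to relax the check.

```python
import ipaddress

ADDRESS_FIELDS = ("net", "gateway", "dhcpserver", "tftpserver", "nameservers")


def bad_addresses(row, fields=ADDRESS_FIELDS):
    """Return (field, value) pairs that do not parse as an IP address."""
    problems = []
    for field in fields:
        # Fields such as nameservers may hold a comma-separated list.
        for value in filter(None, row.get(field, "").split(",")):
            try:
                ipaddress.ip_address(value.strip())
            except ValueError:
                problems.append((field, value))
    return problems


if __name__ == "__main__":
    private = {
        "netname": "private",
        "net": "10.0.0.0",
        "gateway": "10.0.0.3",
        "dhcpserver": "10.0.03",
        "tftpserver": "10.0.0.3",
        "nameservers": "131.94.205.10,131.94.7.220",
    }
    for field, value in bad_addresses(private):
        print("suspicious value in {}: {}".format(field, value))
```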



site
#key,value,comments,disable
"domain","p.fiu.edu",,
"blademaxp","64",,
"fsptimeout","0",,
"installdir","/install",,
"ipmimaxp","64",,
"ipmiretries","3",,
"ipmitimeout","2",,
"consoleondemand","no",,
"master","10.0.0.3",,
"forwarders","131.94.205.10,131.94.7.220,131.94.69.36",,
"nameservers","10.0.0.3",,
"maxssh","8",,
"ppcmaxp","64",,
"ppcretry","3",,
"ppctimeout","0",,
"powerinterval","0",,
"syspowerinterval","0",,
"sharedtftp","1",,
"SNsyncfiledir","/var/xcat/syncfiles",,
"tftpdir","/tftpboot",,
"xcatdport","3001",,
"xcatiport","3002",,
"xcatconfdir","/etc/xcat",,
"timezone","America/New_York",,
"useNmapfromMN","no",,
"enableASMI","no",,
"db2installloc","/mntdb2",,
"databaseloc","/var/lib",,
"sshbetweennodes","ALLGROUPS",,
"dnshandler","ddns",,
"vsftp","n",,
"cleanupxcatpost","no",,

networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,ddnsdomain,vlanid,domain,comments,disable
"private","10.0.0.0","255.255.254.0","eth0","10.0.0.3","10.0.03","10.0.0.3","131.94.205.10,131.94.7.220",,,"10.0.0.10-10.0.0.128",,
"public_disabled","10.106.128.0","255.255.254.0","eth1","10.106.128.1",,"10.106.128.12",,"1"

nodelist
#node,groups,status,statustime,appstatus,appstatustime,primarysn,hidden,comments,disable
"n1","ipmi,compute,all"
"n2","ipmi,compute,all"
"n3","ipmi,compute,all"
"n4","ipmi,compute,all"

nodehm
#node,power,mgt,cons,termserver,termport,conserver,serialport,serialspeed,serialflow,getmac,comments,disable
"compute",,"ipmi",,

ipmi
#node,bmc,bmcport,username,password,comments,disable

mp
#node,mpa,id,nodetype,comments,disable

mpa
#mpa,username,password,comments,disable

noderes
#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,comments,disable
"compute",,"pxe","10.0.0.3",,"10.0.0.3",,,"eth0","eth0",
"ipmi",,"pxe","10.0.0.3",,"10.0.0.3",,,"eth0","eth0",

passwd
#key,username,password,cryptmethod,comments,disable
"ipmi","root","password",,,
"omapi","xcat_key","eVlKQVFiOHg5NkFTa3pRdkhSVEZHeE1nN1RVRWZZc1Q=",,,

chain
#node,currstate,currchain,chain,ondiscover,comments,disable
"n1","runcmd=bmcsetup",

switch
#node,switch,port,vlan,interface,comments,disable

nodetype
#node,os,arch,profile,provmethod,supportedarchs,nodetype,comments,disable
"n1","image","x86","rh5image-rh66-v0","image","x86,x86_64",,,
"n2","image","x86","rh5image-rh66-v0","image","x86,x86_64",,,
"n3","image","x86","rh5image-rh66-v0","image","x86,x86_64",,,
"n4","image","x86","rh5image-rh66-v0","image","x86,x86_64",,,

mac
#node,interface,mac,comments,disable
"n1","eth0","00:26:6C:FF:09:34",,
"n2","eth0","00:26:6C:FF:05:90",,
"n3","eth0"

Re: [xcat-user] Basic xCAT operation for a Newbie

2012-04-02 Thread Russell Jones

All of this is covered in the video :-)

On 4/2/2012 4:00 PM, Evelio Quiros wrote:

Hi Russel,

Actually, I found it after I sent the email. The ipmi table now has this:

ipmi
#node,bmc,bmcport,username,password,comments,disable
"n1","n1-bmc",
"n2","n2-bmc",
"n3","n3-bmc",
"n4","n4-bmc",

I manually populated it instead of using regular expressions since 
there were only 4 nodes.


I get this now:

rpower all stat|xcoll
n1: Error: Could not resolve n1-bmc to an address
n2: Error: Could not resolve n2-bmc to an address
n3: Error: Could not resolve n3-bmc to an address
n4: Error: Could not resolve n4-bmc to an address

Where do I specify the addresses for the bmc interface ? Isn't it just 
the same address as eth0 ?
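
The errors above mean the names in the bmc column have no address known 
to the management node. A BMC normally has its own IP address, separate 
from the host's eth0, even when it shares the physical port, so names 
such as n1-bmc need their own host or DNS entries. The Python sketch 
below checks which names fail to resolve. The function name is made up, 
and the resolver is passed in as a parameter so the check can be tried 
without touching real DNS.

```python
import socket


def unresolved(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that the given resolver cannot resolve."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


if __name__ == "__main__":
    bmc_names = ["n1-bmc", "n2-bmc", "n3-bmc", "n4-bmc"]
    for name in unresolved(bmc_names):
        print("no address for", name)
```

Running it on the management node before retrying rpower shows at a 
glance which BMC names still need an entry.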


Thanks,
Al Q


From: Russell Jones <rjo...@eggycrew.com>
Reply-To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: Mon, 2 Apr 2012 15:27:20 -0500
To: <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] Basic xCAT operation for a Newbie

Hi Al,

ipmi
#node,bmc,bmcport,username,password,comments,disable


This can't be empty. If you review the video I sent you a while back 
it will help you define the tables properly for ipmi remote power 
support: http://www.youtube.com/watch?v=-rUJ3dy2hjQ



On 4/2/2012 2:29 PM, Evelio Quiros wrote:

Hello,

I am under the impression that the idea of xcat is to allow 
deployment and provisioning of systems without having to touch each 
one or change the bios settings.
After spending some time reviewing the documentation, I am still 
unable to get xcat to contact my nodes. When trying rpower commands, I 
get a timeout error.  I am sure that there must be some fundamental 
piece missing in my setup, so let's review what I have:


I have 2 networks, a private network and a (semi)public network. On 
all the nodes (including management node), eth0 is on the private 
network, and eth1 is on the public.
The public network has its own dhcp and dns server, so that part is 
taken care of externally.


I have a dell C6100, a 4-node machine that is essentially 4 separate 
computers in one box.
It is setup with all of the out-of-the-box defaults. These will be my 
compute nodes (n1-n4).
These are the nodes I am trying to reach via xcat. These nodes and the 
management node below are all on the same networks.


I have a single management node running redhat linux 6.
I am running version 2.7 of the xcat code, installed from the 
repository via yum. I also have the xcat-deps installed.
I added the nodes (only 4 of them) to the /etc/hosts manually. I also 
added them to the hosts table in xcat.
I ran makenetworks, and edited the network table to disable the 
public network, just keep the private net.
I ran makedns and makedhcp. I can reboot the nodes manually, and they 
come up with their correct names and IP addresses.


I am able to view and edit the tables. This is a dump of the current 
table structures I have.
Perhaps I am missing something in my tables ? Perhaps there is 
another essential table I am missing ?


Thanks for your help,
Al Quiros
Florida international University



site
#key,value,comments,disable
"domain","p.fiu.edu",,
"blademaxp","64",,
"fsptimeout","0",,
"installdir","/install",,
"ipmimaxp","64",,
"ipmiretries","3",,
"ipmitimeout","2",,
"consoleondemand","no",,
"master","10.0.0.3",,
"forwarders","131.94.205.10,131.94.7.220,131.94.69.36",,
"nameservers","10.0.0.3",,
"maxssh","8",,
"ppcmaxp","64",,
"ppcretry","3",,
"ppctimeout","0",,
"powerinterval","0",,
"syspowerinterval","0",,
"sharedtftp","1",,
"SNsyncfiledir","/var/xcat/syncfiles",,
"tftpdir","/tftpboot",,
"xcatdport","3001",,
"xcatiport","3002",,
"xcatconfdir","/etc/xcat",,
"timezone","America/New_York",,
"useNmapfromMN","no",,
"enableASMI","no",,
"db2installloc","/mntdb2",,
"databaseloc","/var/lib",,
"sshbetweennodes","ALLGROUPS",,
"dnshandler","ddns",,
"vsftp","n",,
"cleanupxcatpost","no",,

networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,ddnsdomain,vlanid,domain,comments,disable
"private","10.0.0.0","255.255.254.0

[xcat-user] External NFS for Linux Stateless nodes

2012-08-07 Thread Russell Jones
Hi all,

Does xCAT provide support for Linux diskless nodes to download their 
rootimg via external NFS? I have found the following page that talks 
about support for AIX stateless and external NFS, but nothing for Linux: 
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=NFS_redundancy



Thanks in advance!



[xcat-user] xCAT syncfiles not working on boot

2012-08-08 Thread Russell Jones

xCAT 2.6.8, diskless centos nodes.

I have an odd issue where the syncfiles postscript is running, but not 
actually syncing the files like it should on node boot. The node itself 
records the following error:


xCAT:  ./syncfiles: the OS name = Linux
xCAT:  ./syncfiles: Perform Syncing File action encountered error


On the compute node:

 * /var/log/xcat/xcat.log does not record any errors after "Running
   postscript: syncfiles"



On the management node:

 * updatenode nod546 -F  says "File synchronization complete", but does
   not actually sync the files.
 * xdcp nod546 -F /path/to/synclist-file *does* sync the file.




How can I figure out what error it is actually running into? Why would 
xdcp work, but updatenode does not?



Thanks!




Re: [xcat-user] [SOLVED] xCAT syncfiles not working on boot

2012-08-08 Thread Russell Jones

Hi all,

Just wanted to update. Turns out these nodes' provmethod was set 
specifically to an osimage as opposed to just "netboot". As a result, I 
needed to set the "synclists" attribute on that osimage.


I still cannot catch any errors from xCAT saying it did not see a 
"synclists" attribute like it would expect. It would be nice if a future 
improvement were made to feed this information back to the user :-)
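
The kind of feedback asked for here could look roughly like the Python 
sketch below. It is not xCAT code. It assumes, based only on this 
thread, that when provmethod names an osimage the sync list is taken 
from that osimage's "synclists" attribute, and it treats "install", 
"netboot" and "statelite" as the plain provmethod values. The function 
name and the dictionary layout are made up for illustration.

```python
PLAIN_PROVMETHODS = {"install", "netboot", "statelite"}  # assumed values


def synclist_warning(node, provmethod, osimage_synclists):
    """Return a warning string if a sync would silently do nothing."""
    if provmethod in PLAIN_PROVMETHODS:
        return None  # sync list is located some other way; not checked here
    if not osimage_synclists.get(provmethod):
        return ("{}: osimage '{}' has no synclists attribute, "
                "no files will be synced".format(node, provmethod))
    return None


if __name__ == "__main__":
    # Hypothetical osimage name with no sync list defined.
    images = {"centos-x86_64-netboot-compute": ""}
    print(synclist_warning("nod546", "centos-x86_64-netboot-compute", images))
```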



On 8/8/2012 12:48 PM, Russell Jones wrote:

xCAT 2.6.8, diskless centos nodes.

I have an odd issue where the syncfiles postscript is running, but not 
actually syncing the files like it should on node boot. The node 
itself records the following error:


xCAT:  ./syncfiles: the OS name = Linux
xCAT:  ./syncfiles: Perform Syncing File action encountered error


On the compute node:

  * /var/log/xcat/xcat.log does not record any errors after "Running
postscript: syncfiles"



On the management node:

  * updatenode nod546 -F  says "File synchronization complete", but
does not actually sync the files.
  * xdcp nod546 -F /path/to/synclist-file *does* sync the file.




How can I figure out what error it is actually running into? Why would 
xdcp work, but updatenode does not?



Thanks!






Re: [xcat-user] how can I download xcat

2012-08-09 Thread Russell Jones
I'm not seeing a reliable source for download outside of sourceforge. 
While you may be able to find the necessary packages, you'll still be 
without the documentation that is crucial for fully understanding how to 
properly administer your cluster with xCAT.


I would recommend renting yourself a proxy outside of China that you can 
use to be able to access blocked sites.





On 8/9/2012 12:21 AM, m13601078155 wrote:
I can not open the web site http://xcat.sourceforge.net/  because 
sourceforge can not access from P.R. China.

so, how can I do?
2012-08-09

m13601078155




Re: [xcat-user] [SOLVED] xCAT syncfiles not working on boot

2012-08-09 Thread Russell Jones

  
  
Agreed. Let me clarify... I meant if you try to force a syncfiles
update (updatenode $node -F), it still says "completed" instead of
giving a warning that no syncfile list is defined. It would be more
user-friendly (imho) to at least let the user know that their
attempts to sync files are futile unless they define a list.  :-) 


On 8/9/2012 9:27 AM, Lissa Valletta wrote:


  The problem with putting out a
  message is the synclist is totally optional.  You do not have
  to have a synclist.   But since we put syncfiles in the
  defaults,  not having it just means, it does nothing. 

  Lissa K. Valletta
  2-3/T12
  Poughkeepsie, NY 12601
  (tie 293) 433-3102
  

    
From: Russell Jones
To: xcat-user@lists.sourceforge.net
Date: 08/08/2012 02:58 PM
Subject: Re: [xcat-user] [SOLVED] xCAT syncfiles not working on boot
  
  
  
  Hi all,

Just wanted to update. Turns out these nodes' provmethod was set
specifically to an osimage as opposed to just "netboot". As a
result, I needed to set the "synclists" attribute. 

I still cannot catch any errors from xCAT saying it did not see
a "synclists" attribute like it would expect. It would be nice
if a future improvement were made to feed this information back
to the user :-) 

      
  On 8/8/2012 12:48 PM, Russell Jones
wrote:
  
xCAT 2.6.8, diskless centos nodes.
  
  I have an odd issue where the syncfiles postscript is running,
  but not actually syncing the files like it should on node
  boot. The node itself records the following error:
  
  xCAT:  ./syncfiles: the OS name = Linux
  xCAT:  ./syncfiles: Perform Syncing File action encountered
  error
  
  
  On the compute node:

  /var/log/xcat/xcat.log does
  not record any errors after "Running postscript:
  syncfiles"


  
  On the management node:

  updatenode nod546 -F  says
  "File synchronization complete", but does not actually sync
  the files. 
  
  xdcp nod546 -F
  /path/to/synclist-file *does* sync the file.


  
  
  How can I figure out what error it is actually running into?
  Why would xdcp work, but updatenode does not? 
  
  
  Thanks!
  
  
  


Re: [xcat-user] [SOLVED] xCAT syncfiles not working on boot

2012-08-09 Thread Russell Jones

  
  
Opened. Thanks!

On 8/9/2012 10:26 AM, Lissa Valletta
  wrote:


  Open a SF defect.  I agree.  


  Lissa K. Valletta
  2-3/T12
  Poughkeepsie, NY 12601
  (tie 293) 433-3102
  


Russell Jones
  ---08/09/2012 10:41:33 AM---Agreed. Let me clarify... I meant
  if you try to force a syncfiles update  (updatenode $node -F),
  it

From: Russell Jones
  
To: xcat-user@lists.sourceforge.net
Date: 08/09/2012 10:41 AM
Subject: Re: [xcat-user] [SOLVED] xCAT
  syncfiles not working on boot
  
  
  
  
  Agreed. Let me clarify... I meant if
you try to force a syncfiles update (updatenode $node -F), it
still says "completed" instead of giving a warning that no
syncfile list is defined. It would be more user-friendly (imho)
to at least let the user know that their attempts to sync files
are futile unless they define a list. :-) 

  
  On 8/9/2012 9:27 AM, Lissa Valletta
wrote:
  

The problem with putting out a
  message is the synclist is totally optional.  You do not have
  to have a synclist.   But since we put syncfiles in the
  defaults,  not having it just means, it does nothing. 
  
  Lissa K. Valletta
  2-3/T12
  Poughkeepsie, NY 12601
  (tie 293) 433-3102

  
  
    Russell Jones ---08/08/2012
  02:58:21 PM---Hi all, Just wanted to update. Turns out these
  nodes provmethod was set

  From: Russell Jones 
  To: xcat-user@lists.sourceforge.net
  Date: 08/08/2012
  02:58 PM
  Subject: Re:
  [xcat-user] [SOLVED] xCAT syncfiles not working on boot

  
  
  Hi all,
  
  Just wanted to update. Turns out these nodes provmethod was
  set specifically to an osimage as opposed to just "netboot".
  As a result, I needed to set the "synclists" attribute. 
  
  I still cannot catch any errors from xCAT saying it did not
  see a "synclists" attribute like it would expect. It would be
  nice if a future improvement were made to feed this
  information back to the user :-) 
  
      
  On 8/8/2012 12:48 PM, Russell Jones wrote: 

  xCAT 2.6.8, diskless centos nodes.

I have an odd issue where the syncfiles postscript is
running, but not actually syncing the files like it should
on node boot. The node itself records the following error:

xCAT:  ./syncfiles: the OS name = Linux
xCAT:  ./syncfiles: Perform Syncing File action encountered
error


On the compute node: 
  
/var/log/xcat/xcat.log does
not record any errors after "Running postscript:
syncfiles"
  
  

On the management node: 
  
updatenode nod546 -F  says
"File synchronization complete", but does not actually
sync the files. 

xdcp nod546 -F
/path/to/synclist-file *does* sync the file.
  
  


How can I figure out what error it is actually running into?
Why would xdcp work, but updatenode does not? 


Thanks!



  

Re: [xcat-user] PXE boot and EL 6.2

2012-08-09 Thread Russell Jones
What is the actual error you are getting with the node that is not 
allowing it to boot?



On 8/9/2012 9:33 AM, Mark Loveridge wrote:
> Hi,
>
> I'm just starting to look at using CentOS 6.x on my xCAT server.
>
> I have two xCAT servers, one running CentOS 5.5 and the other CentOS 6.2 both 
> have xCAT 2.7.3 installed and RPMs from the latest dependencies tarball,
>
> I'm attempting to install stateful HS21 nodes with CentOS 5.5.  Historically 
> I've always used PXE to netboot to install these (a hangover from xCAT1 
> days...).  However, with a 6.2 server this does not work.  The system appears 
> to install correctly but will not boot from disk afterwards - it appears that 
> the bootloader  is not getting configured correctly. Using XNBA the systems 
> will install (and reboot) correctly.
>
> With the CentOS 5.5 server both PXE and XNBA work without issues.
>
> I'm aware that XNBA is the preferred netboot method (and it's been on my TODO 
> list for a long time to check it out), but is PXE expected to still work with 
> CentOS 6.2 servers?  Is support for EL 6.2 complete yet in 2.7.3?
>
> One other difference between the two servers is that the 5.5 system is an 
> upgrade from 2.6.8; the 6.2 system is a fresh install of 2.7.3.  This makes a 
> huge difference to the list of installed RPMs on the two systems as none of 
> the nbroot/nbkernel RPMs are on the new installation. Could this have an 
> effect on the PXE netboot?
>
> Regards,
>
> Mark
>
>
> 
> Mark Loveridge ma...@gatwick.westerngeco.slb.com
>
> Tel: +44(0)1293 556870 --- Fax: +44(0)1293 556800 --- Mob: +44(0)7824 473477
>
> Registered Name:   WesternGeco Limited
> Registered Office: Schlumberger House, Buckingham Gate, Gatwick Airport,
> West Sussex RH6 0NZ, UK
> Registered in England & Wales: No. 1389716
>
>
>
>




[xcat-user] Time issues with some stateless IBM nodes

2012-08-09 Thread Russell Jones
Hi all,

I am having an issue with time on some CentOS 5.2 dx360 M2 nodes. 
Whenever the node boots up, its system time is consistently 5 hours, to 
the second, behind the hardware clock. In addition, it states at the top 
during bootup "Could not access the hardware clock via any known 
methods", and then right after that sets the system time to be exactly 5 
hours behind the hardware clock.

Details:

* BIOS clock is set to local time properly, and maintains its time 
between reboots / power cycles.

* /etc/localtime is set to Chicago properly

* /etc/sysconfig/clock contains the following:
TIMEZONE="America/Chicago"
UTC=false
ARC=false

* /etc/adjtime contains the following:
0.000
0
LOCAL



I've run out of ideas on what is getting left behind that could be 
confusing the system clock. The site table's timezone is set to 
"America/Chicago" as well. I am unsure if this is a CentOS or xCAT 
configuration issue. Any brainstorming ideas on what I could try would 
be greatly appreciated.

Thanks!



Re: [xcat-user] Time issues with some stateless IBM nodes

2012-08-09 Thread Russell Jones
 

On 09.08.2012 16:51, Russell Jones wrote: 

> Hi all,
> 
> I am having an issue with time on some CentOS 5.2 dx360 M2 nodes.
> Whenever the node boots up, it's system time is consistently 5 hours, to
> the second, behind the hardware clock. In addition, it states at the top
> during bootup "Could not access the hardware clock via any known
> methods", and then right after that sets the system time to be exactly 5
> hours behind the hardware clock.
> 
> Details:
> 
> * BIOS clock is set to local time properly, and maintains its time
> between reboots / power cycles.
> 
> * /etc/localtime is set to Chicago properly
> 
> * /etc/sysconfig/clock contains the following:
> TIMEZONE="America/Chicago"
> UTC=false
> ARC=false
> 
> * /etc/adjtime contains the following:
> 0.0 0 0
> 0
> LOCAL
> 
> I've ran out of ideas on what is getting left behind that could be
> confusing the system clock. The site table's timezone is set to
> "America/Chicago" as well. I am unsure if this is a CentOS or xCAT
> configuration issue. Any brainstorming ideas on what I could try would
> be greatly appreciated.
> 
> Thanks!

Thought I would add in what the output looks like when this issue occurs:

At boot:

SELinux: Disabled at runtime.
type=1404 audit(1344531308.511:2): selinux=0 auid=4294967295 ses=4294967295
INIT: version 2.86 booting
 Welcome to CentOS release 5.3 (Final)
 Press 'I' to enter interactive startup.
Cannot access the Hardware Clock via any known method.
Use the --debug option to see the details of our search for an access method.
Setting clock : Thu Aug 9 11:55:08 CDT 2012 [ OK ]

In the OS:

[root@system ~]# date
Thu Aug 9 11:56:05 CDT 2012

[root@system ~]# hwclock
Thu 09 Aug 2012 04:56:08 PM CDT -0.333130 seconds

[root@system ~]# hwclock --debug
hwclock from util-linux-2.13-pre7
Using /dev/rtc interface to clock.
Last drift adjustment done at 0 seconds after 1969
Last calibration done at 0 seconds after 1969
Hardware clock is on unknown time
Assuming hardware clock is kept in local time.
Waiting for clock tick...
/dev/rtc does not have interrupt functions. Waiting in loop for time from /dev/rtc to change
...got clock tick
Time read from Hardware Clock: 2012/08/09 16:56:14
Hw clock time : 2012/08/09 16:56:14 = 1344549374 seconds since 1969
Thu 09 Aug 2012 04:56:14 PM CDT -0.756991 seconds
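
For what it's worth, the two epoch values in the output above can be compared directly. This is just a quick arithmetic sketch using the numbers from the logs (delta_hours is a made-up helper name). The gap comes out at 5 whole hours, the same size as the CDT offset from UTC, so one plausible reading, and only a guess, is that the RTC value is being taken as UTC at boot and never corrected to local time because hwclock could not access the clock.

```shell
#!/bin/sh
# Values copied from the boot log and the `hwclock --debug` output above.
hw_epoch=1344549374     # RTC reading 16:56:14, interpreted by hwclock as local time (CDT)
boot_epoch=1344531308   # kernel audit timestamp, logged when the boot showed 11:55:08 CDT

# Difference between two epoch values in whole hours (integer division).
delta_hours() {
    echo $(( ($1 - $2) / 3600 ))
}

delta_hours "$hw_epoch" "$boot_epoch"   # prints 5
```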



Re: [xcat-user] Time issues with some stateless IBM nodes

2012-08-10 Thread Russell Jones


On 8/10/2012 5:29 AM, Lissa Valletta wrote:


I have no idea why it comes up so far off, but do you have setupntp 
in your postscripts? At least that should help realign it over time.

It does assume you have set up the MN as an NTP server.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102



Thanks Lissa. The postscript itself is not being used, but NTP is 
configured on the nodes. It does eventually bring them in line with the 
hardware clock, but unfortunately still doesn't solve the initial 
problem of it being off.


I am trying to figure out what is causing this without having to resort 
to a hackish way of fixing it :) Someone else on a different mailing 
list mentioned that the "Could not access the hardware clock via any 
known methods" error is a hint that the initrd is missing the needed 
modules to properly access the hardware clock. I am going to dive down 
that road on Tuesday and see what I'm able to dig up.




Re: [xcat-user] Time issues with some stateless IBM nodes

2012-08-10 Thread Russell Jones


On 8/10/2012 9:37 AM, James Richardson wrote:


It may be easier to also just add a call to 'ntpdate' in your setupntp 
script to initially set the date without having to wait for ntpd to 
align itself over time if you're starting from a big time delta.


James




Indeed, that's my last resort. Either do that or sync the hwclock back 
to the sysclock. Was hoping to get the root cause of the problem fixed 
though instead of having to put a "workaround" in place.




Re: [xcat-user] Time issues with some stateless IBM nodes

2012-08-13 Thread Russell Jones
Thanks Chris,

I was actually one "dot version" off, it's 5.3, not 5.2. However the OS 
cannot be upgraded right now as a slew of software has been verified to 
work on this version. If it comes down to it and I am unable to get it 
to work correctly, a work-around via "ntpdate" or "hwclock --hctosys" 
would be a better option than changing the OS version.

Vulnerabilities are not an issue, this cluster cannot be accessed from 
the outside.




On 8/13/2012 12:46 AM, Christopher Samuel wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 10/08/12 07:51, Russell Jones wrote:
>
>> I am having an issue with time on some CentOS 5.2 dx360 M2 nodes.
>> Whenever the node boots up, it's system time is consistently 5
>> hours, to the second, behind the hardware clock. In addition, it
>> states at the top during bootup "Could not access the hardware
>> clock via any known methods", and then right after that sets the
>> system time to be exactly 5 hours behind the hardware clock.
> Are you able to test with a newer version of CentOS?  5.2 is pretty
> ancient (not to mention vulnerable) and you might find some of the
> kernel fixes since then allow access to the clocks on that HW.
>
> cheers,
> Chris
> - -- 
>   Christopher Samuel    Senior Systems Administrator
>   VLSCI - Victorian Life Sciences Computation Initiative
>   Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>   http://www.vlsci.org.au/  http://twitter.com/vlsci
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAlAolLQACgkQO2KABBYQAh8tNQCfdsfxGNqmffYsXdjMc7hJCctP
> n8AAnRfar4/AwJRsEaxvI+ErzMrDdPbe
> =EGWo
> -END PGP SIGNATURE-
>
>




Re: [xcat-user] Time issues with some stateless IBM nodes

2012-08-13 Thread Russell Jones


On 8/10/2012 12:09 PM, Sten Wolf wrote:
Supposedly that's what the /etc/ntp/step-tickers file is for. Just add 
it to the synclist or to the setupntp postscript (I just add a simple 
'echo "$i" >>$step_tickers_file' and an 'echo $master ...' in the for 
loop after setting the step_tickers_file to the above file).



Thanks Sten. That does indeed look like a valid "fix" for this issue. I 
will more than likely use this if trying to add the proper modules into 
the initrd to have it read the hardware clock correctly proves to be a 
chore :-)
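
For anyone finding this later, the loop Sten describes could look roughly like the sketch below. It is only a sketch under assumptions: write_step_tickers is a made-up function name, and the usage line assumes the postscript environment provides the management node address in $MASTER. On CentOS 5 the ntpd init script should step the clock against the servers listed in this file before ntpd starts, which is why it helps with a large initial offset.

```shell
#!/bin/sh
# Hypothetical helper: write one NTP server per line to a step-tickers file.
# $1 = target file, remaining arguments = server names or addresses.
write_step_tickers() {
    step_tickers_file=$1
    shift
    : > "$step_tickers_file"              # start from an empty file
    for i in "$@"; do
        echo "$i" >> "$step_tickers_file"
    done
}

# Possible usage inside a postscript (assumes $MASTER is set by xCAT):
#   write_step_tickers /etc/ntp/step-tickers "$MASTER"
```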


Re: [xcat-user] Error: Install image not found in /install/centos6.2/x86_64

2012-08-13 Thread Russell Jones

On 8/13/2012 4:41 PM, Matthias Marx wrote:
> Hi,
>
> I'm currently trying to use xCAT to install a small cluster of four
> systems. Those systems are controlled via IPMI interfaces.

Hi Matthias,

The CentOS minimal iso is unsuitable for automated deployments via xCAT 
due to it not including the required pxeboot directory content 
(./images/pxeboot/vmlinuz and ./images/pxeboot/initrd.img).

Please try again with the standard DVD iso.
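
A quick way to check any ISO before handing it to xCAT is to loop-mount it and look for those two files. The snippet below is a rough sketch; has_pxeboot_files is a made-up helper name and /mnt/iso is just an example mount point.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the media tree at $1 (a mounted ISO or a
# copied directory) contains both PXE boot files mentioned above.
has_pxeboot_files() {
    [ -f "$1/images/pxeboot/vmlinuz" ] && [ -f "$1/images/pxeboot/initrd.img" ]
}

# Possible usage (paths are examples only):
#   mount -o loop /path/to/distro.iso /mnt/iso
#   has_pxeboot_files /mnt/iso || echo "no pxeboot content on this ISO" >&2
```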



[xcat-user] linuximage table behavior

2012-08-15 Thread Russell Jones
Hi all,

I am poking around at xCAT 2.7, and I have noticed that the linuximage 
table updates itself each time genimage is run. For example, for my 
"centos6.2-x86_64-netboot-compute" image the netdrivers section updates 
with the new list of drivers I provide on each image generation.

However I have noticed that if I re-gen that image without any drivers 
provided at all in the list (genimage -i eth0 -o centos6.2 -p compute), 
instead of clearing out the netdrivers it keeps the last entry that 
contained network drivers.

Is this the expected behavior?

Thanks!



Re: [xcat-user] [SOLVED] Time issues with some stateless IBM nodes

2012-08-15 Thread Russell Jones

After poking around a bit I ended up just going the step-tickers route :-)


On 8/13/2012 9:33 PM, Russell Jones wrote:


On 8/10/2012 12:09 PM, Sten Wolf wrote:
Supposedly that's what the /etc/ntp/step-tickers file is for. Just 
add it to the synclist or to the setupntp postscript (I just add a 
simple 'echo "$i" >>$step_tickers_file' and an 'echo $master ...' in 
the for loop after setting the step_tickers_file to the above file).



Thanks Sten. That does indeed look like a valid "fix" for this issue. 
I will more than likely use this if trying to add the proper modules 
into the initrd to have it read the hardware clock correctly proves to 
be a chore :-)




Re: [xcat-user] grep /opt/xcat/xcatinfo: No such file or directory

2012-08-30 Thread Russell Jones
What is different with 04 and 05? *Anything* at all? Different subnets?

Double check the basics, especially that forward and reverse DNS works 
properly from both node and cmgmt side.


On 8/30/2012 11:43 AM, Ragnar Skulason wrote:
> Hi everyone,
>
> I'm having some problems with xCat with some of our nodes.
> We have 5 storage nodes with the same diskless xCat configuration, named s301 
> - s305.
>
> When I boot s301, s302 and s303 everything works as expected. But with s304 
> and s305 I get the following error on the host console:
> "grep /opt/xcat/xcatinfo: No such file or directory"
> and everything hangs for few minutes.
>
> I've figured this message comes from
> /install/postscripts/xcatdsklspost
> and after some debugging "echo"s Ive narrowed the problem to
> /xcatpost/getpostscript.awk
>
> which injects some xCat variables to s303:/tmp/mypostscript
> but with s304 and s305 I get nothing
>
> getpostscript.awk seems to interact with some local server:
> server = "/inet/tcp/0/127.0.0.1/400"
> again s303 gets xml from this service with some parameters,
> but s305 gets this empty xml response:
> 
>
> 
>
> Would anyone know the cause of this or give me idea what server this is or 
> where I could find it to debug it further?
>   
> Best regards and thanks
> Ragnar Skulason
>
>
>




[xcat-user] Stateless image of different distro

2012-10-04 Thread Russell Jones
Hi all,

Does xCAT support generating stateless images of other distros? I am 
attempting to genimage a SLES 11 compute profile on a CentOS 5 box, and 
of course it fails trying to find Zypper, /etc/SuSE-release, etc.

Is there a trick to it, or is it not officially supported?

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] How to configure secondary domain nameserver in xCAT version 2.7.3

2012-10-10 Thread Russell Jones
I could be wrong, but I imagine it would be the same as setting up any 
other Bind slave. xCAT doesn't do anything special with the Bind 
configuration. It is just a standard Bind master with normal records.



On 10/8/2012 10:22 AM, Mike J. Denny wrote:

Hi,
Has anyone created a secondary dns nameserver? If so, what is the 
process?
I don't want a caching forwarder as it makes all dns requests back to 
the management node. I
am looking for an alternate in case the management node goes down. It 
seems that this

feature used to be available in earlier versions of xCAT.

Regards,

Mike Denny  RHCE  Technology Specialist
CATERPILLAR Inc. Global Information Services
High Performance Computing Center of Expertise
Email: denn...@cat.com
Ph: (309) 675-9289  FAX: (309) 675-3430
"You can check out any time you like,
  but you can never leave" - Eagles

This e-mail, including attachments, may include confidential and/or 
proprietary information,
and may be used only by the person or entity to which it is addressed. 
If the reader of this
e-mail is not the intended recipient or his or her authorized agent, 
the reader is hereby notified
that any dissemination, distribution, or copying of this e-mail is 
prohibited. If you have received
this e-mail in error, please notify the sender by replying to this 
message and delete this e-mail

immediately.




Re: [xcat-user] Unable to boot from HD after auto-discovery.

2012-10-10 Thread Russell Jones

Just a quick couple of things that have bitten me on this same issue:

* DNS is set incorrectly. Check to make sure /etc/resolv.conf on the 
node has the correct data the entire time through the install process. 
Make sure you don't have a postscript overwriting it with an invalid 
configuration.


* Network configuration is getting monkeyed with during postscripts. 
Make sure that eth0 is the only interface that is up and active during 
the install process. Make sure eth1 isn't accidentally being brought up 
and obtaining an IP on the same network as eth0. This could cause the 
node to attempt to contact the MN using the wrong interface.




On 10/10/2012 9:13 AM, Jarrod B Johnson wrote:
Now the issue of endless install, that would be a failure of the OS to 
update the management server.  May need log output to suggest why 
updateflag would be failing...




Re: [xcat-user] makedhcp -d does not work

2012-10-19 Thread Russell Jones
What does the lease look like in the leases file? Is there anything at 
the bottom of the leases file for the node you are trying to remove?



On 10/19/2012 1:04 PM, Wiegers, Bert wrote:


Hello Gilad,

configure with makedhcp

stop dhcpd,

rm the lease file (/var/lib/dhcp/db/dhcpd.lease I guess  - but the 
service has to be down!)


restart dhcpd

This should help.

Best regards,

Bert Wiegers

*From:*Gilad Berman [mailto:gil...@il.ibm.com]
*Sent:* Friday, October 19, 2012 7:14 PM
*To:* xCAT Users Mailing list
*Subject:* [xcat-user] makedhcp -d does not work

Hello,

For some reason makedhcp -d simply does not work for us. 
It does not erase all the node entries (sometimes not even one) from 
the dhcp.leases file.
xCAT 2.7.4; other than that I don't really know what other info is 
needed, it is simply not working right.


thx!

Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: gil...@il.ibm.com 
Tel:972-3-9188262
Mobile: 972-52-2554262

The information contained in this email is being provided by IBM as a 
matter of courtesy and provided "AS-IS" without any direct and implied 
warranty; IBM assumes no liability. It is your responsibility to 
ensure that any resulting customer proposal has been correctly 
designed to meet your clients' requirements and to have an active 
review process which ensures an appropriate level of solution 
assurance is performed for all proposals. IBM does not take 
responsibility for the solution or solution assurance.






Re: [xcat-user] makedhcp -d does not work

2012-10-22 Thread Russell Jones
When you delete a node's lease using makedhcp, an additional lease
entry containing the word "deleted" should be created for the node.
This accomplishes the same thing as restarting dhcpd and having the
lease completely cleared out from dhcpd.leases.
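
To check for the "deleted" entry described above, one could scan the leases file and keep only the last host block seen for each node. This is a minimal sketch, assuming the ISC dhcpd convention that a removed dynamic host is recorded as a later `host <name> { dynamic; deleted; }` block. The function name `host_states` and the sample text (including the cn2 entry) are made up for illustration and are not taken from this thread.

```python
import re

# Matches one "host <name> { ... }" block. Assumption: host blocks contain
# no nested braces, which holds for simple fixed-address entries.
HOST_BLOCK = re.compile(r"\bhost\s+(\S+)\s*\{([^{}]*)\}")


def host_states(leases_text):
    """Return {hostname: 'deleted' | 'active'}, using the last block per host."""
    states = {}
    for name, body in HOST_BLOCK.findall(leases_text):
        statements = [s.strip() for s in body.split(";")]
        # Later blocks overwrite earlier ones, mirroring the append-only file.
        states[name] = "deleted" if "deleted" in statements else "active"
    return states


# Hypothetical sample in the style of a dhcpd.leases file.
SAMPLE = """
host cn1 {
  dynamic;
  hardware ethernet 08:00:27:cc:41:79;
  fixed-address 192.168.1.10;
}
host cn2 {
  dynamic;
  hardware ethernet 08:00:27:cc:41:80;
  fixed-address 192.168.1.11;
}
host cn1 {
  dynamic;
  deleted;
}
"""

if __name__ == "__main__":
    for node, state in sorted(host_states(SAMPLE).items()):
        print(node, state)
```

A node whose last block is marked deleted is no longer served as a static host, even though its older block is still physically present in the file until dhcpd rewrites it.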




On 10/22/2012 3:28 PM, Gilad Berman wrote:

Russell,

I think I know what you meant, and yes, it is probably a result of
removing and adding nodes too many times without restarting dhcpd.


Bert,
we took your advice and simply started all over again. Now we are much
more careful and restart dhcpd more often, and that seems to solve
our issues.


thx a lot!

Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: gil...@il.ibm.com
Tel:972-3-9188262
Mobile: 972-52-2554262







Re: [xcat-user] performing a diskfull sles install with an updated kernel

2012-10-25 Thread Russell Jones
What is the issue you are trying to resolve that does not allow for 
adding the updated kernel into otherpkgs and doing the install via 
postbootscripts?


On 10/25/2012 9:46 PM, François Bissey wrote:
> On Fri, 26 Oct 2012 15:13:03 François Bissey wrote:
>> On Fri, 26 Oct 2012 10:09:11 Xiao Peng Wang wrote:
>>> Also it should work that move the running of otherpkgs from
>>> postbootscripts
>>> to postscripts so that the install of the update kernel before the reboot.
>> If that works that is quite possibly an easier solution. I will try this
>> first.
> Didn't work. On power7 with sles it must be sometimes after the reboot.
>
> Francois
>


Re: [xcat-user] performing a diskfull sles install with an updated kernel

2012-10-26 Thread Russell Jones
Have you considered just adding the updated kernel packages into the main
OS repo? You can then specify in the .pkglist file to use the new kernel
versions during install (although it should use the newer ones by default).



On 10/26/2012 3:19 AM, Francois Bissey wrote:
> I don't want to have to reboot a second time for it to take effect.
>
> On 26/10/12 18:21, Russell Jones wrote:
>> What is the issue you are trying to resolve that does not allow for
>> adding the updated kernel into otherpkgs and doing the install via
>> postbootscripts?
>>
>>
>> On 10/25/2012 9:46 PM, François Bissey wrote:
>>> On Fri, 26 Oct 2012 15:13:03 François Bissey wrote:
>>>> On Fri, 26 Oct 2012 10:09:11 Xiao Peng Wang wrote:
>>>>> Also it should work that move the running of otherpkgs from
>>>>> postbootscripts
>>>>> to postscripts so that the install of the update kernel before the reboot.
>>>> If that works that is quite possibly an easier solution. I will try this
>>>> first.
>>> Didn't work. On power7 with sles it must be sometimes after the reboot.
>>>
>>> Francois


Re: [xcat-user] performing a diskfull sles install with an updated kernel

2012-10-28 Thread Russell Jones

  
  
Updating the repo is fairly simple (at least for redhat flavors).
You just need to re-run "createrepo ." from the repo directory. Be
sure to pass it the group file when running createrepo.

Your mileage may vary with SLES, but I believe off the top of my
head it is the same steps.


On 10/28/2012 10:10 PM, Guang Cheng Li wrote:

If we simply put the updated kernel RPM into the OS packages
directory, I do not think it will work. The OS repo information
needs to be updated too, and updating the OS repo info is not so
easy. We have tried putting some updated RPMs into the OS repo
before, but the OS installation did not recognize the updated RPMs.

Thanks,
-
  Li,Guang Cheng (李光成)
  IBM China System Technology Laboratory
  Email: ligua...@cn.ibm.com
  Address: Building 28, ZhongGuanCun Software Park,
           No.8, Dong Bei Wang West Road, Haidian District
  Beijing 100193, PRC
  
  北京市海淀区东北旺西路8号中关村软件园28号楼
  邮编: 100193
    

Re: [xcat-user] performing a diskfull sles install with an updated kernel

2012-10-28 Thread Russell Jones
That is not irrelevant information. Guang is telling you that you can
find the createrepo RPM on the SDK ISO so that you can install it on the
mgmt node (if it is not already installed) in order to re-run createrepo.
That is, if you decide that adding the updated kernel RPMs into the main
repository under /install is the best option for you.

There is nothing "wrong" with altering an OS repository if it will get 
you what you need. There is nothing "magical" about that repo - it's 
just a repo :-)

"This is my repo. There are many like it, but this one is mine."



On 10/28/2012 11:52 PM, Francois Bissey wrote:
> On 29/10/12 17:37, Guang Cheng Li wrote:
>> BTW, SLES createrepo package is in the SLES SDK ISO.
>>
> Technically irrelevant as it only needs to be installed on the xcat
> server and I am worried about deployment on nodes.
> I consider altering the repo that comes from the media to be a
> doubtful practise but it may just be my misguided opinion.
>
> Francois
>


Re: [xcat-user] Debian Nodes

2012-10-29 Thread Russell Jones
What's the path and name of the files debian uses for installing itself 
over the network? Make sure those exist in the 
/install/debian6.0.6/x86_64 directory.




On 10/29/2012 2:48 PM, Steven Presser wrote:

It does.

$ ls /install/debian6.0.6/x86_64/
autorun.inf  firmware     isolinux     README.mirrors.html  tools
css          g2ldr        md5sum.txt   README.mirrors.txt   win32-loader.ini
debian       g2ldr.mbr    pics         README.source
dists        install      pool         README.txt
doc          install.amd  README.html  setup.exe

However, I don't have a /tftpboot/xcat/debian6.0.6 directory.

I can't find a way to get more verbose output or any debug output (for 
nodeset).  If you're able to point me in the right direction for some 
of that, I may be able to figure this out myself.


Steve

On 10/29/2012 02:56 PM, Ling Gao wrote:

Hi,
   Does the copycds command copy the os files under the directory 
/install/debian6.0.6/x86_64?


Ling

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: ling...@us.ibm.com, 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein




From: Steven Presser 
To: 
Date: 10/28/2012 01:52 PM
Subject: [xcat-user] Debian Nodes




Hey all,
I'm having very little luck finding directions for this online, but
I'd like to use xCAT (installed on CentOS) to kickstart some of my nodes
to Debian.  I notice that there is a compute template in
/opt/xcat/share/xcat/install/debian/.  I've downloaded all the Debian
stable DVDs and imported them using copycds.  However, when I go to set
a node to do a Debian install I get:

# nodeset node001 install=debian6.0.6-x86_64-compute
Error: Install image not found in /install/debian6.0.6/x86_64
Error: Some nodes failed to set up install resources, aborting

I'd appreciate any assistance or a pointer towards any resources that
will help me.

Thanks!
Steve




Re: [xcat-user] Can a short hostname be converted to a FQDN during an install?

2012-11-20 Thread Russell Jones
> We'd like to use short hostnames as all of our keys in the xCAT
> tables, to save typing, and have the FQDN assigned as the hostname after
> an install.


This should already be the standard behavior as long as your nodeadd 
contained the short hostname and not the FQDN. What are you seeing on 
your end?




On 11/20/2012 3:22 PM, Pocina, Goran wrote:


We'd like to use short hostnames as all of our keys in the xCAT 
tables, to save typing, and have the FQDN assigned as the hostname 
after an install.


We could fix the hostname with a postscript,  or perhaps a Kickstart 
statement, however we'd prefer to have it done more transparently.


It would be great if we could use something like a regex in 
"network.nodehostname="/\z/.nyc.desres.deshaw.com/" to do this 
automatically:


[root@drdkvm0003 dump1]# lsdef -t network 149_77_52_0-255_255_254_0
Object name: 149_77_52_0-255_255_254_0
    ddnsdomain=nyc.desres.deshaw.com
    domain=nyc.desres.deshaw.com
    dynamicrange=149.77.53.132-149.77.53.191
    gateway=149.77.52.10
    mask=255.255.254.0
    mgtifname=!remote!
    nameservers=149.77.52.53,149.77.52.35,10.232.15.130
    net=149.77.52.0
    nodehostname=/\z/.nyc.desres.deshaw.com/
    ntpservers=149.77.52.24,149.77.52.41,149.77.52.138,149.77.52.129
    tftpserver=149.77.53.252

But this doesn't seem to have any effect on the DHCP host-name entry 
generated in the leases file, or on the "HOSTNAME=develkv8" entry put 
in /etc/sysconfig/network after the install.


BTW. We do have the FQDN listed first in /etc/hosts and DNS:

[root@develkv8 ~]# hostname

develkv8

[root@develkv8 ~]# host 149.77.53.121

121.53.77.149.in-addr.arpa domain name pointer 
*develkv8.nyc.desres.deshaw.com.*


[root@develkv8 ~]# grep develkv8 /etc/hosts

>>>149.77.53.121 *develkv8.nyc.desres.deshaw.com* develkv8   # row 0 
rack 0 rank 0


Is there a transparent way to do this?

Thanks.




Re: [xcat-user] 'makedns -n' without any records in zone file

2012-11-21 Thread Russell Jones

  
  
Interesting, I am actually seeing the same thing in my mini 2.7.3
test cluster. Never noticed that before since the hosts file is
getting written out on the management node properly.

I know Bind can write to /var/named fine, because if I delete the db
file and re-run makedns, it makes the db file - but it only contains
the mgmt node, not my one compute node record (even though the
output shows "Handling cn1"). The hosts table contains the cn1
record, and it writes that record to /etc/hosts as expected.

Restarting Bind does not write the cn1 record into the db file. This
is on CentOS 6.



On 11/21/2012 9:13 AM, Jarrod B Johnson wrote:

  ls -l /var/named

I am wondering whether the named user can create new files in the
directory. For the updates to work, named must be allowed to create
.jnl files. The .jnl files eventually get merged into the plain-text
record (e.g. restarting named would probably get them merged).

From: Aijun Wang
To: xcat-user@lists.sourceforge.net
Date: 11/21/2012 10:03 AM
Subject: [xcat-user] 'makedns -n' without any records in zone file

Hi all,

    'makedns -n' ran successfully, but there were no records in the
zone file.
    What's wrong?

[root@xcatmn ~]# makedns -n 
Handling localhost in /etc/hosts.
Handling xcatmn in /etc/hosts.
Handling c02-cmm in /etc/hosts.
Handling c02n01 in /etc/hosts.
Handling c02n02 in /etc/hosts.
Handling c02n03 in /etc/hosts.
Getting reverse zones, this may take several minutes in scaling
cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Starting named complete
Restarting named
Restarting named complete
Updating DNS records, this may take several minutes in scaling
cluster.
Completed updating DNS records.
named has been enabled on boot.
DNS setup is completed

[root@xcatmn named]# cat /etc/hosts
127.0.0.1        localhost
192.168.102.254        xcatmn
192.168.101.200        c02-cmm
192.168.102.201        c02n01
192.168.102.202        c02n02
192.168.102.203        c02n03

[root@xcatmn named]#  cat /etc/resolv.conf 
search test.cluster.com
nameserver 192.168.4.1

[root@xcatmn named]# cat db.test.cluster.com 
$TTL 86400
@ IN SOA xcatmn.test.cluster.com. root.xcatmn.test.cluster.com. ( 2012112100 10800 3600 604800 86400 )
  IN NS  xcatmn.test.cluster.com.
xcatmn.test.cluster.com.  IN A  192.168.102.254
[root@xcatmn named]# cat db.192.168.102 
$TTL 86400
@ IN SOA xcatmn.test.cluster.com. root.xcatmn.test.cluster.com. ( 2012112100 10800 3600 604800 86400 )
  IN NS  xcatmn.test.cluster.com.
[root@xcatmn named]# cat db.192.168.101
$TTL 86400
@ IN SOA xcatmn.test.cluster.com. root.xcatmn.test.cluster.com. ( 2012112100 10800 3600 604800 86400 )
  IN NS  xcatmn.test.cluster.com.



-- 
Engine

Re: [xcat-user] 'makedns -n' without any records in zone file

2012-11-21 Thread Russell Jones

  
  
Here's the output you asked for, as well as a bunch more:


[root@xcat-mn ~]# ls -ld /var/named
drwxrwxr-x. 5 root named 4096 Nov 21 11:33 /var/named


[root@xcat-mn ~]# ps aux | grep named
named 1903  0.0  0.6 192804 14340 ?    Ssl  11:33   0:00
/usr/sbin/named -u named


[root@xcat-mn named]# ls -al
total 48
drwxrwxr-x.  5 root  named 4096 Nov 21 11:33 .
drwxr-xr-x. 23 root  root  4096 Aug 14 15:39 ..
drwxrwx---.  2 named named 4096 Jul 30 23:09 data
-rw-rw----   1 named named  151 Aug 14 16:05 db.10.0.1
-rw-rw----   1 named named  151 Aug 14 16:05 db.192.168.1
-rw-rw----   1 named named  196 Nov 21 09:47 db.localcluster.tld
drwxrwx---.  2 named named 4096 Jul 30 23:09 dynamic
-rw-r-----.  1 root  named 1892 Feb 18  2008 named.ca
-rw-r-----.  1 root  named  152 Dec 15  2009 named.empty
-rw-r-----.  1 root  named  152 Jun 21  2007 named.localhost
-rw-r-----.  1 root  named  168 Dec 15  2009 named.loopback
drwxrwx---.  2 named named 4096 Jul 30 23:09 slaves

[root@xcat-mn named]# cat db.localcluster.tld
$TTL 86400
@ IN SOA xcat-mn.localcluster.tld. root.xcat-mn.localcluster.tld. (
2012112100 10800 3600 604800 86400 )
  IN NS  xcat-mn.localcluster.tld.
xcat-mn.localcluster.tld.  IN A  192.168.1.1


[root@xcat-mn named]# makedns cn1
Getting reverse zones, this may take several minutes in scaling
cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes in scaling
cluster.
Completed updating DNS records.
named has been enabled on boot.
DNS setup is completed
[root@xcat-mn named]#
[root@xcat-mn named]#
[root@xcat-mn named]# service named restart
Stopping named: .  [  OK  ]
Starting named:    [  OK  ]
[root@xcat-mn named]#
[root@xcat-mn named]#
[root@xcat-mn named]# ll
total 40
drwxrwx---. 2 named named 4096 Jul 30 23:09 data
-rw-rw----  1 named named  151 Aug 14 16:05 db.10.0.1
-rw-rw----  1 named named  151 Aug 14 16:05 db.192.168.1
-rw-rw----  1 named named  196 Nov 21 09:47 db.localcluster.tld
drwxrwx---. 2 named named 4096 Jul 30 23:09 dynamic
-rw-r-----. 1 root  named 1892 Feb 18  2008 named.ca
-rw-r-----. 1 root  named  152 Dec 15  2009 named.empty
-rw-r-----. 1 root  named  152 Jun 21  2007 named.localhost
-rw-r-----. 1 root  named  168 Dec 15  2009 named.loopback
drwxrwx---. 2 named named 4096 Jul 30 23:09 slaves


[root@xcat-mn named]# cat db.localcluster.tld
$TTL 86400
@ IN SOA xcat-mn.localcluster.tld. root.xcat-mn.localcluster.tld. (
2012112100 10800 3600 604800 86400 )
  IN NS  xcat-mn.localcluster.tld.
xcat-mn.localcluster.tld.  IN A  192.168.1.1
[root@xcat-mn named]#



[root@xcat-mn named]# lsdef cn1
Object name: cn1
    arch=x86_64
    chain=runcmd=standby
    currchain=boot
    currstate=netboot centos6.2-x86_64-compute
    groups=compute
    initrd=xcat/netboot/centos6.2/x86_64/compute/initrd-stateless.gz
    installnic=eth0
    ip=192.168.1.10
   
    kcmdline=imgurl=http://192.168.1.1//install/netboot/centos6.2/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 ifname=eth0:08:00:27:CC:41:79 netdev=eth0
    kernel=xcat/netboot/centos6.2/x86_64/compute/kernel
    mac=08:00:27:CC:41:79
    netboot=xnba
    nfsserver=192.168.1.1
    os=centos6.2
    postscripts=syslog,remoteshell,syncfiles,hardeths
    profile="">
    provmethod=netboot
    status=booted
    statustime=11-21-2012 09:41:38
    tftpserver=192.168.1.1
[root@xcat-mn named]#
[root@xcat-mn named]#
[root@xcat-mn named]#
[root@xcat-mn named]# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,ddnsdomain,vlanid,domain,comments,disable
"10_0_1_0-255_255_255_0","10.0.1.0","255.255.255.0","eth1","10.0.1.1",,"10.0.1.127",,
"cluster","192.168.1.0","255.255.255.0","eth0""","localcluster.tld",,"localcluster.tld",,

    


On 11/21/2012 10:55 AM, Jarrod B Johnson wrote:

What is the output of ls -ld /var/named?


[xcat-user] Cannot find master for the node $node

2012-12-07 Thread Russell Jones
Hi all,

What circumstances have to be present for an xCAT 2.3 service node (old 
I know, but upgrade is not an option at this time) to write to the logs:

Dec  7 09:16:33 service03 xCAT: Cannot find master for the node c25n25


All of our service nodes have been doing this for over a month now, and
we never noticed it before because nodes are netbooting/installing fine.
I am just curious what logic in the code causes this message to be
written out, so that if it turns out to be something we need to track
down, we know where to start.

Thanks!



Re: [xcat-user] Cannot find master for the node $node

2012-12-07 Thread Russell Jones

Thanks!

DNS is working properly, both forward and reverse... this seems to occur 
mostly when entire clusters are booted at the same time.


The nodes are not within the same subnet; all of these nodes are on
different subnets from the service nodes (and routed, of course). The
other strange thing is that, again, it works fine even with that error:



[root@service03-hc log]# grep c25n37 messages
Dec  7 09:05:02 service03 xCAT: xCAT: Allowing getpostscript from c25n37
Dec  7 09:05:03 service03 xCAT: Cannot find master for the node c25n37
Dec  7 09:05:14 service03 xCAT: xCAT: Allowing getcredentials from c25n37
Dec  7 09:05:24 service03 xCAT: xCAT: Allowing getcredentials from c25n37
Dec  7 09:05:32 service03 xCAT: xCAT: Allowing getcredentials from c25n37


Could this message manifest itself if a service node is very busy? I.e.,
not really an error, it just took too long to respond to the request? Or
is it literally logging the error because the compute nodes are not on
the same subnet as the service nodes?




On 12/7/2012 12:18 PM, Ling Gao wrote:

Hi,
   The error comes from the getFacingIP function. This function takes a
node name as input and resolves it to an IP address. It then runs
"ifconfig" on the local host and checks, for each local interface,
whether the node IP falls within that interface's subnet (please see
the code below). If no interface matches, the message is logged. The
error usually happens when name resolution on the local host cannot
resolve the given node.  Hope it helps.



#--- 



=head3   getFacingIP
   Gets the ip address of the adapter of the localhost that is 
facing the

the given node.
Arguments:
   The name of the node that is facing the localhost.
Returns:
   The ip address of the adapter that faces the node.

=cut

#--- 


sub getFacingIP
{
    my ($class, $node) = @_;
    my $ip;
    my $cmd;
    my @ipaddress;

    my $nodeip = inet_ntoa(inet_aton($node));
    unless ($nodeip =~ /\d+\.\d+\.\d+\.\d+/)
    {
        return 0;    #Not supporting IPv6 here IPV6TODO
    }

    $cmd = "ifconfig" . " -a";
    $cmd = $cmd . "| grep \"inet \"";
    my @result = xCAT::Utils->runcmd($cmd, 0);
    if ($::RUNCMD_RC != 0)
    {
        xCAT::MsgUtils->message("S", "Error from $cmd\n");
        exit $::RUNCMD_RC;
    }

    # split node address
    my ($n1, $n2, $n3, $n4) = split('\.', $nodeip);

    foreach my $addr (@result)
    {
        my $ip;
        my $mask;
        if (xCAT::Utils->isLinux())
        {
            my ($inet, $addr1, $Bcast, $Mask) = split(" ", $addr);
            if ((!$addr1) || (!$Mask)) { next; }
            my @ips   = split(":", $addr1);
            my @masks = split(":", $Mask);
            $ip   = $ips[1];
            $mask = $masks[1];
        }
        else
        {    #AIX
            my ($inet, $addr1, $netmask, $mask1, $Bcast, $bcastaddr) =
              split(" ", $addr);
            if ((!$addr1) && (!$mask1)) { next; }
            $ip = $addr1;
            $mask1 =~ s/0x//;
            $mask =
              `printf "%d.%d.%d.%d" \$(echo "$mask1" | sed 's/../0x& /g')`;
        }

        if ($ip && $mask)
        {

            # split interface IP
            my ($h1, $h2, $h3, $h4) = split('\.', $ip);

            # split mask
            my ($m1, $m2, $m3, $m4) = split('\.', $mask);

            # AND this interface IP with the netmask of the network
            my $a1 = ((int $h1) & (int $m1));
            my $a2 = ((int $h2) & (int $m2));
            my $a3 = ((int $h3) & (int $m3));
            my $a4 = ((int $h4) & (int $m4));

            # AND node IP with the netmask of the network
            my $b1 = ((int $n1) & (int $m1));
            my $b2 = ((int $n2) & (int $m2));
            my $b3 = ((int $n3) & (int $m3));
            my $b4 = ((int $n4) & (int $m4));

            if (($b1 == $a1) && ($b2 == $a2) && ($b3 == $a3) && ($b4 == $a4))
            {
                return $ip;
            }
        }
    }

    xCAT::MsgUtils->message("S", "Cannot find master for the node $node\n");
    return 0;
}
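
The heart of the check is a bitwise AND of each address with the
interface netmask. A minimal shell sketch of that comparison, useful for
testing a node/interface pair by hand, might look like the following.
The helper names ip_to_int, same_subnet and hex_to_dotted are made up
for illustration and are not part of xCAT.

```shell
# Illustrative helpers only; none of these names exist in xCAT.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    local IFS=.
    set -- $1                       # split the address on "."
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) when both addresses are in the same network
# for the given netmask: same_subnet <ip1> <ip2> <mask>
same_subnet() {
    local a b m
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Convert an AIX-style hex netmask (for example 0xffffff00) to dotted
# form, using the same printf/sed pipeline as the AIX branch above.
hex_to_dotted() {
    local hex="${1#0x}"
    printf '%d.%d.%d.%d\n' $(echo "$hex" | sed 's/../0x& /g')
}
```

For example, "same_subnet 172.20.100.9 172.20.0.1 255.255.0.0" succeeds,
while the same call with 172.21.0.1 as the second address fails, which
is the situation in which getFacingIP falls through to the "Cannot find
master" message.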

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: ling...@us.ibm.com, 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein




From: Russell Jones 
To: xCAT Users Mailing list 
Date: 12/07/2012 11:05 AM
Subject: [xcat-user] Cannot find master for the node $node




Hi all,

What circumstances have to be present for an xCAT 2.3 service node (old
I know, but upgrade is not an option at this time) to write to the logs:

Dec  7 09:16:33 service03 xCAT: Cannot find master for the node c25n25


All of our service nodes have been doing this for at least a month
now, and we just never noticed it before because nodes are
netbooting/installing fine. Just curious what the logic is 

Re: [xcat-user] Adding kernel parameters to stateless images.

2012-12-07 Thread Russell Jones

Brad,

You have a typo. It's "addkcmdline", not "addcmdline".

Example:

[root@xcat-mn ~]# chdef cn1 addkcmdline="rdloaddriver=scsi_dh_rdac"
1 object definitions have been created or modified.


[root@xcat-mn ~]# lsdef cn1
Object name: cn1
addkcmdline=rdloaddriver=scsi_dh_rdac
arch=x86_64
chain=runcmd=standby
currchain=boot
currstate=install centos6.2-x86_64-compute



[root@xcat-mn nodes]# cat cn1
#!gpxe
#install centos6.2-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/centos6.2/x86_64/vmlinuz
imgload kernel
imgargs kernel quiet repo=http://192.168.1.1/install/centos6.2/x86_64/ ks=http://192.168.1.1/install/autoinst/cn1 ksdevice=eth0 rdloaddriver=scsi_dh_rdac BOOTIF=01-${netX/machyp}

imgfetch http://${next-server}/tftpboot/xcat/centos6.2/x86_64/initrd.img
imgexec kernel
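
For a stateless node such as storage1 the same sequence would be chdef,
then nodeset, then a look at the definition. A rough sketch follows,
assuming the xCAT commands are on PATH on the management node. The run
and add_kernel_param helpers are hypothetical wrappers, not xCAT
commands, and DRY_RUN only prints what would be executed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set addkcmdline on a node, regenerate its boot configuration, and show
# the resulting definition.
# Arguments: <node> <nodeset state, e.g. netboot> <kernel parameter>
add_kernel_param() {
    local node="$1" state="$2" param="$3"
    run chdef "$node" addkcmdline="$param" &&
    run nodeset "$node" "$state" &&
    run lsdef "$node"
}

# Example (prints the three commands without running them):
#   DRY_RUN=1; add_kernel_param storage1 netboot rdloaddriver=scsi_dh_rdac
```

Setting addkcmdline rather than kcmdline leaves the generated
kcmdline alone, so nodeset can keep rebuilding it from the other tables.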



On 12/7/2012 12:41 PM, viviano.b...@epamail.epa.gov wrote:

Ling,
Thanks, but that is a no go.  The first thing I tried was 
editing boot params.  The value in bootparams gets reset when I do a 
"nodeset storage1 netboot".


# lsdef storage1 | grep kcmdline
kcmdline=imgurl=http://!myipfn!//install/netboot/rhel6_c/x86_64/storage/rootimg.gz 
XCAT=!myipfn!:3001  console=tty0 console=ttyS0,115200 
rdloaddriver=scsi_dh_rdac

# nodeset storage1 netboot
storage1: netboot rhel6_c-x86_64-storage
# lsdef storage1 | grep kcmdline
kcmdline=imgurl=http://!myipfn!//install/netboot/rhel6_c/x86_64/storage/rootimg.gz 
XCAT=!myipfn!:3001  console=tty0 console=ttyS0,115200


Your suggestions for chdef didn't take:

# chdef storage1 addcmdline="rdloaddriver=scsi_dh_rdac"
Error: 'addcmdline' is not a valid attribute name for for an object type of 'node'.
Error: Skipping to the next attribute.
Error: One or more errors occured when attempting to create or modify xCAT object definitions.

Here is the lsdef on the node if it helps.

# lsdef storage1
Object name: storage1
appstatus=xend=down,sshd=up,rdp=down,pbs=down,msrpc=down
appstatustime=12-07-2012 13:33:01
arch=x86_64
bmc=storage1-bmc
bmcport=0
chain=runcmd=bmcsetup,shell
currchain=shell
currstate=netboot rhel6_c-x86_64-storage
groups=all,storage
initrd=xcat/netboot/rhel6_c/x86_64/storage/initrd-stateless.gz
ip=172.20.100.9
kcmdline=imgurl=http://!myipfn!//install/netboot/rhel6_c/x86_64/storage/rootimg.gz 
XCAT=!myipfn!:3001  console=tty0 console=ttyS0,115200

kernel=xcat/netboot/rhel6_c/x86_64/storage/kernel
mac=e4:1f:13:6a:31:f4
mgt=ipmi
mtm=7947AC1
netboot=xnba
nfsserver=172.20.0.1
ondiscover=nodediscover
os=rhel6_c
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles,setupntp,addsiteyum,ipoib,mmsdrrestore
profile=storage
provmethod=netboot
serial=KQ8719X
serialport=0
serialspeed=115200
status=ping
statustime=12-07-2012 12:09:01
switch=switch0
switchport=8

Basically I'd like to insert something at the end of kcmdline.  I
don't want to just SET "kcmdline" because other things like console
speed and such come from other xCAT tables to build this.  It feels
like there should be a parent table someplace that I need to put a
value in for node "storage1".


-Brad


===
Brad Viviano
High Performance Computing & Scientific Visualization
Lockheed Martin, Supporting the EPA
Research Triangle Park, NC
919-541-2696

HSCSS Task Order Lead - Ravi Nair
919-541-5467 - nair.r...@epa.gov
High Performance Computing Subtask Lead - Durward Jones
919-541-5043 - jones.durw...@epa.gov
Environmental Modeling and Visualization Lead - Heidi Paulsen
919-541-1834 - paulsen.he...@epa.gov



From: Ling Gao 
To: xCAT Users Mailing list 
Cc: xcat-user@lists.sourceforge.net
Date: 12/07/2012 01:25 PM
Subject: Re: [xcat-user] Adding kernel parameters to stateless images.




Hi,
  You can try to add it to bootparams.addkcmdline by tabedit bootparams
or chdef  addkcmdline="rdloaddriver=scsi_dh_rdac"

Hope it helps,

Ling

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: ling...@us.ibm.com, 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein




From: viviano.b...@epamail.epa.gov
To: xcat-user@lists.sourceforge.net
Date: 12/07/2012 12:16 PM
Subject: [xcat-user] Adding kernel parameters to stateless images.




Good morning,
  I've searched Google and the xCAT WIKI and just couldn't find 
the answer.  If I missed the FAQ, sorry.  I have an iDataplex cluster 
with GPFS and xCAT 2.7.6 (yum install) on RHEL 6.3.  The GPFS nodes 
are stateless attached to a DS3500.  We're having the RDAC / FC timing 
issues detailed in this Redhat Knowledge article:


https://access.redhat.com/knowledge/solutions/18746

I've found a work around, which is to add "rdloaddriver=sc

Re: [xcat-user] Cannot find master for the node $node

2012-12-07 Thread Russell Jones
I am not having much luck finding an xdsh command to run that succeeds. 
I get "permission denied" on the mgmt node for any command I try to run 
as root. On the service nodes, I am trying to follow the xdsh man page 
to just run a simple uptime and am getting:


[root@service01 ~]# xdsh c26n24 "uptime"
Error: Invalid context specified: DSH
Error: Failed to dispatch command to any of the following service nodes: 
service01,service02,service03,service04



From the management node (trying to use the trace option to see errors):

[root@mgmt1 ~]# xdsh c26n24 -T "uptime"
Error: Permission denied for request

Logs show:
Dec  7 14:36:13 mgmt1 xCAT: xCAT: Allowing xdsh to c26n24 for root from 
localhost


Policy table for root:
"1","root",,"allow",,


I am not seeing anything in the compute node's messages or secure logs
that would lead me to believe an attempt to log in via SSH was ever made.


psh, on the other hand, works fine. The remoteshell script does run on
each node at boot time.





On 12/7/2012 1:42 PM, Ling Gao wrote:
The error is given because the nodes are not in the same subnet as the
service node; the code does not check for routing.
The code uses this function to set up the MASTER environment variable
for the node when calling postscripts. If it is not set up correctly,
some of the postscripts, such as syslog, will fail. When you say "it
works fine even with that error", can you xdsh to the nodes that have
the error?
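
Because the nodes here are routed, a same-subnet comparison can never
match. One way to see which local address the kernel would actually use
to reach a node is "ip route get"; the sketch below only parses its
output. The helper name route_src is made up, and the node address in
the usage comment is a placeholder, not an address from this cluster.

```shell
# Hypothetical helper: read `ip route get` output on stdin and print the
# address that follows the "src" keyword, if there is one.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Usage on a live system (placeholder node address):
#   ip route get 10.25.0.37 | route_src
```

This is only a diagnostic aid; it does not change what getFacingIP
returns or how MASTER is set.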


Ling

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: ling...@us.ibm.com, 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein




From: Russell Jones 
To: xcat-user@lists.sourceforge.net
Date: 12/07/2012 02:06 PM
Subject: Re: [xcat-user] Cannot find master for the node $node




Thanks!

DNS is working properly, both forward and reverse... this seems to 
occur mostly when entire clusters are booted at the same time.


The nodes are not within the same subnet; all of these nodes are on
different subnets from the service nodes (and routed, of course).  The
other strange thing is that, again, it works fine even with that error:



[root@service03-hc log]# grep c25n37 messages
Dec  7 09:05:02 service03 xCAT: xCAT: Allowing getpostscript from c25n37
Dec  7 09:05:03 service03 xCAT: Cannot find master for the node c25n37
Dec  7 09:05:14 service03 xCAT: xCAT: Allowing getcredentials from c25n37
Dec  7 09:05:24 service03 xCAT: xCAT: Allowing getcredentials from c25n37
Dec  7 09:05:32 service03 xCAT: xCAT: Allowing getcredentials from c25n37


Could this message appear if a service node is very busy? I.e., it is
not really an error, the node just took too long to respond to the
request? Or is this literally reporting an error because the compute
nodes are not on the same subnet as the service nodes?





Re: [xcat-user] Cannot find master for the node $node

2012-12-07 Thread Russell Jones
Also forgot to mention, we are *not* using the syslog postscript. Each 
node is responsible for their own logs.




Re: [xcat-user] Cannot find master for the node $node

2012-12-10 Thread Russell Jones

  
  
> Do you have /opt/xcat/xdsh/Context directory with two files DSH.pm and
> XCAT.pm?

On both the management and service nodes, /opt/xcat/xdsh is empty. No
Context folder or files anywhere, totally empty directory.

> Also see if DSH_CONTEXT is exported.  It should not be, but you should
> have gotten an error, if you did.  That is a strange error message
> "Error: Invalid context specified: DSH"

That is not exported on the mgmt or service nodes.

> Are the Management Node and the service nodes still at xCAT 2.3?  It
> will be really hard to support that level, it has been out of service
> for a long time.

Yes. I understand it is a very old release, but we cannot upgrade at
this time. Like I mentioned, we are just trying to determine if this
is actually something we need to track down and resolve, or if it's
an error that's not really an error.


> xdsh does not log onto the nodes.  It just runs the remote command.
> You can run xdsh c26n24 -T "uptime" and it will show the exact command
> it runs.  What OS are you running?


The results of my using "-T" were included in my previous email - no
helpful output at all. The mgmt, service, and compute nodes are all
CentOS 5.4 x64.

> See if you can run the following:

Here are the results, seems to have worked fine:
 13:24:07 up 46 days,  1:43, 22 users,  load average: 0.47, 0.41, 0.40
:DSH_TARGET_RC=0:



Thanks for your continued help!


On 12/8/2012 5:32 AM, Lissa Valletta
  wrote:


Do you have /opt/xcat/xdsh/Context directory with two files DSH.pm and
XCAT.pm?
Also see if DSH_CONTEXT is exported.  It should not be, but you should
have gotten an error, if you did.  That is a strange error message
"Error: Invalid context specified: DSH"

Are the Management Node and the service nodes still at xCAT 2.3?  It
will be really hard to support that level, it has been out of service
for a long time.
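
The Context-directory and DSH_CONTEXT checks asked about above can be
scripted roughly as follows. This is only a sketch: the function name
check_xdsh_context is made up, and the default path and file names are
simply the ones mentioned in this thread.

```shell
# Hypothetical helper: report missing xdsh context modules and an
# exported DSH_CONTEXT. Returns non-zero if anything looks wrong.
# Optional argument: the Context directory to inspect.
check_xdsh_context() {
    local dir="${1:-/opt/xcat/xdsh/Context}"
    local rc=0 f
    for f in DSH.pm XCAT.pm; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    # DSH_CONTEXT is not expected to be exported.
    if env | grep -q '^DSH_CONTEXT='; then
        echo "DSH_CONTEXT is exported: $DSH_CONTEXT"
        rc=1
    fi
    return $rc
}
```

Running check_xdsh_context with no argument on the management node and
on each service node gives a quick answer to both questions.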


  Lissa K. Valletta
  8-3/B10
  Poughkeepsie, NY 12601
      (tie 293) 433-3102
  



Re: [xcat-user] xCAT db access from postscripts

2012-12-12 Thread Russell Jones

  
  
Is there an ETA on when xCAT 2.8 will go stable?


On 12/12/2012 6:41 AM, Lissa Valletta
  wrote:


The postscripts are run from a file called mypostscript, which has
exported many database attributes including all of the site table.  In
xCAT 2.8 we are adding a mypostscript.tmpl, so that you can add any
database attributes you want to be exported for the postscripts during
the install or when you run updatenode.
You can see the design here, along with other performance enhancements:
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Updatenode_Performance_Enhancements

  Lissa K. Valletta
  8-3/B10
  Poughkeepsie, NY 12601
  (tie 293) 433-3102
  



From: "Pocina, Goran"
To: xCAT Users Mailing list
Date: 12/11/2012 05:11 PM
Subject: [xcat-user] xCAT db access from postscripts
  
  
  
  
See questions at end…

For RH & CentOS builds, a profile’s kickstart template goes through
macro processing that allows it to access xCAT attribute values.

For a long time it bugged me that postscripts don’t go through the same
macro processing, until a colleague pointed out that it should be
possible for postscripts to use curl to pull any attributes they need
during the install.   Very nice xCAT feature!!   Here’s one possible
postscript that could be called by other postscripts to load a node’s
attributes into the current shell environment:

/install/postscripts/getnodeattr:
   
  
ATTRF=/var/log/xcat/attr.$NODE
mkdir -p $(dirname $ATTRF)
if [ ! -f $ATTRF ] ; then
        # The URL is quoted so the shell does not treat "&" as a
        # background operator.
        curl -k "https://$MASTER/xcatws/nodes/$NODE?userName=wsuser&password=wspd" > $ATTRF.xml ||
                exit 8
fi
# Convert
#   addkcmdlinesshd
# to
#   export ATTR_addkcmdline="sshd"
#
sed -e '/^<[/]*table/d' \
        -e "s//export ATTR_/" \
        -e 's;;=";' \
        -e 's;;";' < ${ATTRF}.xml > $ATTRF
# load into the environment
.  $ATTRF || exit 7
 
  
To use this, one would call:    “.  /xcatpost/getnodeattr”    from
within a postscript.  The wsuser user and policy must, of course, be
set up first.

Question 1. Is there any reason not to use curl and the REST API from
postscripts during an install?   Does this duplicate existing xCAT
functionality?

Question 2.  It’s difficult to protect the “wsuser” password coded into
/install/postscripts/getnodeattr.   The file can’t be made read-only
root, for example, because httpd needs to be able to read it.   Is
there a way to limit “wsuser” to GET calls?

Thanks,

Goran