I also sent this email to the xen-users list; I hope that's OK.

Hello all. I'm trying to figure out VastSky on XCP 1.0 beta. As we all know,
it has been integrated into XCP, and that's just about all one can find on
the matter. I've been googling a lot, but with no luck. I'll write down here
what information I've gathered, what I tried and how far I managed to get
with this.

I include information about my hardware, in case it has something to do with
all this. I have one four-node SuperMicro twin2 server (2026TT-HiBQRF) with
QDR InfiniBand (I haven't bought a switch or managed to get drivers for dom0
yet, so it's gigabit ethernet for now). Each node is identical, containing:
1x Intel Xeon E5620, 12 GB DDR3, 3x 60 GB OCZ Vertex2 SSDs and 3x 500 GB
Seagate Momentus 7200.4 SATA 2.5". No RAID cards, just the onboard ICH10.

Networking configuration:

node A: hostname: super0nodeA ip: 192.168.10.210
node B: hostname: super0nodeB ip: 192.168.10.211
node C: hostname: super0nodeC ip: 192.168.10.212
node D: hostname: super0nodeD ip: 192.168.10.213

I have bonded two interfaces on each node; I have only one gigabit switch
and haven't done any multipath configuration. (A rough sketch of the bonding
commands follows below.)
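For reference, this is only a from-memory sketch of how a bond gets created
with the xe CLI in XCP, not my exact commands; the UUIDs are placeholders
you'd fill in from your own pool:

<bonding sketch>
# list the PIFs (physical interfaces) on this host to find the two to bond
xe pif-list host-uuid=<host-uuid>
# create a network for the bond, then bond the two PIFs onto it
xe network-create name-label=bond0
xe bond-create network-uuid=<network-uuid> pif-uuids=<pif-uuid-1>,<pif-uuid-2>
</bonding sketch>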

My plan was to use super0nodeA as the storage manager and super0nodeB,
super0nodeC and super0nodeD as storage servers, but I ended up installing
the storage and head server roles on super0nodeA as well.

http://sourceforge.net/apps/mediawiki/vastsky/index.php?title=Main_Page
This seems to be a good starting point. If you click "Install manual" on the
left, you get to:
http://vastsky.svn.sourceforge.net/viewvc/vastsky/tags/v3.0/doc/vas_install.txt?revision=367&view=markup

The install manual isn't XCP-specific; it actually only references XCP a
couple of times, but it seems pretty straightforward when it comes to the
configs. Someone on ##xen-api clarified what I needed to install: there
actually is an /etc/vas.conf on stock XCP 1.0 beta, but you still need to
install the required RPMs to get the functionality.

So, as also stated in the installation document, one needs (taken from
vas_install.txt):
<start copy paste>
vastsky-common.rpm  Common library and configuration
vastsky-hsvr.rpm    Head server agent
vastsky-ssvr.rpm    Storage server agent
vastsky-sm.rpm      Storage manager
vastsky-cli.rpm     Storage manager command-line clients
vastsky-doc.rpm     Documentation (including this file)

Basically,
- The -common package is required by the other packages.
- Head servers need the -hsvr package.
- Storage servers need the -ssvr package.
- The storage manager needs the -sm package.
- The host on which you want to run user commands needs the -cli package.
<end copy paste>

Everything I did, I did in dom0 on each server; actually, I had no domUs on
these servers when I did all this.

So, first I edited the "/etc/vas.conf" that exists on all four nodes and
inserted the IP of "super0nodeA". It says "Comma separated list of hosts on
which storage manager runs", but I remember reading somewhere that there can
only be one instance of it. Maybe one can define multiple IPs on a single
host. I didn't find anything else to modify in "/etc/vas.conf".

<part of vas.conf>
[storage_manager]

# host_list:
# Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
</part of vas.conf>
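If the comma-separated form works the way that comment suggests, a list with
more than one entry would presumably look like the following. This is an
untested guess on my part, not something I got working:

<part of vas.conf, guessed>
[storage_manager]
host_list: 192.168.10.210,192.168.10.211
</part of vas.conf, guessed>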

Then I created "/var/lib/vas/register_device_list" on each node and added
disks, following the instructions in vas_install.txt. I configured one SSD
and one HDD on each node. Actually, I first added this on nodes B, C and D,
but later on I added it on A as well.
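To give an idea of the format: as far as I understood vas_install.txt, the
file is just one device path per line. The paths below are examples only;
the actual device names depend on your hardware:

<part of register_device_list>
/dev/sdb
/dev/sde
</part of register_device_list>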

I didn't modify "/etc/multipath.conf", since vas_install.txt states "This
step is not necessary if you solely use our XCP SR driver". I also didn't
modify "/etc/hosts", since I used an IP address instead of a hostname in
"/etc/vas.conf" and hadn't found anywhere else to insert hostnames or IP
addresses.

Then, after multiple reboots and plenty of googling, I went to #xen and
#xen-api to ask for some help. I was told that I needed to install the RPMs.
It was an "aha" moment and explained nicely why I didn't have the CLI
commands available or the "/etc/init.d" scripts for the VastSky servers. So
I did "rpm -i vastsky-hsvr.rpm" and "rpm -i vastsky-ssvr.rpm" on nodes B, C
and D. I also did "rpm -i vastsky-sm.rpm" and "rpm -i vastsky-cli.rpm" on
"super0nodeA". vastsky-common.rpm is already installed on "stock" XCP 1.0
beta and it is VastSky 2.1, so all the RPMs I installed were from 2.1, not
the 3.0 that seems to be the newest version available at:
http://sourceforge.net/projects/vastsky/files/vastsky/
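To summarize the package installs as I ran them (plus a check for what is
already there; "rpm -qa" is just the standard query for installed packages):

<commands>
# see which vastsky packages stock XCP already has (vastsky-common 2.1)
rpm -qa | grep vastsky

# on super0nodeB, super0nodeC and super0nodeD (and later on A as well)
rpm -i vastsky-hsvr.rpm
rpm -i vastsky-ssvr.rpm

# on super0nodeA only
rpm -i vastsky-sm.rpm
rpm -i vastsky-cli.rpm
</commands>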

Then I did "/etc/init.d/vas_sm init" and "/etc/init.d/vas_sm start" on
"super0nodeA". It seemed like I was on fire. Finally I had some processes
running that I was pretty comfortable thinking had something to do with
VastSky. Finally I had working commands like:

- hsvr_list   "list head servers"
- ssvr_list   "list storage servers"
- pdsk_list  "list physical disks"

Though no resources were present, even after I issued
"/etc/init.d/vas_hsvr start" and "/etc/init.d/vas_ssvr start" on nodes
"super0nodeB", "super0nodeC" and "super0nodeD". I knew that these services
had started, since "ps aux | grep vas" told me so and also because lines
started appearing in "/var/log/vas_<host name>.log" (not sure if that name
is exactly right, but the log files can be found in "/var/log"; there is
only one starting with "vas" and it is similar to what I wrote).
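The sanity check I kept using on each node boils down to these two commands
(the log file name follows the per-host pattern above):

<on each node>
ps aux | grep vas
tail -f /var/log/vas_super0nodeB.log   # substitute the local hostname
</on each node>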

This is when I started to think the problem might be network related. So I
installed vastsky-hsvr.rpm and vastsky-ssvr.rpm on super0nodeA and started
them. I also modified my "/etc/hosts" and added:

192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2

I did this on all nodes.

This is when I finally had something come out of the storage manager. If I
ran hsvr_list, ssvr_list or pdsk_list, each printed one resource, and it was
the one on "super0nodeA", where the storage manager was also running. So
still no connections from the other nodes, even after I rebooted all of
them.

After re-re-re-re-checking all the configs, I did "/etc/init.d/vas_hsvr
stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on
"super0nodeA". About 5 s after I started vas_ssvr, I watched my server shut
down. I tried to start it again, only to see it shut itself down right after
the loading screen with the panda on it, leaving just some text about
stunnel and a bunch of numbers at the top of the screen. Well, I thought it
was something I did, so I reinstalled XCP.

While I was reinstalling XCP on node A, I started to think that my problem
might be node A itself, so I installed vastsky-cli.rpm and vastsky-sm.rpm on
"super0nodeB" and modified "/etc/vas.conf" on nodes B, C and D (changed
"host_list: 192.168.10.210" to 192.168.10.211). Again, I had connections
from head and storage servers, but only from the local ones. Still no
connections from nodes C or D.
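So on nodes B, C and D the relevant part of "/etc/vas.conf" now read:

<part of vas.conf>
[storage_manager]
host_list: 192.168.10.211
</part of vas.conf>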

I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and
"/etc/init.d/vas_ssvr start" on node B and again the server started shutting
itself down. This time I had another SSH session running "tail -f
/var/log/vas_super0nodeB.log", so even though the server shut itself down, I
was able to copy-paste the contents of the screen:

<start of log>
2010-12-19 15:47:59,435 ssvr_reporter DEBUG /opt/vas/bin/daemon_launcher -n
1 /opt/vas/bin/DiskPatroller /var/run/DiskPatroller.run
2010-12-19 15:47:59,443 storage_manager INFO DISPATCH registerStorageServer
called. ({'ip_data': ['192.168.10.211', '192.168.10.211'], 'ver': 3},)
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer
EXCEPTION <Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
2010-12-19 15:47:59,501 ssvr_reporter ERROR Traceback (most recent call
last):
  File "ssvr_reporter.py", line 231, in main
  File "ssvr_reporter.py", line 100, in register_resources
  File "vas_subr.py", line 68, in send_request
  File "/usr/lib/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.4/xmlrpclib.py", line 1147, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1286, in _parse_response
    return u.close()
  File "/usr/lib/python2.4/xmlrpclib.py", line 744, in close
    raise Fault(**self._stack[0])
Fault: <Fault 17: 'EEXIST'>
2010-12-19 15:48:00,337 storage_manager DEBUG RW.__send_request
('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 4,
'capacity': 465, 'pdskid': 3, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,338 storage_manager DEBUG RW.__send_request
('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 2,
'capacity': 55, 'pdskid': 2, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,340 ssvr_agent INFO DISPATCH registerShredRequest
called. ({'dextid': 4, 'ver': 3, 'pdskid': 3, 'capacity': 465, 'offset':
0},)
2010-12-19 15:48:00,342 ssvr_agent INFO DISPATCH registerShredRequest
called. ({'dextid': 2, 'ver': 3, 'pdskid': 2, 'capacity': 55, 'offset': 0},)
2010-12-19 15:48:00,343 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
2010-12-19 15:48:00,345 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,345 ssvr_agent INFO retrying(1/16) ...
<end of log>

Notice the line "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h
now". Reading the trace, registerStorageServer fails with <Fault 17:
'EEXIST'>, presumably because this server was already registered with the
storage manager from the earlier start, and ssvr_reporter's error path
apparently reacts by running shutdown on the whole host. That would explain
the machines powering off.

I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and
"/etc/init.d/vas_ssvr start" on node C as well, and exactly the same thing
happened. The server shut itself down and can't be started. Same stunnel...
error.

This is how far I got before I stopped trying. I hope this helps someone
else. I would also welcome input if someone has something to say.

-Henrik Andersson